Ordinal not in range 128 error - Repair and installation of large household appliances

I am attempting to work with a very large dataset that has some non-standard characters in it. I need to use unicode, as per the job specs, but I am baffled. (And quite possibly doing it all wrong.)

I open the CSV using:

import csv
ncesReader = csv.reader(open('geocoded_output.csv', 'rb'), delimiter='\t', quotechar='"')

Then, I attempt to encode it with:

name=school_name.encode('utf-8'), street=row[9].encode('utf-8'), city=row[10].encode('utf-8'), state=row[11].encode('utf-8'), zip5=row[12], zip4=row[13],county=row[25].encode('utf-8'), lat=row[22], lng=row[23])

I’m encoding everything except the lat and lng because those need to be sent out to an API. When I run the program to parse the dataset into what I can use, I get the following Traceback.

Traceback (most recent call last):
  File "push_into_db.py", line 80, in <module>
    main()
  File "push_into_db.py", line 74, in main
    district_map = buildDistrictSchoolMap()
  File "push_into_db.py", line 32, in buildDistrictSchoolMap
    county=row[25].encode('utf-8'), lat=row[22], lng=row[23])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 2: ordinal not in range(128)

I should mention that I'm using Python 2.7.2, and this is part of an app built on Django 1.4. I've read several posts on this topic, but none of them seem to apply directly. Any help will be greatly appreciated.

You might also want to know that some of the non-standard characters causing the issue are Ñ and possibly É.
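For comparison, in Python 3 the csv module reads text rather than bytes. The fields arrive as str and need no .encode() calls. In Python 2, calling .encode('utf-8') on a byte string first decodes it implicitly with the ASCII codec, which is why the traceback above shows a UnicodeDecodeError. The sketch below assumes the file is UTF-8 and tab-delimited. The function name read_tsv_rows and the sample rows are hypothetical stand-ins for geocoded_output.csv.

```python
import csv
import os
import tempfile


def read_tsv_rows(path, encoding="utf-8"):
    """Return all rows of a tab-delimited file as lists of str (Python 3)."""
    # Text mode with an explicit encoding; newline="" is what the csv
    # module documentation recommends for files passed to csv.reader.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f, delimiter="\t", quotechar='"'))


# Hypothetical sample data standing in for geocoded_output.csv
sample = "name\tcity\nEscuela Peña\tEspañola\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8", newline=""
) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

rows = read_tsv_rows(sample_path)
os.remove(sample_path)
print(rows[1])  # ['Escuela Peña', 'Española']
```

If the file may not be UTF-8, pass its real encoding instead of guessing. A lone byte 0xd1 is Ñ in Latin-1, which may hint at such a file, so encoding="latin-1" could apply. Guessing UTF-8 for a file in another encoding tends to raise UnicodeDecodeError.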

Source

Sometimes the following error pops up on our server:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u200e' in position 13: ordinal not in range(128)

Error: ordinal not in range(128)
Cause: an encoding problem in Python. A non-ASCII character, here mainly \u200e, is being encoded with the ASCII codec.

\u200e is the left-to-right mark, a control character. It is not a space. It is completely invisible and has zero width, so we usually don't notice it on web pages.
It belongs to the Unicode format characters, along with the right-to-left mark (\u200F), the left-to-right mark (\u200E), the zero-width joiner (\u200D), the zero-width non-joiner (\u200C) and the zero-width no-break space (\uFEFF, also used as a byte order mark). These characters control how text is displayed, which matters for rendering some non-English texts correctly.
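Because these characters are invisible, it helps to locate them explicitly. Format characters share the Unicode general category "Cf", which the standard unicodedata module reports. This minimal sketch lists and strips them. The helper names format_chars and strip_format_chars are hypothetical.

```python
import unicodedata


def format_chars(text):
    """List (position, name) for every Unicode format character (category Cf)."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_format_chars(text):
    """Drop all format characters, e.g. U+200E LEFT-TO-RIGHT MARK."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


s = "Hello\u200e, world"
print(format_chars(s))        # [(5, 'LEFT-TO-RIGHT MARK')]
print(strip_format_chars(s))  # Hello, world
```

Stripping is only safe when the marks are noise, such as stray characters pasted from a web page. The zero-width joiner, for instance, is required inside some emoji sequences and scripts.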

Solution (Python 2 only): add the following block at the top of the file that contains the Python code. Python 3 has no sys.setdefaultencoding, and reload is not a builtin there.

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

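If adding the block above breaks the print function, replace it with the following block. The likely cause is that reload(sys) can reset sys.stdin, sys.stdout and sys.stderr, which some environments replace with their own streams. The replacement block saves and restores the standard streams around the reload: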

import sys
stdi, stdo, stde = sys.stdin, sys.stdout, sys.stderr  # save the current standard streams
reload(sys)  # site.py deletes setdefaultencoding from sys at startup, so sys is reloaded once to get it back
sys.setdefaultencoding('utf-8')
sys.stdin, sys.stdout, sys.stderr = stdi, stdo, stde  # restore the saved streams
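This workaround does not carry over to Python 3. A minimal sketch of the usual alternative names the encoding explicitly at the point where text is written out, instead of changing a process-wide default. It assumes you control that boundary. The helper write_text is hypothetical. io.TextIOWrapper is the standard-library class that open() returns in text mode.

```python
import io


def write_text(text, encoding="utf-8", errors="strict"):
    """Encode text through a TextIOWrapper with an explicit codec and return the bytes."""
    raw = io.BytesIO()
    out = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    out.write(text)
    out.flush()            # push the encoded bytes into raw
    data = raw.getvalue()
    out.detach()           # release raw so the wrapper never closes it
    return data


print(write_text("left\u200eright"))                     # b'left\xe2\x80\x8eright'
print(write_text("left\u200eright", "ascii", "replace"))  # b'left?right'
```

The choice of codec and error handling stays local to one call rather than affecting every implicit conversion in the process.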

Source


Conversion errors

When converting between strings and bytes, it is very important to know exactly which
encoding is used and what each encoding can and cannot represent.

For example, the ASCII encoding cannot convert Cyrillic text to bytes:

In [32]: hi_unicode = 'привет'

In [33]: hi_unicode.encode('ascii')
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-33-ec69c9fd2dae> in <module>()
----> 1 hi_unicode.encode('ascii')

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

Similarly, if the string 'привет' is encoded to bytes and we then try to
decode it back to a string with ascii, we also get an error:

In [34]: hi_unicode = 'привет'

In [35]: hi_bytes = hi_unicode.encode('utf-8')

In [36]: hi_bytes.decode('ascii')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-36-aa0ada5e44e9> in <module>()
----> 1 hi_bytes.decode('ascii')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

Another variant of the error occurs when different encodings are used for
encoding and decoding:

In [37]: de_hi_unicode = 'grüezi'

In [38]: utf_16 = de_hi_unicode.encode('utf-16')

In [39]: utf_16.decode('utf-8')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-39-4b4c731e69e4> in <module>()
----> 1 utf_16.decode('utf-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Getting an error is actually a good thing: it tells you explicitly what the problem is.
It is worse when you get this:

In [40]: hi_unicode = 'привет'

In [41]: hi_bytes = hi_unicode.encode('utf-8')

In [42]: hi_bytes
Out[42]: b'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'

In [43]: hi_bytes.decode('utf-16')
Out[43]: '뿐胑룐닐뗐苑'
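A wrong codec can succeed silently. One defensive pattern is to try a short, ordered list of candidate encodings in strict mode and keep the first one that decodes. This sketch assumes the candidates are UTF-8 and then cp1251, a common Cyrillic code page. The function name decode_first is hypothetical. The order matters. As the session above shows, UTF-16 accepts these UTF-8 bytes without complaint, so it would shadow the correct codec if it were tried first.

```python
def decode_first(data, encodings=("utf-8", "cp1251")):
    """Return (text, encoding) for the first encoding that decodes data strictly."""
    for enc in encodings:
        try:
            return data.decode(enc), enc  # errors="strict" is the default
        except UnicodeDecodeError:
            continue                      # not valid in this codec, try the next one
    raise ValueError(f"none of {encodings} could decode the data")


print(decode_first("привет".encode("utf-8")))   # ('привет', 'utf-8')
print(decode_first("привет".encode("cp1251")))  # ('привет', 'cp1251')
```

This is a heuristic, not detection. Single-byte code pages such as cp1251 accept almost any input, so they belong at the end of the list.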

Error handling

The encode and decode methods support error-handling modes that
specify how to react to a conversion error.

The errors parameter of encode

By default, encode uses the strict mode: an encoding error raises
a UnicodeError exception (more precisely, its subclass UnicodeEncodeError). Examples of this
behavior were shown above.
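In strict mode the exception object itself says what went wrong. UnicodeEncodeError is a subclass of UnicodeError, which in turn subclasses ValueError. It carries the attributes encoding, object, start and end. This small sketch reports the offending characters. The helper name explain_encode_error is hypothetical.

```python
def explain_encode_error(text, encoding="ascii"):
    """Return (encoding, start, bad_chars) if text cannot be encoded, else None."""
    try:
        text.encode(encoding)  # strict mode: raises on the first unencodable run
    except UnicodeEncodeError as exc:
        # exc.object is the original str; [start:end] is the unencodable slice
        return exc.encoding, exc.start, exc.object[exc.start:exc.end]
    return None


print(explain_encode_error("grüezi"))           # ('ascii', 2, 'ü')
print(explain_encode_error("grüezi", "utf-8"))  # None
```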

Instead, you can use replace to substitute a question mark for each
character that cannot be encoded:

In [44]: de_hi_unicode = 'grüezi'

In [45]: de_hi_unicode.encode('ascii', 'replace')
Out[45]: b'gr?ezi'

Or namereplace, to replace the character with its Unicode name:

In [46]: de_hi_unicode = 'grüezi'

In [47]: de_hi_unicode.encode('ascii', 'namereplace')
Out[47]: b'gr\\N{LATIN SMALL LETTER U WITH DIAERESIS}ezi'

You can also simply drop the characters that cannot be
encoded:

In [48]: de_hi_unicode = 'grüezi'

In [49]: de_hi_unicode.encode('ascii', 'ignore')
Out[49]: b'grezi'
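Putting the encode modes side by side makes the trade-off visible. Only strict stops the program. replace and ignore lose information silently, and namereplace changes the text. This sketch runs all four over the same input. Note that namereplace requires Python 3.5 or later.

```python
text = "grüezi"

results = {}
for mode in ("strict", "replace", "namereplace", "ignore"):
    try:
        results[mode] = text.encode("ascii", mode)
    except UnicodeEncodeError as exc:
        results[mode] = f"error: {exc.reason}"  # reason is 'ordinal not in range(128)'

for mode, value in results.items():
    print(f"{mode:12} {value!r}")
```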

The errors parameter of decode

The decode method also uses strict mode by default and
raises a UnicodeDecodeError exception.

If you change the mode to ignore, as with encode, the bytes that cannot be
decoded are simply skipped:

In [50]: de_hi_unicode = 'grüezi'

In [51]: de_hi_utf8 = de_hi_unicode.encode('utf-8')

In [52]: de_hi_utf8
Out[52]: b'gr\xc3\xbcezi'

In [53]: de_hi_utf8.decode('ascii', 'ignore')
Out[53]: 'grezi'

The replace mode substitutes the replacement character U+FFFD for each byte that cannot be decoded:

In [54]: de_hi_unicode = 'grüezi'

In [55]: de_hi_utf8 = de_hi_unicode.encode('utf-8')

In [56]: de_hi_utf8.decode('ascii', 'replace')
Out[56]: 'gr��ezi'
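The two replacement characters come from the two bytes of the UTF-8 "ü". replace inserts U+FFFD once per undecodable byte. This short check also shows that the real fix is decoding with the codec the bytes were written in, not choosing an error handler.

```python
data = "grüezi".encode("utf-8")        # b'gr\xc3\xbcezi'

replaced = data.decode("ascii", "replace")
print(replaced.count("\ufffd"))         # 2: one per non-ASCII byte
print(data.decode("ascii", "ignore"))   # grezi: the bytes are simply dropped
print(data.decode("utf-8"))             # grüezi: the matching codec needs no handler
```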

Source

Overview

Example errors:

Traceback (most recent call last):
  File "unicode_ex.py", line 3, in <module>
    print str(a) # this throws an exception
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)

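This issue happens when Python has to convert a string between bytes and text with a codec that cannot handle some of its characters. In Python 2, the implicit default codec is ASCII.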

In Python 2, a str can contain any sequence of bytes. However, when Python is asked to interpret it as text, it may find bytes that are invalid for the codec it is using.

In these situations, an error is often thrown that mentions ordinal not in range, or codec can't encode character, or codec can't decode character.

Here’s a bit of code that may reproduce the error in Python 2:

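a = '\xa1'                # Python 2 str: a single byte, 161 in decimal
print(a + ' <= problem')  # works: byte strings are simply concatenated
unicode(a)                # fails: implicit decode with the ASCII codec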
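In Python 3, bytes and text are separate types, so the same failure appears when a bytes object is decoded explicitly. This minimal reproduction assumes Python 3. It also shows that a codec covering all 256 byte values, such as Latin-1, decodes the same byte without error.

```python
a = b"\xa1"                    # one byte outside ASCII (161 in decimal)
message = None
try:
    a.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)  # 'ascii' codec can't decode byte 0xa1 in position 0: ordinal not in range(128)

decoded = a.decode("latin-1")  # Latin-1 maps every byte 0-255 to a character
print(decoded == "\u00a1")     # True: INVERTED EXCLAMATION MARK
```

Decoding without an error only proves the bytes are valid in that codec, not that it is the right one.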

Initial Steps Overview

Check Python version
Determine codec and character

Detailed Steps

1) Check Python version

The Python version you are using is significant.

You can determine the Python version by running:

python --version

or, if you have access to the running code, by logging it:

import sys
print(sys.version)

The major number (2 or 3) is the number you are interested in.

This guide assumes you are using Python 2.
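The same check can be done in code. sys.version_info is a tuple-like object whose major field holds the number you need. This minimal sketch branches on that field. The printed hints summarize the difference between the two versions that this guide relies on. The block avoids f-strings so that it runs on both versions.

```python
import sys

major = sys.version_info.major  # 2 or 3
print(sys.version)              # full version string, useful in logs
if major >= 3:
    print("str is text, bytes is binary: decode and encode explicitly at the boundaries")
else:
    print("str is bytes, unicode is text: implicit conversions use the ASCII codec")
```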

2) Determine interpreting codec and character

Get this from the error message:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)

In this case, the codec is ascii and the character is the hex character A1.

What is happening here is that Python is trying to interpret a string, and expects that the bytes in that string are legal for the format it's expecting. In this case, it's expecting a string composed of ASCII bytes. These bytes are in the range 0-127 (i.e. 7 bits). The hex byte A1 is 161 in decimal, and is therefore out of range.
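The range check can be reproduced directly. ord() gives a character's code point, and anything above 127 cannot be represented in ASCII. This sketch lists such characters with their positions, which is the same information the error message reports. The helper name non_ascii_positions is hypothetical.

```python
def non_ascii_positions(text):
    """Return (index, char, code point in hex) for every character above 127."""
    return [(i, ch, hex(ord(ch))) for i, ch in enumerate(text) if ord(ch) > 127]


print(ord("\xa1"))                              # 161, i.e. 0xa1, outside 0..127
print(non_ascii_positions("\u00d1and\u00fa"))   # [(0, 'Ñ', '0xd1'), (4, 'ú', '0xfa')]
```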

When Python interprets this string in a context that requires a codec (for example, when calling the unicode function), it tries to decode it with that codec, or to encode it if it is already a unicode object. Either step can hit this problem.

3) Determine desired codec

You need to figure out how the bytes should be interpreted.

Most often in everyday use (e.g. web scraping or document ingestion), this is utf-8.

Once you have determined the desired codec, solution A may help you.

Solutions List

A) Decode the string

Solutions Detail

A) Decode the string

If you have a string s that you want to interpret as utf-8 data, you can try:

s = s.decode('utf-8')

to decode the bytes into a unicode object using the appropriate codec.
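In Python 3 terms, the same idea is often summarized as three steps. Decode bytes once on input, work with str internally, and encode once on output. This minimal sketch assumes UTF-8 on both sides. process is a hypothetical name, and upper() stands in for real work.

```python
def process(raw: bytes, encoding: str = "utf-8") -> bytes:
    text = raw.decode(encoding)   # bytes -> str at the input boundary
    text = text.upper()           # all processing happens on str
    return text.encode(encoding)  # str -> bytes at the output boundary


result = process("señor ¡hola!".encode("utf-8"))
print(result.decode("utf-8"))     # SEÑOR ¡HOLA!
```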

Further Information

Owner

Ian Miell


Source

Several errors can arise when converting a value from one data type to another, because some values cannot be converted into the target type. One of the most common errors during these conversions is UnicodeEncodeError. It occurs when text containing a character that the target encoding cannot represent is encoded to bytes. This article shows how to fix UnicodeEncodeError in Python.

Why does UnicodeEncodeError arise?

The error occurs when you try to encode a character whose code point lies outside the range an encoding scheme can represent. ASCII, for example, covers only code points 0 to 127, so any character above 127 has no ASCII byte and raises the error. To solve the issue, the string must be encoded with an encoding that can represent that code point. Frequently used encodings include UTF-8 (Unicode Transformation Format, 8-bit), UTF-16, UTF-32 and ASCII. UTF-8 usually fixes this problem.
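Whether an encoding can represent a given string is easy to test. Try a strict encode and catch UnicodeEncodeError. This sketch checks a few common encodings. The helper name encodable_in and the list of encodings are illustrative choices.

```python
def encodable_in(text, encodings=("ascii", "latin-1", "utf-8", "utf-16", "utf-32")):
    """Map each encoding name to True if text can be encoded with it strictly."""
    result = {}
    for enc in encodings:
        try:
            text.encode(enc)
            result[enc] = True
        except UnicodeEncodeError:
            result[enc] = False
    return result


print(encodable_in("\xa0"))
# {'ascii': False, 'latin-1': True, 'utf-8': True, 'utf-16': True, 'utf-32': True}
print(encodable_in("привет"))
# {'ascii': False, 'latin-1': False, 'utf-8': True, 'utf-16': True, 'utf-32': True}
```

The UTF encodings can represent every code point except lone surrogates, which is why switching to UTF-8 usually resolves the error.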

To demonstrate, we first reproduce the error and then fix it:

Python3

a = 'geeksforgeeks1234567\xa0'.encode("ASCII")

print(a)

Output:

Traceback (most recent call last):
  File "C:/Users/test.py", line 1, in <module>
    a = 'geeksforgeeks1234567\xa0'.encode("ASCII")
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 20: ordinal not in range(128)

How to solve this UnicodeEncodeError?

The error arose because we tried to encode a character outside the range of the ASCII encoding. ASCII can only represent code points 0 to 127, but \xa0 (NO-BREAK SPACE) has code point 160. To fix this error, we have to encode the text in a scheme that supports more code points than ASCII. UTF-8 serves this purpose.

Python3

a = 'geeksforgeeks1234567\xa0'.encode("UTF-8")

print(a)

Output:

b'geeksforgeeks1234567\xc2\xa0'

This time the program ran, because the string was encoded with a standard that can encode code points greater than 127. The character \xa0 (code point 160) was converted to \xc2\xa0, a two-byte representation.
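The two bytes follow a fixed pattern. UTF-8 stores code points from 0x80 to 0x7FF as 110xxxxx 10xxxxxx, splitting the 11 bits of the code point across two bytes. This worked sketch does the split by hand for 0xA0. The helper name utf8_two_byte is hypothetical, and real code should simply call .encode("utf-8").

```python
def utf8_two_byte(cp):
    """Encode a code point in 0x80..0x7FF as a two-byte UTF-8 sequence by hand."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point needs a different UTF-8 length")
    lead = 0xC0 | (cp >> 6)    # 110xxxxx: top 5 of the 11 bits
    cont = 0x80 | (cp & 0x3F)  # 10xxxxxx: low 6 bits
    return bytes([lead, cont])


print(utf8_two_byte(0xA0))                            # b'\xc2\xa0'
print(utf8_two_byte(0xA0) == "\xa0".encode("utf-8"))  # True
```

For 0xA0 (binary 000 1010 0000), the top five bits 00010 give 0xC2 and the low six bits 100000 give 0xA0.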

Similarly, UnicodeEncodeError can be resolved by encoding to a format such as UTF-16 or UTF-32:

Python3

a = 'geeksforgeeks1234567\xa0'.encode("UTF-16")

print(a, end="\n\n\n")

a = 'geeksforgeeks1234567\xa0'.encode("UTF-32")

print(a)

Output:

b'\xff\xfeg\x00e\x00e\x00k\x00s\x00f\x00o\x00r\x00g\x00e\x00e\x00k\x00s\x001\x002\x003\x004\x005\x006\x007\x00\xa0\x00'

b'\xff\xfe\x00\x00g\x00\x00\x00e\x00\x00\x00e\x00\x00\x00k\x00\x00\x00s\x00\x00\x00f\x00\x00\x00o\x00\x00\x00r\x00\x00\x00g\x00\x00\x00e\x00\x00\x00e\x00\x00\x00k\x00\x00\x00s\x00\x00\x001\x00\x00\x002\x00\x00\x003\x00\x00\x004\x00\x00\x005\x00\x00\x006\x00\x00\x007\x00\x00\x00\xa0\x00\x00\x00'
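The leading \xff\xfe in the UTF-16 output (and \xff\xfe\x00\x00 in the UTF-32 output) is a byte order mark (BOM). The plain "utf-16" and "utf-32" codecs prepend it in the machine's native byte order, and the outputs above are little-endian. The explicitly ordered variants, such as "utf-16-le", omit it. This short comparison shows the effect on length.

```python
s = "geeksforgeeks1234567\xa0"  # 21 characters

for enc in ("utf-16", "utf-16-le", "utf-32", "utf-32-le"):
    data = s.encode(enc)
    print(enc, len(data), data[:4])
# lengths: utf-16 44, utf-16-le 42, utf-32 88, utf-32-le 84
```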

Last Updated :
23 Jan, 2023


Source
