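This post was originally published at http://www.binss.me/blog/solve-problem-of-python3-raise-unicodeencodeerror-when-print-utf8-string/. Please credit the source when reposting.
I have recently been upgrading bismarck, mainly migrating it from Python 2 to Python 3 and replacing the crawling approach.
It got off to a bad start: printing a scraped product title raised an error: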
root@fb6e7c6fbe5c:/home/binss# python3 amazon_test.py
Traceback (most recent call last):
  File "amazon_test.py", line 30, in <module>
    print(s)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
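In the Python 2 days, UnicodeEncodeError was the error I dreaded most. I did not expect to run into it again in Python 3.
Looking at the first character of the title, I found it was '\u8266', so I tested the following: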
>>> print('\u8266')
Sure enough, it fails:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u8266' in position 0: ordinal not in range(128)
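The failure does not depend on the terminal as such: print has to encode the string with whatever encoding the output stream was created with. A minimal sketch of that step, using in-memory streams whose encoding is chosen explicitly (the variable names are made up for illustration):

```python
import io

s = '\u8266'  # the character from the traceback above

# A text stream encodes every string written to it with its own encoding.
# Wrapping an in-memory byte buffer lets us pick that encoding ourselves.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding='ascii')
try:
    ascii_stream.write(s)
    ascii_stream.flush()
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print('ascii stream:', exc.reason)

# The same write succeeds once the stream encodes to UTF-8.
utf8_buffer = io.BytesIO()
utf8_stream = io.TextIOWrapper(utf8_buffer, encoding='utf-8')
utf8_stream.write(s)
utf8_stream.flush()
print('utf-8 bytes:', utf8_buffer.getvalue())
```

The ASCII stream rejects the character, while the UTF-8 stream turns it into three bytes.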
I tried all sorts of approaches and still could not fix it.
Then, on a whim: if print does not work, what about writing the string to a file?
>>> s = '\u8266'
>>> with open('xxx.txt', mode='w') as public_file:
...     public_file.write(s)
It still fails:
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u8266' in position 0: ordinal not in range(128)
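As an aside, the traceback matches the one from print because a text-mode open() without an encoding argument falls back to the locale's preferred encoding. A small sketch, assuming the file is meant to hold UTF-8; the temporary directory is only there so the example does not overwrite anything:

```python
import os
import tempfile

s = '\u8266'

# Scratch location chosen for this example; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'xxx.txt')

# Naming the encoding means the locale is never consulted for this file.
with open(path, mode='w', encoding='utf-8') as f:
    f.write(s)

# Read the raw bytes back to see what actually landed on disk.
with open(path, mode='rb') as f:
    raw = f.read()

# Decoding with the same encoding returns the original string.
with open(path, mode='r', encoding='utf-8') as f:
    round_trip = f.read()

print(raw)
print(round_trip == s)
```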
So what about writing it in binary mode instead?
>>> s = '\u8266'.encode('utf-8')
>>> with open('xxx.txt', mode='wb') as public_file:
...     public_file.write(s)
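To my surprise, this wrote the correct character, "艦", to the file! So could the real cause be that the terminal's stdout does not support UTF-8 output?
I printed the current stdout to see what it was: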
root@fb6e7c6fbe5c:/home/binss# python3
Python 3.5.1 (default, Dec 18 2015, 00:00:00)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='ANSI_X3.4-1968'>
>>>
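The same information is available as plain attributes, which is handier inside a script than reading the repr. A small sketch; the helper name `describe_io_encodings` is made up for this example:

```python
import locale
import sys


def describe_io_encodings():
    """Collect the encodings Python picked for this process."""
    return {
        'stdout': getattr(sys.stdout, 'encoding', None),
        'stderr': getattr(sys.stderr, 'encoding', None),
        'locale preferred': locale.getpreferredencoding(False),
        'filesystem': sys.getfilesystemencoding(),
    }


for name, value in describe_io_encodings().items():
    print('{:<17} {}'.format(name, value))
```

The values depend on the environment the interpreter was started in, so no fixed output is shown here.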
What on earth is this ANSI_X3.4-1968 encoding, and why is it not utf-8? Googling with it as the keyword finally turned up a relevant article:
http://lab.knightstyle.info/私がpython3でunicodeencodeerrorなのはどう考えてもデフォルト文字/
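The gist is that ANSI_X3.4-1968 is the formal name for plain ASCII, and to output UTF-8 you have to replace that stdout with a UTF-8 one using the following code: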
import sys
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
Then check again whether stdout is now utf-8:
>>> sys.stdout
<_io.TextIOWrapper name='<stdout>' encoding='utf-8'>
After that, print works happily:
>>> print('\u8266')
艦
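For a script that may run under different locales, the re-wrapping can be made conditional. The sketch below is one possible shape, not the linked article's code: `is_utf8` and `ensure_utf8` are hypothetical helper names, and the `reconfigure` branch assumes Python 3.7 or later, where `TextIOWrapper.reconfigure` exists. The demonstration uses an in-memory stream so it behaves the same everywhere.

```python
import codecs
import io


def is_utf8(encoding_name):
    """True if the name is an alias of UTF-8 (for example 'utf8' or 'UTF-8')."""
    try:
        return codecs.lookup(encoding_name).name == 'utf-8'
    except (LookupError, TypeError):
        return False


def ensure_utf8(stream):
    """Return a text stream that writes UTF-8 to the same underlying buffer."""
    if is_utf8(getattr(stream, 'encoding', None)):
        return stream
    if hasattr(stream, 'reconfigure'):  # Python 3.7+
        stream.reconfigure(encoding='utf-8')
        return stream
    if not hasattr(stream, 'buffer'):   # not a wrapper we know how to rebuild
        return stream
    # Older versions: re-wrap the byte buffer, as in the snippet above.
    return io.TextIOWrapper(stream.buffer, encoding='utf-8',
                            line_buffering=stream.line_buffering)


# ANSI_X3.4-1968 resolves to Python's ascii codec.
print(codecs.lookup('ANSI_X3.4-1968').name)

# Stand-in for a stdout that was opened with the ASCII locale encoding.
demo_buffer = io.BytesIO()
ascii_demo = io.TextIOWrapper(demo_buffer, encoding='ANSI_X3.4-1968')
utf8_demo = ensure_utf8(ascii_demo)
utf8_demo.write('\u8266')
utf8_demo.flush()
print(demo_buffer.getvalue())
```

In a real script the call would be `sys.stdout = ensure_utf8(sys.stdout)` near the top. Setting the environment variable `PYTHONIOENCODING=utf-8` before starting Python, or running under a UTF-8 locale, should achieve the same effect without code changes.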
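1F Tsn_09 7 years, 10 months ago
Thanks, I ran into this problem too...
2F dk 7 years, 7 months ago
Thanks a ton!!!!!
3F 1 7 years, 7 months ago
Great!
4F code 7 years, 5 months ago
Thanks to the author
5F Trisolaries 7 years, 4 months ago
Thanks to the author. I hit this right as I was starting out with Python and finally found a solution
6F joe 7 years, 4 months ago
I still have this problem after adding this line. What could be the reason?
7F d 7 years ago
Thanks for sharing, I really did run into this bizarre problem
8F linukey 6 years, 9 months ago
666
9F jeffery 6 years, 7 months ago
Thanks to the author
10F 修改一下i18n ("modify the i18n settings") 6 years, 7 months ago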
[root@server2 ~]# cat /etc/sysconfig/i18n
LANG="zh_CN.GB18030"
SUPPORTED="zh_CN.GB18030:zh_CN:zh:en_US.UTF-8:en_US:en"
SYSFONT="latarcyrheb-sun16"
11F yudachi 6 years, 6 months ago
As soon as I saw 艦, I knew this was no simple problem
12F bigface 5 years, 7 months ago
Seeing the Japanese and the avatar, I knew I had to say something
13F 智能血压计 5 years, 5 months ago
Impressive!
14F 老菜鸟 5 years, 5 months ago
Thanks 🙏
15F Tsquare619 4 years, 11 months ago
Thank you very much!