Python, Unicode and the Console (Windows console anyway) – UnicodeEncodeError

So today I was trying to run searches on a text file encoded in UTF-8. The code worked fine, except that every time a print statement tried to print something that wasn't ASCII, I got an error like the one below:

Traceback (most recent call last):
  File "wordCount.py", line 10, in <module>
    print line
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>

After some digging, the reason became obvious. Python's print statement encodes its output using the encoding of the window it is writing to (in my case a Windows command console), and the default code page there is cp437. The traceback confirms this: the error is raised from the encoder in cp437.py.

What happens is that print takes the encoding from sys.stdout.encoding, and since cp437 has no mapping for the Unicode character in question, the encode step raises an error. (The character here, u'\ufeff' at position 0, is most likely the byte order mark at the start of the file.) This basically means you can't print any character that falls outside cp437 to the console, and you will need to find some other way to print those out.
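One possible "other way" is to replace whatever the console cannot encode before printing it. The sketch below is only an illustration, not something from the original script: console_safe is a made-up helper name, and falling back to ASCII when no encoding is reported is my own assumption.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'.

    Hypothetical helper for illustration only.
    """
    # sys.stdout.encoding can be None (for example when output is piped),
    # so fall back to plain ASCII in that case (an assumption of this sketch).
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Round-trip through the console encoding; the "replace" error handler
    # substitutes '?' for anything that has no mapping.
    return text.encode(encoding, "replace").decode(encoding)


# cp437 has no mapping for u'\ufeff', so it comes out as '?'
print(console_safe(u"\ufeffhello", "cp437"))
```

The output loses the characters it cannot show, so this is only good for eyeballing results on the console. If the exact text matters, writing it to a UTF-8 file is probably the safer route.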

As a side note, to see the current encoding, do the following:

import sys

print sys.stdout.encoding
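One more side note on the traceback above: the character it complains about, u'\ufeff' at position 0, looks like the byte order mark that some editors write at the start of a UTF-8 file. Assuming that is the case here, decoding with the utf-8-sig codec instead of utf-8 drops it. A minimal sketch with made-up sample data:

```python
import codecs

# Made-up sample: the bytes of a UTF-8 file that starts with a byte order mark.
raw = codecs.BOM_UTF8 + u"first line\n".encode("utf-8")

# Plain utf-8 keeps the mark as the first character of the decoded text.
with_bom = raw.decode("utf-8")

# utf-8-sig strips a leading mark if present and otherwise behaves like utf-8.
without_bom = raw.decode("utf-8-sig")

print(repr(with_bom[0]))
print(repr(without_bom))
```

When reading from a file, io.open(path, encoding="utf-8-sig") applies the same codec. This only removes the byte order mark; any other character that cp437 cannot encode will still raise the same error on print.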

Posted in Python. Tagged with cp437, encoding, Python, utf-8.

By James

November 4, 2012


About Rants inside

James is from Kiwi land (New Zealand). He received his Bachelor of Engineering with First Class Honors, Summa Cum Laude, in April 2010 from Massey University, Auckland.

He has worked as a consultant/contractor for a number of firms, including camera-driven touch screen company NextWindow, ISP Web Drive, and the New Zealand Ministry of Economic Development’s contracting firm FMIT (specializing in various online registries).

He is well versed in Python (Plone), PHP, JavaScript (jQuery), Java, and MySQL, knowledgeable in C/C++, and a little green in anything .NET.

He is currently pursuing a Masters in Electrical Engineering from the University of California, Los Angeles (UCLA). His research interests are machine learning, data classification and wireless health.

Outside of professional interests, he has a strong interest in anime, games, and various fantasy books. His favorite authors include Elizabeth Moon and David Eddings (in that order).

For more details, please visit his portfolio website
