Python, Unicode and the Console (Windows console anyway) – UnicodeEncodeError

So today I was trying to run searches on a text file encoded in UTF-8. The code worked fine, except that every time a print statement tried to print something that was not ASCII, I got an error like the one below:

Traceback (most recent call last):
  File "", line 10, in <module>
    print line
  File "C:\Python27\lib\encodings\", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\ufeff' in position
 0: character maps to <undefined>

After some digging, the reason became obvious. Python's print uses the default encoding of whatever it is writing to (in my case a Windows command console), and the default code page there is cp437. The traceback points the same way: the error is raised by the 'charmap' codec, from a module under C:\Python27\lib\encodings\.
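
The encode step can be tried by hand, without the console involved at all. This is only a sketch: can_encode is a name made up for illustration, and cp437 is assumed as the code page because that is what my console used.

```python
from __future__ import print_function  # keeps the sketch runnable on Python 2.7 too


def can_encode(text, encoding):
    """Return True if every character in text has a mapping in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp437 is an assumption: it was the code page of my console
for sample in (u'abc', u'\xe9', u'\ufeff'):
    print(repr(sample), can_encode(sample, 'cp437'))
```

Plain ASCII and u'\xe9' (é) pass, since cp437 does have a slot for é. The u'\ufeff' from the traceback does not.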

What happens is that print encodes the text with the encoding reported by sys.stdout.encoding, and since cp437 has no mapping for that Unicode character, the encode step raises an error. This basically means you can't print any character that cp437 cannot represent straight to the console, and that covers most non-ASCII text. You need to find some other way to print those out.
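
One possible "other way", as a rough sketch and not the only option, is to replace whatever the console cannot encode before printing. The helper name console_safe is hypothetical, and falling back to ASCII when sys.stdout.encoding is missing is my own assumption.

```python
from __future__ import print_function  # keeps the sketch runnable on Python 2.7 too
import sys


def console_safe(text, encoding=None):
    """Return text with characters that encoding cannot represent replaced by '?'."""
    # sys.stdout.encoding can be None (e.g. when output is piped);
    # falling back to ASCII in that case is an assumption of this sketch
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # 'replace' swaps each unmappable character for '?' instead of raising
    return text.encode(encoding, 'replace').decode(encoding)


line = u'\ufeffcaf\xe9'
print(console_safe(line))
```

This loses information, so it only suits output meant for eyeballing. Also, the u'\ufeff' in the traceback is the byte order mark at the start of the file; decoding the file with the 'utf-8-sig' codec instead of 'utf-8' strips it, though other unmappable characters would still need handling like the above.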

As a side note, to see the current encoding, do the following:

import sys

print sys.stdout.encoding
