So today I was trying to search through a text file encoded in UTF-8. The code worked OK, except that every time a print statement tried to print a non-ASCII character, I got an error like the one below:
Traceback (most recent call last):
  File "wordCount.py", line 10, in <module>
    print line
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>
After some digging, the reason became clear. When printing to a console, Python's print uses that console's encoding. In my case that was a Windows Command Prompt, whose default code page on my machine is cp437. The traceback confirms this: the encoding error is raised from cp437.py.
What happens is that print encodes unicode strings using the encoding reported by sys.stdout.encoding. cp437 has no mapping for the character u'\ufeff', so the encode step fails. Note that cp437 does cover some non-ASCII characters (for example accented letters such as é, and box-drawing symbols), but any character outside its 256-character set cannot be printed to this console as-is, so you will need to find some other way to print those out.
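To see this outside of print, here is a minimal sketch that performs the same cp437 encode by hand. The sample string is made up for illustration: it starts with U+FEFF (the byte order mark that some editors write at the start of UTF-8 files, which is likely why the error points at position 0) followed by "café". It is written to run under both Python 2.7 and Python 3.

```python
from __future__ import print_function

# Hypothetical sample text: a BOM (U+FEFF) followed by a word with one accented letter.
text = u'\ufeffcaf\u00e9'

# A strict encode to cp437 is what print does on this console. U+FEFF has no
# cp437 mapping, so this raises the same UnicodeEncodeError as the traceback.
strict_failed = False
try:
    text.encode('cp437')
except UnicodeEncodeError as exc:
    strict_failed = True
    print('strict encode failed:', exc.reason)

# 'é' is one of cp437's non-ASCII characters (byte 0x82). With errors='replace',
# only the unmappable BOM is swapped for '?'.
replaced = text.encode('cp437', 'replace')
print(repr(replaced))
```

If the only offending character is that leading BOM, opening the file with the 'utf-8-sig' codec (for example io.open(path, encoding='utf-8-sig')) should strip it while reading, though any other characters missing from cp437 will still fail to print.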
As a side note, to see the current encoding, do the following:
import sys
print sys.stdout.encoding
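One way to print text that the console cannot fully represent is to encode it yourself with an error handler, using the same sys.stdout.encoding value, and print the result. The sketch below is only one option, and it assumes escaped output (e.g. \ufeff instead of the real character) is acceptable. safe_print is a hypothetical helper name, and the ASCII fallback covers the case where the encoding is None (for example Python 2 with output redirected to a file). It runs under Python 2.7 and Python 3.

```python
from __future__ import print_function

import sys


def safe_print(text, stream=None):
    """Print a unicode string without raising UnicodeEncodeError.

    Characters the stream's encoding cannot represent are written as
    backslash escapes (e.g. \\ufeff) instead of crashing.
    """
    if stream is None:
        stream = sys.stdout
    # The encoding may be missing or None (e.g. redirected output in Python 2).
    encoding = getattr(stream, 'encoding', None) or 'ascii'
    # Escape anything the encoding cannot map, then decode back so print()
    # only receives characters the stream can handle.
    encoded = text.encode(encoding, 'backslashreplace')
    print(encoded.decode(encoding), file=stream)
```

On the cp437 console above, safe_print(u'\ufeffcaf\u00e9') should print the é normally and only escape the BOM.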