I want to process the output of a running program line by line (think tail -f) with a Python 3 script (on Linux).
The output of the program that is piped into the script is encoded in Latin-1, so in Python 2 I used the codecs module to decode the input from sys.stdin correctly:
import sys, codecs
sin = codecs.getreader('latin-1')(sys.stdin)
for line in sin:
    print '%s "%s"' % (type(line), line.encode('ascii', 'xmlcharrefreplace').strip())
This worked:
<type 'unicode'> "Hi! &#246;&#228;&#223;"
...
However, in Python 3, sys.stdin.encoding is UTF-8 there, and if I just naively read from stdin:
import sys
for line in sys.stdin:
    print('type:{0} line:{1}'.format(type(line), line))
I get this error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 4: invalid start byte
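The failure can be reproduced without a pipe. This is a minimal sketch, assuming the incoming bytes are the Latin-1 encoding of the sample line "Hi! öäß" from above; the variable names are only for illustration:

```python
# Assumption: these are the bytes the other program writes for "Hi! öäß" in Latin-1.
raw = b"Hi! \xf6\xe4\xdf"

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # 0xf6 at offset 4 cannot start a UTF-8 sequence, hence the error above.
    utf8_error = exc

# Latin-1 maps every byte to exactly one code point, so this decode cannot fail.
decoded = raw.decode("latin-1")

print(utf8_error)
print(ascii(decoded))
```

So the bytes themselves are fine; they are just being decoded with the wrong codec.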
How can I read non-UTF-8 (here Latin-1) text data piped into stdin in Python 3?
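One direction that might work is to bypass the default decoding and re-wrap the underlying byte stream. The sketch below assumes sys.stdin.buffer gives access to the raw bytes; latin1_reader is a hypothetical helper name, and an io.BytesIO object stands in for the pipe so the example runs on its own:

```python
import io


def latin1_reader(binary_stream):
    """Hypothetical helper: wrap a binary stream so it is decoded as Latin-1."""
    return io.TextIOWrapper(binary_stream, encoding="latin-1")


# Stand-in for sys.stdin.buffer so the sketch runs without a pipe.
fake_stdin = io.BytesIO(b"Hi! \xf6\xe4\xdf\n")

lines = [line.rstrip("\n") for line in latin1_reader(fake_stdin)]
```

In the real script the call would presumably be latin1_reader(sys.stdin.buffer). On Python 3.7 and later, sys.stdin.reconfigure(encoding='latin-1') looks like an alternative, but I have not compared the two with a long-running tail -f style pipe.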