UTF-8 plus question marks

Question

I have a website that displays user input by decoding it to Unicode using UTF-8. However, the input may include binary data, which obviously cannot always be decoded as UTF-8.

I am using Python and I get an error:

'utf8' codec can't decode byte 0xbf in position 0: unexpected code byte. You passed in '\xbf\xcd...

Is there a standard, efficient way to convert these undecodable characters to question marks?

It would be very helpful if the answer uses Python.

+3

python encoding unicode utf-8

primroot Mar 20 '11 at 17:24

2 answers

To drop the undecodable bytes instead:

str.decode('utf8','ignore')

+1

Chris Farmiloe Mar 20 '11 at 17:35

Joril · Accepted Answer · 2011-03-20T17:34:18+0000

Try:

inputstring.decode("utf8", "replace")

See here for reference.
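
For anyone comparing the two error handlers mentioned in the answers, here is a minimal sketch. It assumes Python 3, where decode is a method of bytes rather than str; the sample bytes and the helper name decode_with_question_marks are made up for illustration and are not from the question. Note that when decoding, "replace" inserts U+FFFD (the Unicode replacement character) rather than a literal "?", so the helper swaps it afterwards.

```python
# Python 3 sketch. `raw` is an invented sample: 0xbf cannot start a
# UTF-8 sequence, so decoding it strictly would raise UnicodeDecodeError.
raw = b"\xbfhello"

# "replace" substitutes U+FFFD for each undecodable byte sequence.
replaced = raw.decode("utf-8", errors="replace")

# "ignore" silently drops the undecodable bytes.
ignored = raw.decode("utf-8", errors="ignore")


def decode_with_question_marks(data: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, showing '?' for bad bytes.

    Caveat: a U+FFFD that was genuinely present in the input is also
    turned into '?'.
    """
    return data.decode("utf-8", errors="replace").replace("\ufffd", "?")


print(repr(replaced))
print(repr(ignored))
print(decode_with_question_marks(raw))
```

If literal question marks are not a hard requirement, leaving U+FFFD in place is usually fine, since browsers render it as a visible placeholder glyph.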
