Safe decoding in Python (replacing undecodable characters with '?')

I have this code:

encoding = guess_encoding()    
text = unicode(text, encoding)

When an invalid character appears in the text, a UnicodeDecodeError is raised. How can I silently avoid the exception and replace the invalid character with "?"?

1 answer

Try

text = unicode(text, encoding, "replace")

From the documentation:

'replace' causes the official Unicode replacement character, U+FFFD, to be used to replace input characters that cannot be decoded.
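In Python 3, where `unicode()` no longer exists, the same `"replace"` error handler is passed to `bytes.decode()`. A small sketch; the sample bytes are chosen purely for illustration:

```python
# Python 3 counterpart of unicode(text, encoding, "replace").
raw = b"caf\xe9 ok"  # 0xE9 is Latin-1 'é'; on its own it is not valid UTF-8
text = raw.decode("utf-8", "replace")
# The invalid byte becomes U+FFFD; the rest decodes normally.
print(text == "caf\ufffd ok")  # True
```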

If you want to use "?" instead of the official Unicode replacement character, you can do

text = text.replace(u"\uFFFD", "?")

after converting to Unicode.
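A minimal Python 3 sketch of the two-step approach, wrapped in a helper. `safe_decode` is a hypothetical name, not part of any library. One caveat: the second step also turns any genuine U+FFFD already present in the input into "?". As an alternative not mentioned in the answer, the standard `codecs.register_error` hook can register a handler that emits "?" directly during decoding; the handler name "question_mark" below is likewise an illustrative assumption.

```python
import codecs


def safe_decode(raw, encoding):
    """Decode bytes, turning undecodable input into '?' (hypothetical helper)."""
    # Step 1: undecodable bytes become U+FFFD REPLACEMENT CHARACTER.
    text = raw.decode(encoding, "replace")
    # Step 2: swap the official replacement character for '?'.
    # Note: this also rewrites any U+FFFD that was genuinely in the input.
    return text.replace("\ufffd", "?")


def _question_mark_handler(exc):
    # Called by the codec for each undecodable span; returns the substitute
    # text and the position at which decoding should resume.
    if isinstance(exc, UnicodeDecodeError):
        return ("?", exc.end)
    raise exc


# Register the handler under an assumed, illustrative name.
codecs.register_error("question_mark", _question_mark_handler)

raw = b"caf\xe9 ok"
print(safe_decode(raw, "utf-8"))             # caf? ok
print(raw.decode("utf-8", "question_mark"))  # caf? ok
```

The custom handler avoids the second pass and leaves genuine U+FFFD characters in the input untouched, at the cost of registering a process-wide error handler name.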
