I am trying to parse an HTML document with BeautifulSoup, but I am running into encoding problems. What is the best way to open an HTML document that is encoded as Windows-1252?
I tried converting the file to UTF-8 with iconv first, but that does not work either.
doc = open("e.html").read()
soup = BeautifulSoup(doc)
soup.findAll('p')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 103: ordinal not in range(128)
When I open the file without running it through iconv, I get the same error.
Full traceback:
>>> soup.findAll('p')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 103: ordinal not in range(128)
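For reference, the kind of conversion iconv performs (Windows-1252 to UTF-8) can be sketched in plain Python. This is only an illustration under assumptions: the sample bytes are made up rather than taken from e.html, and `cp1252` is Python's codec name for Windows-1252. The character u'\xfc' from the error message is "ü", which is the single byte 0xFC in Windows-1252.

```python
# Hypothetical sample data, not the contents of e.html:
# the byte 0xFC is "ü" in Windows-1252.
raw = b"<p>M\xfcller</p>"

# Decode the Windows-1252 bytes into a Unicode string.
text = raw.decode("cp1252")

# Re-encode the Unicode string as UTF-8 bytes, which is what a
# Windows-1252 to UTF-8 conversion would produce for this input.
utf8_bytes = text.encode("utf-8")
```

Here `text` holds the character u'\xfc' as a real Unicode code point, so encoding it with the 'ascii' codec would fail in the same way as the traceback above, while encoding it as UTF-8 succeeds.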