I am looking for a quality HTML5 Microdata parser in Python. It does not need to be fast, but I would like it to support as much of the specification as possible, including itemref.
Here is what I have found so far:
Have you used any of these libraries? What were the pros and cons?
I'm also interested in parsing poorly formatted HTML documents. Have you found a Microdata parser that handles messy input, or do you run the input through BeautifulSoup first?