I am looking for an HTML parser written in standard C++ that can handle invalid documents and is open source (LGPL or BSD, not GPL). It must compile with GCC and be standalone.
Maybe this one will help you.
Select the license you need.