I would like to extract the various HTML tags from the source code of a webpage. Is there a method in Java to do this, or do I need to write an HTML parser?
I want to highlight all HTML tags.
Check out the CyberNeko HTML Parser.
You can use regular expressions. If your HTML is valid XML, you can use an XML parser.
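A minimal sketch of the regular-expression approach, assuming the goal is only to pull out tag-like substrings. The class name `RegexTagExtractor` and the pattern `<[^>]+>` are illustrative choices, not a standard API. The pattern is a rough heuristic: it does not understand comments, CDATA, `<script>` contents, or a `>` inside a quoted attribute value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTagExtractor {
    // Matches '<', then one or more characters that are not '>', then '>'.
    // Heuristic only: comments, CDATA, scripts and '>' inside attribute values are not handled.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Returns every tag-like substring found in the given HTML source, in order. */
    public static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<p class=\"x\">Hello <b>world</b><br/></p>";
        // Prints each tag on its own line: <p class="x">, <b>, </b>, <br/>, </p>
        for (String tag : extractTags(html)) {
            System.out.println(tag);
        }
    }
}
```

This is fine for quick highlighting of simple pages; for arbitrary real-world HTML a real parser is more robust.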
The Java XML DOM API is quite similar to the DOM in JavaScript:
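    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    // parse(String) expects a URI, so wrap a markup string in an InputSource
    Document doc = builder.parse(new InputSource(new StringReader(html)));
    doc.getElementById("someId"); // returns null unless the attribute is declared as an ID (e.g. via a DTD)
    doc.getElementsByTagName("div");
    doc.getChildNodes();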
(This only works if the HTML is well-formed XML.)
http://download.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/Document.html
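A self-contained sketch of the DOM approach that lists every element name in a page, assuming the input is well-formed XML (for example XHTML without HTML-only entities such as `&nbsp;`). The class and method names are hypothetical; the calls used (`DocumentBuilderFactory.newInstance()`, `DocumentBuilder.parse(InputSource)`, `Document.getElementsByTagName("*")`) are standard JDK APIs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomTagLister {
    /**
     * Parses well-formed (X)HTML and returns all element names in document order.
     * Throws IllegalArgumentException if the input is not well-formed XML.
     */
    public static List<String> tagNames(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // parse(String) would treat its argument as a URI, so wrap the markup in an InputSource.
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            // "*" matches every element, including the root; results are in document order.
            NodeList all = doc.getElementsByTagName("*");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                names.add(all.item(i).getNodeName());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><div id=\"a\"><p>Hi<br/></p></div></body></html>";
        System.out.println(tagNames(xhtml)); // [html, body, div, p, br]
    }
}
```

Most real-world HTML is not well-formed XML, which is where an HTML-specific parser comes in.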
The CyberNeko parser is also a good choice if you need more than that.
You can write your own method in a util class for extracting tags.
Look for the opening < and the closing /> or > to find each complete tag, and write those tags to another file.
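A minimal sketch of such a hand-written util method, assuming a simple scan is enough: find each `<`, take everything up to the next `>`, and write the results one per line. Since `/>` also ends in `>`, searching for `>` covers both forms. The class name `TagUtil` and the output file name `tags.txt` are hypothetical; like the regex approach, this ignores comments, scripts, and `>` inside attribute values.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TagUtil {
    /** Scans for each '<' and the next '>' and returns every complete tag in order. */
    public static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<>();
        int start = html.indexOf('<');
        while (start != -1) {
            // '>' also terminates self-closing tags such as "<br/>".
            int end = html.indexOf('>', start + 1);
            if (end == -1) {
                break; // unterminated tag at the end of the input: stop scanning
            }
            tags.add(html.substring(start, end + 1));
            start = html.indexOf('<', end + 1);
        }
        return tags;
    }

    /** Writes one tag per line to the given file, overwriting it if it exists. */
    public static void writeTags(List<String> tags, Path out) throws IOException {
        Files.write(out, tags, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String html = "<ul><li>One</li><li>Two<img src=\"x.png\"/></li></ul>";
        writeTags(extractTags(html), Paths.get("tags.txt"));
    }
}
```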
I used HTMLParser in one project and was very pleased with it.
Edit: if you check the sample page, the parser sample does pretty much what you are asking for.