Extract HTML tags using Java

Question


I would like to extract the various HTML tags from the source code of a webpage. Is there a method in Java to do this, or do I need an HTML parser?

I want to highlight all HTML tags.


java html

harshini Mar 21 '11 at 7:50


5 answers

Whitefang34 · Answer 1 · 2011-03-21T07:52:55+0000

Check out the CyberNeko HTML Parser.

Mikhail · Answer 2 · 2011-03-21T07:54:10+0000

You can use regular expressions. If your HTML is valid XML, you can use an XML parser.
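A minimal sketch of the regular-expression approach, using only `java.util.regex`. The class name `RegexTagExtractor` and the exact pattern are illustrative assumptions, not something from the answer. The pattern matches `<`, an optional `/`, a tag name, then everything up to the next `>`, so it picks up opening, closing and self-closing tags but skips comments and `<!DOCTYPE>`. It is fragile: a `>` inside a quoted attribute value ends the match early, which is why a real parser is usually the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTagExtractor {
    // '<', optional '/', a tag name starting with a letter, then anything up to the next '>'.
    // Assumption: attribute values never contain '>' (otherwise the match is cut short).
    private static final Pattern TAG = Pattern.compile("</?([a-zA-Z][a-zA-Z0-9-]*)[^>]*>");

    public static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            tags.add(m.group()); // the full tag text, including '<' and '>'
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<p class=\"intro\">Hello <b>world</b><br/></p>";
        // prints [<p class="intro">, <b>, </b>, <br/>, </p>]
        System.out.println(extractTags(html));
    }
}
```

For highlighting, `m.start()` and `m.end()` give the position of each tag in the source string, which can be used to wrap it in markup of your choice.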

Adam ayres · Answer 3 · 2011-03-21T07:58:51+0000

Java's XML DOM API works much like the DOM in JavaScript:

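import java.io.StringReader;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(html))); // html is a String of well-formed XML
doc.getElementById("someId"); // returns null unless an attribute is declared as type ID (e.g. via a DTD)
doc.getElementsByTagName("div");
doc.getChildNodes();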

(This only works if your HTML is well-formed XML.)

http://download.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/Document.html
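Building on the DOM snippet above, here is a small self-contained sketch that lists every element name in a document. It relies on the fact that `getElementsByTagName("*")` matches all elements, returned in document order. The class and method names are hypothetical, and the input is assumed to be well-formed XHTML; ordinary HTML with unclosed tags such as `<br>` will make the XML parser throw.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomTagLister {
    // Parses well-formed markup and returns the element names in document order.
    public static List<String> listTagNames(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList all = doc.getElementsByTagName("*"); // "*" matches every element
            List<String> names = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                names.add(all.item(i).getNodeName());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // SAXException here usually means the input is not well-formed XML
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><div><p>Hi <b>there</b></p></div></body></html>";
        // prints [html, body, div, p, b]
        System.out.println(listTagNames(xhtml));
    }
}
```

Because this goes through a real parser, quoted `>` characters and comments are handled correctly, unlike the regex and manual-scan approaches.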

The CyberNeko parser is also a good option if you need more.

developer · Answer 4 · 2011-03-21T08:00:19+0000

You can write your own utility method for extracting tags.

Scan for < and then for /> or > to find each complete tag, and write these tags to another file.
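A minimal sketch of the hand-written approach from this answer. Since `/>` also ends in `>`, a single search for `>` covers both cases. The class name `ManualTagScanner` is hypothetical. Like the regex version, this is naive: a `>` inside a quoted attribute or a comment such as `<!-- a > b -->` will end the tag too early.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualTagScanner {
    // Finds each '<', then the next '>', and records the text between them as one tag.
    public static List<String> scanTags(String html) {
        List<String> tags = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = html.indexOf('<', pos);
            if (start < 0) {
                break; // no more tags
            }
            int end = html.indexOf('>', start + 1);
            if (end < 0) {
                break; // unterminated tag: stop scanning
            }
            tags.add(html.substring(start, end + 1)); // includes both '<' and '>'
            pos = end + 1;
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<ul><li>One</li><li>Two<br/></li></ul>";
        for (String tag : scanTags(html)) {
            System.out.println(tag);
        }
    }
}
```

To write the tags to another file as the answer suggests, the returned list can be passed to `java.nio.file.Files.write(path, tags)`, which writes one tag per line (the target path is up to you).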

Rune aamodt · Answer 5 · 2011-03-21T08:02:05+0000

I used HTMLParser in one project and was very pleased with it.

Edit: if you check the sample page, the parser sample does pretty much what you ask for.
