JTidy is more often used to tidy up HTML, that is, to fix invalid or malformed markup such as unclosed tags, for example turning `<div><span>text</div>` into `<div><span>text</span></div>`.
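To make the idea of tidying concrete, here is a deliberately naive sketch of one thing a tidier does: closing tags that were left open. It is a toy written for illustration (`NaiveTagBalancer` is a hypothetical name), not JTidy's actual algorithm or API. It ignores comments, `<script>` contents and HTML's optional end-tag rules.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Toy illustration of one job an HTML tidier performs: closing tags left open.
 * Hypothetical helper for illustration only; this is NOT how JTidy works internally.
 */
public class NaiveTagBalancer {
    // Matches an opening or closing tag; group 1 is "/" for closing tags, group 2 is the tag name.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");
    // A few elements that never take a closing tag.
    private static final Set<String> VOID = Set.of("br", "hr", "img", "input", "meta", "link");

    public static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // stack of currently open element names
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start()); // copy the text between tags unchanged
            last = m.end();
            String name = m.group(2).toLowerCase();
            boolean closing = !m.group(1).isEmpty();
            if (!closing) {
                out.append(m.group());
                if (!VOID.contains(name) && !m.group().endsWith("/>")) {
                    open.push(name);
                }
            } else if (open.contains(name)) {
                // Close every element opened after the matching one, then the match itself.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            }
            // A closing tag with no matching open element is silently dropped.
        }
        out.append(html.substring(last));
        // Close anything still open at the end of the input.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(balance("<div><span>text</div>"));
    }
}
```

Running `main` prints `<div><span>text</span></div>`. A real tidier applies the full set of HTML parsing rules, which is exactly why you would reach for a library instead of something like this.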
JSoup, on the other hand, provides a full-blown API for parsing HTML and extracting parts of it. It lets you find elements with jQuery-like CSS selectors, or with DOM methods equivalent to the ones you use in JavaScript, for example `getElementById`. I would say JSoup really is the Java equivalent of BeautifulSoup.
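If you want to try the selector and DOM-style calls without touching the network, you can hand JSoup a string with `Jsoup.parse`. This is a minimal sketch, assuming the jsoup library (Maven coordinates `org.jsoup:jsoup`) is on the classpath; the HTML snippet and the `JsoupDemo` class name are made up for illustration. Like a browser, JSoup's parser also closes the unclosed `<span>` in the snippet.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupDemo {
    // Made-up, deliberately malformed markup: the <span> is never closed.
    static final String HTML =
        "<div id=\"main\"><p class=\"intro\">First</p><p>Second</p><span>text</div>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML); // parses in memory, no network access

        // jQuery-style CSS selector: every <p> inside the element with id "main"
        Elements paragraphs = doc.select("#main p");
        System.out.println(paragraphs.size());          // number of matching <p> elements
        System.out.println(paragraphs.first().text());  // text of the first match

        // DOM-style lookup, like document.getElementById in JavaScript
        Element main = doc.getElementById("main");
        System.out.println(main.child(0).className());  // class of the div's first child

        // Serialize without indentation; the parser has closed the dangling <span> inside the div
        doc.outputSettings().prettyPrint(false);
        System.out.println(main.outerHtml());
    }
}
```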
For example, to extract the first paragraph of a Wikipedia article using JSoup, you can use the following:
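```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

String url = "http://en.wikipedia.org/wiki/Potato";
Document doc = Jsoup.connect(url).get(); // throws IOException on network errors
Elements paragraphs = doc.select(".mw-content-ltr p");
String firstParagraph = paragraphs.isEmpty() ? "" : paragraphs.first().text(); // first() is null when nothing matches
```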
Or, to extract the title from this very question:
```java
Document doc = Jsoup.connect("http://stackoverflow.com/questions/12439078/jtidy-or-jsoup-for-java").get();
String question = doc.select("#question-header a").text();
```
Pretty good API, huh? :-)