Getting Jsoup to support dynamically generated HTML using JavaScript

I'm working on a web crawler right now. It has to parse some specific sites and write the output to an XML file. Up to this point this has not been a problem: the crawler works, and it can be configured quickly through a cfg file. I use Jsoup to parse the HTML content.

I added a few more sites and noticed that I have a huge problem with HTML content that is generated by JavaScript. Is there a way to get Jsoup to support JavaScript? Or at least to get the full HTML content that I can see in my browser?

I already tried HtmlUnit, but it did not work very well. It did not give me the content that I see in my browser.

Sincerely,

Ogofo

1 answer

Jsoup does not support JavaScript and does not emulate a browser. If you need JavaScript to run, Jsoup alone will not do it. In my experience, HtmlUnit, which is a headless browser, has given me better results (speaking only of Java frameworks).

One thing worth trying in HtmlUnit is changing the BrowserVersion (Chrome / InternetExplorer / Firefox) that you pass when creating the WebClient instance. Some sites respond differently depending on the browser, and sometimes just changing this value gives you the results you expect.
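
As a rough sketch of what that could look like: let HtmlUnit load the page and run its scripts, then hand the resulting markup to Jsoup so the existing parsing code keeps working. This assumes HtmlUnit 2.x (the `com.gargoylesoftware.htmlunit` packages; HtmlUnit 3.x and later use `org.htmlunit` instead) and jsoup on the classpath. The class name `RenderedPageFetcher`, the method `fetchRendered`, the example URL, and the 5 second wait are all made up for illustration, so adjust them to your crawler.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper, not part of Jsoup or HtmlUnit.
public class RenderedPageFetcher {

    // Loads the URL in HtmlUnit with the given emulated browser, waits for
    // background JavaScript, and returns the resulting DOM as a Jsoup Document.
    static Document fetchRendered(String url, BrowserVersion version, long jsWaitMillis)
            throws Exception {
        try (WebClient client = new WebClient(version)) {
            client.getOptions().setJavaScriptEnabled(true);
            // CSS is usually not needed for scraping; skipping it saves time.
            client.getOptions().setCssEnabled(false);
            // Many real-world pages contain script errors; do not abort on them.
            client.getOptions().setThrowExceptionOnScriptError(false);

            HtmlPage page = client.getPage(url);
            // Give asynchronous scripts (AJAX, timers) a chance to finish.
            client.waitForBackgroundJavaScript(jsWaitMillis);

            // Serialize the DOM after script execution and parse it with Jsoup.
            return Jsoup.parse(page.asXml(), url);
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL; pass the real one as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        BrowserVersion[] versions = {BrowserVersion.CHROME, BrowserVersion.FIREFOX};
        // Try more than one emulated browser and compare what comes back.
        for (BrowserVersion version : versions) {
            Document doc = fetchRendered(url, version, 5_000);
            System.out.println(version.getNickname() + ": "
                    + doc.select("a[href]").size() + " links");
        }
    }
}
```

If the link counts (or whatever selector you care about) differ between browser versions, that is a hint that the site serves different content per browser. The wait time is a guess as well; pages that load data slowly may need a longer one.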
