Jsoup - Missing Content

Question

I am running the following code with Jsoup:

Document parse = Jsoup.connect("http://www.google.com/movies?near=<MyCity>&sort=1&start=0")
                       .followRedirects(true)
                       .ignoreContentType(true)
                       .timeout(12000)
                       .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
                       .referrer("http://www.google.com")
                       .execute()
                       .parse();
Elements elements = parse.select(".movie_results .movie");

but when I check elements, it clearly skips a lot of content. I am trying to get the title and description of each movie from the page above.

What am I missing? Could this be due to the lack of header options, cookies? Is there any other library that could solve the problem?

I am trying to reproduce the same problem by doing:

curl "http://www.google.com/movies?near=<MyCity>&sort=1&start=0" > page.html

Pro tip

To highlight one of the comments: try.jsoup.org is a good place to start with Jsoup. It helps you parse HTML in a very simple way.

Please +1 if you liked the tip and it saved your day :D

android parsing jsoup

MatheusJardimB Feb 10 '14 at 16:44

1 answer

MatheusJardimB · Accepted Answer · 2014-02-10T21:05:58+0000

I inspected the request in Google Chrome Dev Tools and added the headers my code was missing:

Jsoup.connect(url)
  .followRedirects(true)
  .ignoreContentType(true)
  .timeout(12000) // optional
  .header("Accept-Language", "pt-BR,pt;q=0.8") // missing
  .header("Accept-Encoding", "gzip,deflate,sdch") // missing
  .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36") // missing
  .referrer("http://www.google.com") // optional
  .execute()
  .parse();
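
The comparison behind this fix can also be done mechanically. The following is only a minimal sketch, using plain Java with no Jsoup dependency: the class `HeaderDiff` and its method `missingOrDifferent` are hypothetical names introduced for illustration, and the header values in `main` are simply copied from the two snippets in this thread (the "browser" set is assumed to be what Dev Tools showed). It lists the headers the browser sends that the client either omits or sends with a different value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class HeaderDiff {

    /**
     * Returns the browser headers that the client either did not send
     * or sent with a different value. Header names are compared
     * case-insensitively, since HTTP header names are case-insensitive.
     */
    public static Map<String, String> missingOrDifferent(Map<String, String> browser,
                                                         Map<String, String> client) {
        // Case-insensitive lookup table for the client's headers.
        Map<String, String> clientByName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        clientByName.putAll(client);

        // LinkedHashMap keeps the browser's header order in the result.
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : browser.entrySet()) {
            String sent = clientByName.get(e.getKey());
            if (!e.getValue().equals(sent)) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed browser request headers (values taken from the answer above).
        Map<String, String> browser = new LinkedHashMap<>();
        browser.put("Accept-Language", "pt-BR,pt;q=0.8");
        browser.put("Accept-Encoding", "gzip,deflate,sdch");
        browser.put("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36");
        browser.put("Referer", "http://www.google.com");

        // Headers set explicitly in the question's code.
        Map<String, String> client = new LinkedHashMap<>();
        client.put("user-agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) "
                + "Gecko/20100101 Firefox/25.0");
        client.put("referer", "http://www.google.com");

        // Print each header that still needs to be added or changed.
        missingOrDifferent(browser, client)
                .forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

With these inputs the sketch reports Accept-Language, Accept-Encoding and User-Agent, which matches the lines marked "missing" in the snippet above; Referer is identical in both sets and is not reported.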
