Jsoup gets different HTML than Firefox and other browsers

I'm having problems with some URLs from the Kabum online store.

URL http://www.kabum.com.br/cgi-local/kabum3/produtos/descricao.cgi?id=01:02:23:55:159

If I type the URL into the address bar or click the link, I get the product page, but if I use Jsoup, I get a page that contains only a meta refresh pointing to the same address.

I tried setting the user agent and the referrer, and following the link in the meta tag, but I got the same page.

My code is here:

Document doc;
String url = "http://www.kabum.com.br/cgi-local/kabum3/produtos/descricao.cgi?id=01:02:23:55:159";
try {
    String ua = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0";
    String referrer = "http://www.google.com";
    doc = Jsoup.connect(url).timeout(20000).userAgent(ua).referrer(referrer).get();
    Elements meta = doc.select("html head meta");
    for (Iterator<Element> it = meta.iterator(); it.hasNext();) {
        Element element = it.next();
        if (element.attr("http-equiv").matches("refresh")) {
            String novaUrl = element.attr("content").replaceFirst("\\d?;url=", "");
            System.out.printf("redirecting to %s%n", novaUrl);
            doc = Jsoup.connect(novaUrl).userAgent(ua).referrer(referrer).get();
            break;
        }
    }
} catch (IOException ex) {
    Logger.getLogger(Teste1.class.getName()).log(Level.SEVERE, null, ex);
    return;
}
System.out.println(doc);
2 answers

You need to send a request using cookies. The site returns one session cookie, which it expects to see in the next request.

String url = "http://www.kabum.com.br/cgi-local/kabum3/produtos/descricao.cgi?id=01:02:23:55:159";
Map<String, String> cookies = Jsoup.connect(url).execute().cookies();
Document document = Jsoup.connect(url).cookies(cookies).get();
System.out.println(document.html());

The first request fetches the session cookie, and the second request sends it back to get the actual page.
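To illustrate the session-cookie round trip without touching the network, here is a minimal sketch that uses java.net.CookieManager from the JDK instead of jsoup. This is a swapped-in technique for illustration, not what the answer above uses. The class SessionCookieRoundTrip, the method cookieHeaderFor, the host example.com and the cookie name "session" are all hypothetical, and Java 9+ is assumed for Map.of and List.of.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: shows how a cookie received in one response
// ends up in the Cookie header of the next request to the same URL.
final class SessionCookieRoundTrip {

    static List<String> cookieHeaderFor(String url, String setCookieValue) {
        try {
            CookieManager manager = new CookieManager();
            URI uri = URI.create(url);

            // Step 1: the server's first response carries a Set-Cookie header.
            manager.put(uri, Map.of("Set-Cookie", List.of(setCookieValue)));

            // Step 2: ask which cookies should accompany the next request.
            Map<String, List<String>> requestHeaders = manager.get(uri, new HashMap<>());
            return requestHeaders.getOrDefault("Cookie", List.of());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up cookie value; a real site chooses its own name and value.
        System.out.println(cookieHeaderFor("http://example.com/page", "session=abc123; Path=/"));
    }
}
```

With jsoup the same idea is expressed by passing the map returned by cookies() on the first response into cookies(...) on the second connection, as in the answer's code.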


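The page returns a meta refresh, <meta http-equiv="refresh" content="0;url=kabum.com.br/cgi-local/kabum3/produtos/…; />, which points back to the same URL.

So two things appear to be needed: (1) follow the redirect yourself, because jsoup does not follow a meta refresh on its own, and (2) send the cookie along with the request.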
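For the meta refresh step, a minimal sketch of pulling the target out of the content attribute and resolving it against the page URL might look like the following. It uses only the JDK, so it would sit next to the jsoup calls rather than replace them. The class MetaRefresh, the method target and the example.com URLs are hypothetical, and the pattern is an assumption about common content values such as "0;url=..." rather than a full implementation of what browsers accept.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading <meta http-equiv="refresh" content="...">.
final class MetaRefresh {

    // Delay in seconds, then ';' or ',', then an optional "url=" prefix
    // (any letter case) and an optionally quoted target.
    private static final Pattern CONTENT = Pattern.compile(
            "^\\s*\\d+(?:\\.\\d*)?\\s*[;,]\\s*(?:url\\s*=\\s*)?['\"]?(.*?)['\"]?\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Returns the absolute redirect target, or empty if the content has no
    // usable URL (for example a bare delay such as "5").
    static Optional<String> target(String baseUrl, String content) {
        Matcher m = CONTENT.matcher(content);
        if (!m.matches() || m.group(1).isEmpty()) {
            return Optional.empty();
        }
        try {
            // resolve() turns a relative target into an absolute one.
            return Optional.of(URI.create(baseUrl).resolve(m.group(1)).toString());
        } catch (IllegalArgumentException e) {
            // URI.create rejects malformed input such as unescaped spaces.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(target("http://example.com/a/b.cgi?id=1", "0; URL=c.cgi?id=2"));
    }
}
```

The resolved URL would then be requested with the same cookies as the first request; if it is identical to the page URL, as in this question, following it without the cookie just returns the same refresh page again.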
