I am writing a very simple web spider in Java. The problem I am facing is that the content downloaded for a URL differs from the content the browser shows for the same URL. For example, try this URL:
http://www.google.co.in/search?sourceid=chrome&ie=UTF-8&q=web+spider#sclient=psi&hl=ep&source=hp&Q=Web+spider&water=F&AQI=&acl=&OQ=Web+spider&PBX=1&Fp=d8e8e41d6d2bda33&BIW=1366&BiH=643
If you load this URL in a browser and also download it through the Java URL class (java.net.URL), the content is different. There may be several reasons for this.
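A minimal sketch of such a plain download, using only the standard library. The class name PlainFetch and the method fetch are hypothetical, and UTF-8 is assumed for the response. Note that this returns only the raw response body: no scripts are executed, and the part of a URL after # is handled by the browser and is not sent to the server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PlainFetch {

    // Reads the raw response body for the given address.
    // Nothing in the response is interpreted: scripts are not run.
    static String fetch(String address) throws IOException {
        URLConnection connection = URI.create(address).toURL().openConnection();
        StringBuilder body = new StringBuilder();
        // Assumption: the response is UTF-8 encoded text.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // Pass the address to download as the first argument.
        if (args.length == 0) {
            System.err.println("usage: java PlainFetch <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

Comparing this output with "view source" in the browser (rather than with the rendered page) is a quick way to see how much of the difference comes from scripts running after the page loads.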
So, is there a way to emulate a browser in my Java program? Are there any third-party libraries that load a page the way a browser does and then return the final content? Any help is appreciated.
Try HtmlUnit. It can emulate browser behavior and handle JavaScript.
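A rough sketch of what that could look like, assuming HtmlUnit 3.x is on the classpath (packages under org.htmlunit; in 2.x releases the same classes live under com.gargoylesoftware.htmlunit). The class name RenderedFetch, the method render, and the 5 second wait are illustrative choices, not anything prescribed by the library.

```java
import java.io.IOException;

import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public class RenderedFetch {

    // Loads the page in a headless, emulated browser and returns the DOM
    // serialized as XML after scripts have had a chance to run.
    static String render(String address) throws IOException {
        try (WebClient client = new WebClient(BrowserVersion.CHROME)) {
            client.getOptions().setJavaScriptEnabled(true);
            // CSS is not needed just to read the content.
            client.getOptions().setCssEnabled(false);
            // Real-world pages often contain script errors; do not abort on them.
            client.getOptions().setThrowExceptionOnScriptError(false);

            // Assumption: the address returns HTML; other content types
            // would not be an HtmlPage and this assignment would fail.
            HtmlPage page = client.getPage(address);

            // Give background scripts (timers, AJAX) up to 5 seconds to finish.
            client.waitForBackgroundJavaScript(5_000);
            return page.asXml();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java RenderedFetch <url>");
            return;
        }
        System.out.println(render(args[0]));
    }
}
```

The result may still not match a real browser exactly, since HtmlUnit emulates a browser rather than being one, but it is usually much closer than a plain download for script-heavy pages.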