I am trying to scrape and ultimately parse some data (in particular, prices and availability) from hostels.com, for example http://www.hostels.com/hosteldetails.php/HostelNumber.11890 . The problem is that after you select the number of nights and click "book now", nothing is passed in the URL string (all of this is done through Ajax, I believe), so I cannot go directly to a specific date or date range.
I tried using browser emulators like Selenium, IRobotSoft, and FakeApp. Although I got Selenium and Fake to do most of the work, including grabbing the full page source, the process was clunky, and it would still be tedious to scrape (and then parse with other software) a few pages a day this way.
I also tried HTML DOM Parser, PHP Scriptable Web Browser, HTMLUnit, cScrape.php, and Crowbar. Either they could not cope with the Ajax, or I had no luck even getting them to run.
Ideally, I would like something that can be run from the server with as few dependencies as possible, but at this point I would just like to get something running.
Even after spending many hours trying to get this to work, I still feel like I don't know where to start. Can someone point me in the right direction? Should I go back and spend more time with HTMLUnit? What would be the best practice for a site like this?
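To make the question concrete, here is a rough sketch of the kind of thing I imagine might work instead of driving a browser: rebuilding the Ajax request by hand. Everything specific in it is an assumption on my part. The endpoint is a placeholder (`example.invalid`), and the field names `DateStart` and `NumNights` are made up. The real values would have to be read from the browser's network inspector. The sketch only builds the request and does not send it.

```python
from datetime import date
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint (assumption): the real Ajax URL is unknown to me and
# would have to be copied from the browser's network inspector.
AJAX_ENDPOINT = "http://example.invalid/availability"


def build_availability_request(hostel_number, arrival, nights,
                               endpoint=AJAX_ENDPOINT):
    """Build, but do not send, a POST that imitates the page's Ajax call."""
    # "DateStart" and "NumNights" are hypothetical field names;
    # "HostelNumber" is borrowed from the page URL and may also differ.
    form = {
        "HostelNumber": hostel_number,
        "DateStart": arrival.isoformat(),
        "NumNights": nights,
    }
    body = urlencode(form).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={
            # Many Ajax backends check for this header; whether this site
            # does is an assumption.
            "X-Requested-With": "XMLHttpRequest",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Example values only: hostel 11890 from the URL above, an arbitrary date.
req = build_availability_request(11890, date(2024, 7, 1), 3)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

If the request turns out to look roughly like this, passing `req` to `urllib.request.urlopen` and parsing the response would run from a server with only the Python standard library. I do not know whether the site also needs cookies or a session token.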
Thanks,
Alex