Ajax scraper using Python

I am trying to get the data from a table on this website. The table is populated via jQuery after the page loads (I have permission to scrape it):

http://whichchart.com/

I am currently using Selenium and BeautifulSoup to retrieve the data. However, since the data is not present in the HTML source, I cannot access it. I also tried PyQt4, but it does not get the updated HTML source either.

The values are visible in Firebug and the Chrome developer tools, so is there a Python package that can read the rendered data and feed it to BeautifulSoup?

I am not very technical, so ideally I would like a solution that works in Python or the next simplest tool.

I know I can get this through the proprietary screen-scraper software, but it's expensive.

1 answer

The page makes an AJAX call to http://whichchart.com/service.php?action=NewcastleCoal, which returns the values as JSON. So you can skip the browser entirely and do the following:

  • Use urllib to fetch the data over HTTP.
  • Parse the response with the json library.
  • You now have a Python object to work with.
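
The steps above can be sketched roughly as follows. This is a minimal example for Python 3 (urllib.request), not a tested scraper for this site: the URL is the one quoted in this answer, while the helper names parse_json_bytes and fetch_json, the 10-second timeout, and the UTF-8 encoding are assumptions made for illustration. The structure of the returned JSON is not described here, so inspect the result before relying on any particular key.

```python
import json
from urllib.request import urlopen

# URL quoted in the answer; the shape of its JSON response is not documented
# here, so treat the parsed result as unknown until you have inspected it.
DATA_URL = "http://whichchart.com/service.php?action=NewcastleCoal"


def parse_json_bytes(raw, encoding="utf-8"):
    """Decode a raw HTTP body and parse it as JSON (UTF-8 is an assumption)."""
    return json.loads(raw.decode(encoding))


def fetch_json(url, timeout=10):
    """Download `url` and return the parsed JSON as a Python dict or list."""
    with urlopen(url, timeout=timeout) as response:
        return parse_json_bytes(response.read())
```

Calling fetch_json(DATA_URL) should then give you an ordinary dict or list, which you can print once to see which fields hold the table values.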

If you do need to process the contents of an HTML page, I would suggest using a library like BeautifulSoup or Scrapy.
