Python / web scrape / aspx - is this possible if there are no forms?

Total noob, obviously. Teaching stand-alone Python for web scraping for open reporting / government transparency / reporting, etc.

Here is the .aspx page that I want to clear, the weekly calendar for January-March 2012.

But he has no forms ...

Perhaps you wonderful people can tell me if a solution is possible, even if I spend days with him.

http://webmail.legis.ga.gov/Calendar/default.aspx?chamber=house

The only way to see the appointments on the calendar is to select the day in the calendar picture. But at least if you click on Monday, it will show all appointments for the week. (I would like to collect all these meetings in order to calculate how often each committee meets, a little proxy to calculate which legislation attracts attention and which kind is ignored.)

But so what strategy to use? It seems that every month, at least in the bowels, is assigned a consecutive four-digit number added with "V", for example, V4414, and days with an unlisted number.

I only hunt for January - March 2012; other months are not related and mostly empty.

key?

    ...<a href="javascript:__doPostBack('calMain','V4414')" style="color:#333333" title="Go to the previous month">February</a></td><td align="center" style="width:70%;">March 2012</td><td align="right" valign="bottom" style="color:#333333;font-size:8pt;font-weight:bold;width:15%;"><a href="javascript:__doPostBack('calMain','V4474')" style="color:#333333" title="Go to the next month">April</a></td></tr> 

template?

    ...<td align="center" style="color:#999999;width:14%;"><a      href="javascript:__doPostBack('calMain','4439')" style="color:#999999" title="February 26">26</a></td><td align="center" style="color:#999999;width:14%;"><a href="javascript:__doPostBack('calMain','4440')" style="color:#999999" title="February 27">27</a></td><td align="center" style="color:#999999;width:14%;"><a href="javascript:__doPostBack('calMain','4441')" style="color:#999999" title="February 28">28</a></td>...

Greetings and thanks!

+5
source share
2 answers

input name of:

  • __EVENTTARGET
  • __EVENTARGUMENT
  • __VIEWSTATE
  • __EVENTVALIDATION

. . . , :

<a href="javascript:__doPostBack('calMain','4504')" style="color:Black" title="May 01">1</a>

href:

javascript:__doPostBack('calMain','4504')

- . - __EVENTTARGET. __EVENTARGUMENT.

, , POST, .

+5

POST- Python, - urllib.parse.urlencode .

, . , Selenium RC.

+2

All Articles