I am trying to scrape some data from a website. This is something I would normally do in Perl, but I would really like to wean myself off Perl. (I don't mean to disparage Perl; it has been a valuable tool, but I'm a little sad that the language still intimidates me after more than a decade.) Since my needs are simple and performance is rarely an issue for me, I want to move my web scraping to R. I know some R, but I have never used RCurl or similar libraries.
The goal is to scrape a public database. What complicates things is that I don't know exactly how to pass the arguments: I'm just reading the JavaScript source and trying to work out what to include in the RCurl postForm request. The code below runs without any obvious errors, but it doesn't return anything useful either.
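Instead of inferring field names from the JavaScript, one option is to list the `<input>` elements the page's form actually contains, hidden ones included, and compare them with the arguments passed to `postForm`. Below is a minimal sketch using the XML package (an assumption; any HTML parser would do), run on made-up markup rather than the real page. `list_form_inputs` is a hypothetical helper, not part of RCurl or XML, and fields that JavaScript adds only at submit time will not show up this way.

```r
library(XML)

# Hypothetical markup standing in for the real page. In practice you would
# fetch the page first, for example: html <- RCurl::getURL(form_url)
html <- '<html><body>
<form id="demoForm" method="post" action="/submit">
  <input type="hidden" name="demoForm" value="demoForm"/>
  <input type="text" name="termFrom" value="201103"/>
</form>
</body></html>'

# Return the name, type and default value of every <input> inside a <form>.
# <select> and <textarea> fields are not covered by this sketch.
list_form_inputs <- function(html) {
  doc <- htmlParse(html, asText = TRUE)
  nodes <- getNodeSet(doc, "//form//input")
  # Read one attribute from every node, using NA when it is absent
  get_attr <- function(attr) {
    vapply(nodes, xmlGetAttr, character(1), name = attr, default = NA_character_)
  }
  data.frame(name = get_attr("name"), type = get_attr("type"),
             value = get_attr("value"), stringsAsFactors = FALSE)
}

list_form_inputs(html)
```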
Question: what am I doing wrong?
[Edit: updated to reflect the suggested changes, but still no luck.]
library(RCurl)
x <- postForm('http://jamaserv.jama.or.jp/newdb/eng/prod4/prod4TsMkEntry.html?pass',
chkSelCnd3 = '0',
'prod4TsMkEntryForm/eng/prod4/prod4TsMkEntry.html' = 'prod4TsMkEntryForm',
makerCd = '5',
additionBase = '1',
termTo = '201203',
'prod4TsMkEntryForm:doAction' = 'Server',
additionInterval = '1',
termFrom = '201103',
car4Cd = '100005',
.opts = curlOptions(
referer = 'http://jamaserv.jama.or.jp/newdb/eng/prod4/prod4TsMkEntry.html',
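verbose = TRUE,         # print the HTTP exchange to stderr for debugging
header = TRUE,          # include the response headers in the returned text
followLocation = TRUE,  # follow HTTP redirects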
useragent = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13'
)
)
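Two differences between a browser submission and this call are worth ruling out; they are assumptions about this server, not a confirmed diagnosis. First, `postForm` defaults to `style = "HTTPPOST"`, which sends `multipart/form-data`, whereas an ordinary HTML form posts `application/x-www-form-urlencoded` (`style = "POST"`). Second, server-side form frameworks often expect the session cookie issued when the form page was first loaded. The sketch below reuses one curl handle with cookies enabled for both requests; `fetch_with_session` is a hypothetical helper name, and any hidden fields on the page would still have to be added to `fields`.

```r
library(RCurl)

# Sketch only: assumes the server wants (a) the session cookie set when the
# form page is first loaded and (b) an url-encoded request body.
fetch_with_session <- function(form_url, fields) {
  # cookiefile = "" turns on libcurl's in-memory cookie engine for this handle,
  # so cookies set by the GET below are sent back with the POST.
  curl <- getCurlHandle(cookiefile = "", followlocation = TRUE,
                        referer = form_url, useragent = "Mozilla/5.0")
  # Load the form page first, as a browser would; the body is discarded here.
  invisible(getURL(form_url, curl = curl))
  # style = "POST" sends url-encoded fields instead of postForm's default
  # multipart/form-data ("HTTPPOST").
  postForm(form_url, .params = fields, curl = curl, style = "POST")
}

# Example call (performs network requests), using some fields from above:
# res <- fetch_with_session(
#   'http://jamaserv.jama.or.jp/newdb/eng/prod4/prod4TsMkEntry.html',
#   list(makerCd = '5', termFrom = '201103', termTo = '201203'))
```

If the result is still empty, comparing the `verbose = TRUE` output against the request the browser sends (for example in its developer tools) is a practical way to spot remaining differences in fields, headers or cookies.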
In a browser, the form looks like this:

Submitting it with the settings above returns the following (on a separate page):
