I have a JSON file (an export from MongoDB) that I would like to load into R. The file is about 890 MB and contains roughly 63,000 rows of 12 fields each. The fields are numeric, character, and date. I would like to end up with a 63000 x 12 data frame.
lines <- readLines("fb2013.json")
Result: lines has all 63,000 elements as class character, with all 12 fields lumped together in a single string per element.
Each line looks something like this:
"{\"_id\": \"10151271769737669\", \"comments_count\": 36, \"created_at\": {\"$date\": 1357941938000}, \"icon\": \"http://blahblah.gif\", \"likes_count\": 450, \"link\": \"http://www.blahblahblah.php\", \"message\": \"I wish I could this is!\", \"page_category\": \"Computers\", \"page_id\": \"30968999999\", \"page_name\": \"NothingButTrouble\", \"type\": \"photo\", \"updated_at\": {\"$date\": 1358210153000}}"
Using rjson,
jFile <- fromJSON(paste(readLines("fb2013.json"), collapse=""))
only the first line is read into jFile, although it does have 12 fields.
Using RJSONIO:
jFile <- fromJSON(lines)
leads to the following:
Warning messages:
1: In if (is.na(encoding)) return(0L) :
the condition has length > 1 and only the first element will be used
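Again, only the first line is read into jFile, and it has 12 fields.
The output from rjson and RJSONIO looks something like this: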
$`_id`
[1] "1018535"
$comments_count
[1] 0
$created_at
$date
1.357027e+12
$icon
[1] "http://blah.gif"
$likes_count
[1] 20
$link
[1] "http://www.chachacha"
$message
[1] "I'd love to figure this out."
$page_category
[1] "Internet/software"
$page_id
[1] "3924395872345878534"
$page_name
[1] "Not Entirely Hopeless"
$type
[1] "photo"
$updated_at
$date
1.357027e+12
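One direction that might work, sketched under the assumption that every line of the file is a complete JSON object on its own (as the sample line above suggests): parse each line separately with rjson and bind the results into a data frame. Collapsing all lines into one string does not produce a single valid JSON document, which may be why only the first object comes back. The two sample lines, the reduced set of four fields, and the `to_row` helper are made up for illustration and are not part of the real file.

```r
library(rjson)

# Two made-up sample lines standing in for readLines("fb2013.json");
# only 4 of the 12 fields are shown to keep the sketch short.
lines <- c(
  '{"_id": "1018535", "comments_count": 0, "created_at": {"$date": 1357027200000}, "type": "photo"}',
  '{"_id": "1018536", "comments_count": 36, "created_at": {"$date": 1357941938000}, "type": "photo"}'
)

# Parse each line on its own instead of collapsing them into one string.
records <- lapply(lines, fromJSON)

# Hypothetical helper: turn one parsed record into a one-row data frame,
# converting the nested {"$date": <milliseconds>} object to POSIXct.
to_row <- function(rec) {
  data.frame(
    id             = rec[["_id"]],
    comments_count = rec[["comments_count"]],
    created_at     = as.POSIXct(rec[["created_at"]][["$date"]] / 1000,
                                origin = "1970-01-01", tz = "UTC"),
    type           = rec[["type"]],
    stringsAsFactors = FALSE
  )
}

# One row per record, one column per extracted field.
df <- do.call(rbind, lapply(records, to_row))
str(df)
```

With all 12 fields added to the helper this should give one row per line, though `do.call(rbind, ...)` over 63,000 one-row data frames is likely to be slow, so building the columns as vectors first may be worth trying.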