Filling a matrix with a for loop takes too long

I am trying to create a data frame of about 1,000,000 x 5 using a for loop, but it has been running for 5+ hours and I don't think it will finish soon. I am using the rjson library to read the data from a large JSON file. Can someone help me fill this data frame faster?

library(rjson)

# read in data from json file
file <- "/filename"
c <- file(file, "r")
l <- readLines(c, -1L)
data <- lapply(X=l, fromJSON)

# specify the variables that I want from this data set
myvars <- c("url", "time", "userid", "hostid", "title")
newdata <- matrix(data[[1]][myvars], 1, 5, byrow=TRUE)

# this is where it goes wrong
for (i in 2:length(l)) {
  newdata <- rbind(newdata, data[[i]][myvars])
}

newestdata <- data.frame(newdata)
2 answers

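This takes forever because each iteration of your loop creates a new, larger object: rbind copies the whole matrix every time, so the total work grows roughly quadratically with the number of rows. Try the following: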

slice <- function(field, data) unlist(lapply(data, `[[`, field))
data.frame(Map(slice, myvars, list(data)))

This creates a data.frame and preserves your original data types (character, numeric, etc.), if that matters. Forcing everything into a matrix coerces every value to character.
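To see the type preservation concretely, here is a minimal sketch on three made-up records. The record values and the extra field are invented for illustration; the named-list shape is an assumption about what rjson's fromJSON returns for each JSON object.

```r
# Toy stand-in for the parsed JSON: one named list per record.
# All values here are hypothetical.
data <- list(
  list(url = "http://a.example", time = 1001, userid = "u1",
       hostid = "h1", title = "First", extra = TRUE),
  list(url = "http://b.example", time = 1002, userid = "u2",
       hostid = "h2", title = "Second", extra = FALSE),
  list(url = "http://c.example", time = 1003, userid = "u3",
       hostid = "h3", title = "Third", extra = TRUE)
)
myvars <- c("url", "time", "userid", "hostid", "title")

# Pull one field out of every record and flatten it to a vector.
slice <- function(field, data) unlist(lapply(data, `[[`, field))

# Build each column once; Map names the columns after myvars.
newestdata <- data.frame(Map(slice, myvars, list(data)),
                         stringsAsFactors = FALSE)

# Show the column types: time is numeric, the rest are character.
str(newestdata)
```

One caveat: unlist drops NULL entries, so if some records lack a field, that column comes out shorter than the others and data.frame will complain. Such records would need to be filled with NA first.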


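If all of the fields have the same type, you can stay with a matrix and build it in one pass with vapply instead of growing it row by row:

newdata <- vapply(data, function(x) as.character(x[myvars]), character(5L))

Here I assume everything in data can be treated as character, which seems reasonable for fields such as title. The as.character call is needed because `[` on a parsed record returns a list, which vapply would reject as not being a character vector.

The result has one column per record (5 x n), so wrap it in t() if you want one row per record. As a general rule, avoid growing an object inside a loop in R. vapply allocates the result once and checks that every element has the declared type and length.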
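A small sketch of the vapply approach on made-up records follows. The record values are hypothetical, and the as.character wrapper plus the transpose are additions that assume each parsed record is a named list, as rjson returns for a JSON object.

```r
# Toy stand-in for the parsed JSON; all values are hypothetical.
data <- list(
  list(url = "http://a.example", time = 1001, userid = "u1",
       hostid = "h1", title = "First"),
  list(url = "http://b.example", time = 1002, userid = "u2",
       hostid = "h2", title = "Second"),
  list(url = "http://c.example", time = 1003, userid = "u3",
       hostid = "h3", title = "Third")
)
myvars <- c("url", "time", "userid", "hostid", "title")

# Extract the wanted fields from each record as a character vector.
# vapply checks that every result is character of length 5.
cols <- vapply(data,
               function(rec) as.character(rec[myvars]),
               character(length(myvars)))

# vapply returns one column per record (5 x n); transpose to n x 5.
newdata <- t(cols)
colnames(newdata) <- myvars

# Inspect the dimensions and the character matrix.
dim(newdata)
newdata
```

Note that the numeric time field ends up as a string here, which is the trade-off of keeping everything in a single matrix.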
