R: How to read a large data set (> 35 MM rows) into R on Windows in pieces?

How do you read / manage datasets in R that exceed the allocated memory limit?

EDIT: Huge help so far, thanks. Let me add one more restriction: the server belongs to the enterprise, and I do not have administrative access. Is there a way to read part of a file using read.table or something similar (e.g., setting nrows to read only 100,000 lines at a time)? We need a workaround that works in the current environment, so fread, bigmemory, etc. cannot be used.
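A minimal base-R sketch of reading 100,000 lines at a time without extra packages. It assumes the files have no header line and that no quoted field contains a line break. `read_in_chunks()` and the `process_chunk` callback are hypothetical names. Passing an explicit `colClasses` vector (the 30 column types are not given here) would keep column types consistent from one chunk to the next.

```r
# Sketch: read a "|"-delimited file in blocks of `chunk_size` lines using only
# base R, and hand each parsed block to a callback.
# Assumptions: no header line, no quoted field spans a line break;
# read_in_chunks() and process_chunk are hypothetical names.
read_in_chunks <- function(path, process_chunk, chunk_size = 100000,
                           colClasses = NA) {
  con <- file(path, open = "r")   # an open connection keeps its position
  on.exit(close(con))
  n_chunks <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)   # next block of raw lines
    if (length(lines) == 0) break             # end of file reached
    tc <- textConnection(lines)               # parse only this block
    chunk <- read.table(tc, sep = "|", header = FALSE, fill = TRUE,
                        quote = "\"", colClasses = colClasses,
                        stringsAsFactors = FALSE)
    close(tc)
    n_chunks <- n_chunks + 1
    process_chunk(chunk, n_chunks)            # e.g. subset, summarise, write out
  }
  invisible(n_chunks)
}
```

For example, `read_in_chunks("bigfile.txt", function(chunk, i) print(dim(chunk)))` would print the size of each chunk (the file name is a placeholder). Because one connection stays open, each line is read from disk only once.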

My target data set has about 32 million rows and 30 columns, split across 12 files of roughly equal size (some of them can be read in, some cannot).

The files are pipe-delimited ("|") and stored on a remote drive as 12 separate files. About half of them can be read into R; the other half exceed the memory limit.

I use a simple read and rbind script:

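path <- "filepath/mydata/contains 12 files.txt/"
files <- dir(path)                   # names of the 12 "|"-delimited files
fulldf <- data.frame()
for (i in seq_along(files)) {
    # paste0() does not exist in R 2.12.1, so use paste(..., sep = "")
    file1 <- read.table(file = paste(path, files[i], sep = ""), sep = "|",
                        fill = TRUE, quote = "\"")
    fulldf <- rbind(fulldf, file1)   # append this file's rows
}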
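A sketch of the same loop that collects each file in a list and binds once at the end, instead of growing `fulldf` with rbind() on every iteration (which copies the accumulated data frame each time). It assumes every file has the same columns; `read_all_files()` is a hypothetical name.

```r
# Sketch: read every file in `path` into a list element, then bind once.
# Assumption: all files share the same columns; read_all_files() is hypothetical.
read_all_files <- function(path) {
  files <- dir(path, full.names = TRUE)     # full paths, no paste() needed
  pieces <- vector("list", length(files))   # pre-allocated list of data frames
  for (i in seq_along(files)) {
    pieces[[i]] <- read.table(files[i], sep = "|", fill = TRUE, quote = "\"",
                              stringsAsFactors = FALSE)
  }
  do.call(rbind, pieces)                    # a single bind at the end
}
```

This avoids the repeated copying, but the combined data frame still has to fit in memory, so on its own it does not help with files that are individually too large.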

Ideally, I would like to subset the data and write it out to .csv (e.g., read the data in pieces, subset by location, then rbind), but some of the files are simply too large to read in.
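Reading in pieces and subsetting by location could look like this sketch: it streams one file, keeps only the rows whose location column is in `keep`, and appends them to a .csv, so the whole file never has to be in memory at once. The column position `loc_col`, the `keep` values and the name `subset_to_csv()` are hypothetical, and it assumes no header line.

```r
# Sketch: stream a "|"-delimited file, keep rows whose location is in `keep`,
# and append them to a .csv file.
# Assumptions (hypothetical): no header line, location is column `loc_col`.
subset_to_csv <- function(in_path, out_path, loc_col, keep,
                          chunk_size = 100000) {
  con <- file(in_path, open = "r")
  on.exit(close(con))
  first <- TRUE                      # write the header only once
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    tc <- textConnection(lines)
    chunk <- read.table(tc, sep = "|", header = FALSE, fill = TRUE,
                        quote = "\"", stringsAsFactors = FALSE)
    close(tc)
    hits <- chunk[chunk[[loc_col]] %in% keep, , drop = FALSE]
    if (nrow(hits) > 0) {
      # first write creates/overwrites the file; later writes append
      write.table(hits, out_path, sep = ",", row.names = FALSE,
                  col.names = first, append = !first)
      first <- FALSE
    }
  }
  invisible(out_path)
}
```

Running it once per input file (with the same `out_path` would require switching `first` off after the first file) leaves only the subset on disk, which can then be read back with read.csv().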

Is there a way to read a large file in pieces, i.e., to break a file that is too large to read into readable fragments?
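One way to break an unreadable file into readable fragments is to split it into smaller text files first. This sketch uses only readLines() and writeLines(); `split_file()` and the `_partNN.txt` naming scheme are hypothetical, and it assumes there is no header line that would need to be repeated in each piece.

```r
# Sketch: split a large text file into pieces of at most `lines_per_file`
# lines, each small enough to read with read.table() on its own.
# Assumptions: no header line; split_file() and "_partNN" naming are hypothetical.
split_file <- function(path, lines_per_file = 1000000,
                       out_dir = dirname(path)) {
  con <- file(path, open = "r")
  on.exit(close(con))
  part <- 0L
  out_files <- character(0)
  repeat {
    lines <- readLines(con, n = lines_per_file)   # next block of lines
    if (length(lines) == 0) break
    part <- part + 1L
    out <- file.path(out_dir, sprintf("%s_part%02d.txt", basename(path), part))
    writeLines(lines, out)                         # write this piece to disk
    out_files <- c(out_files, out)
  }
  out_files                                        # paths of the pieces
}
```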

System: Microsoft Windows Server 2003 R2 Enterprise Edition Service Pack 2

Computer: Intel(R) Xeon(TM) MP CPU 3.66GHz, 3.67 GHz, 12.0 GB RAM, Physical Address Extension

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.7.1

loaded via a namespace (and not attached):
[1] tools_2.12.1
Answer

Take a look at the iotools package, in particular its chunk.reader, read.chunk and chunk.apply functions.

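Otherwise, have you tried read.table with the nrows and skip arguments? Also specify colClasses, which makes read.table faster and reduces its memory use.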
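The skip/nrows approach can be sketched as below. `read_with_skip()` is a hypothetical helper, and it assumes one record per line with no header, since `skip` counts lines while `nrows` counts rows. It treats read.table()'s "no lines available in input" error as end of file and re-raises any other error.

```r
# Sketch: read a "|"-delimited file in blocks of `nrows` rows via read.table()'s
# skip and nrows arguments, calling FUN on each block.
# Assumptions: no header line, one record per line; read_with_skip() is hypothetical.
read_with_skip <- function(path, FUN, nrows = 100000, colClasses = NA) {
  start <- 0                                   # rows already consumed
  repeat {
    chunk <- tryCatch(
      read.table(path, sep = "|", header = FALSE, skip = start,
                 nrows = nrows, fill = TRUE, quote = "\"",
                 colClasses = colClasses, stringsAsFactors = FALSE),
      error = function(e) {
        # read.table() signals this error when nothing is left after `skip`
        if (grepl("no lines available", conditionMessage(e))) NULL else stop(e)
      })
    if (is.null(chunk)) break
    FUN(chunk)
    start <- start + nrow(chunk)
    if (nrow(chunk) < nrows) break             # last, partial block
  }
  invisible(start)                             # total rows read
}
```

Because every call reopens the file and re-reads all skipped lines, the total work grows roughly quadratically with the number of chunks; reading successive blocks from one open connection with readLines() avoids that.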
