I am trying to use doSMP / foreach to parallelize some code in R.
I have a huge 2D matrix of genetic data: 10,000 observations (rows) and 3 million variables (columns). Because of memory limits, I had to split it into files of 1,000 variables each.
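The splitting step can be sketched in base R as below. This is only an illustration under assumptions: it presumes the full matrix fits in memory as `m` (the real one did not, which is why it was split), it uses a small random toy matrix in place of the genetic data, and the helper `write_column_chunks()` and the `matrix1k.<k>.txt` naming scheme are hypothetical, chosen only to match the `matrix1k.*.txt` pattern.

```r
# Hypothetical helper: write the columns of `m` to files holding
# `chunk_size` columns each, named <prefix>.<k>.txt, and return the paths.
write_column_chunks <- function(m, chunk_size, dir = tempdir(),
                                prefix = "matrix1k") {
  n_chunks <- ceiling(ncol(m) / chunk_size)
  paths <- character(n_chunks)
  for (k in seq_len(n_chunks)) {
    # Column range of chunk k; the last chunk may be shorter
    cols <- ((k - 1) * chunk_size + 1):min(k * chunk_size, ncol(m))
    paths[k] <- file.path(dir, sprintf("%s.%d.txt", prefix, k))
    # Write a header row so read.table(..., header = TRUE) can read it back
    write.table(m[, cols, drop = FALSE], paths[k],
                row.names = FALSE, col.names = TRUE, quote = FALSE)
  }
  paths
}

# Toy data standing in for the real matrix: 10 rows x 25 columns
set.seed(1)
m <- matrix(sample(0:2, 10 * 25, replace = TRUE), nrow = 10,
            dimnames = list(NULL, paste0("snp", 1:25)))
paths <- write_column_chunks(m, chunk_size = 10)
basename(paths)
# "matrix1k.1.txt" "matrix1k.2.txt" "matrix1k.3.txt"
```

With 25 columns and `chunk_size = 10`, the last file holds only the remaining 5 columns.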
I want to read in each file, compute some statistics, and write the results to a file. This is easy with a for loop, but I want to use foreach to speed it up. Here is what I'm doing:
    library(doSMP)
    print(filelist <- system("ls matrix1k.*.txt", intern = TRUE))
    w <- startWorkers(2)
    registerDoSMP(w)
    foreach(i = seq_along(filelist)) %dopar% {
        print(i)
        file <- filelist[i]
        print(file)
        thisfile <- read.table(file, header = TRUE)
    }
    stopWorkers(w)
But this fails with the following error:

    Error in { : task 2 failed - "cannot open the connection"

When I change %dopar% to %do%, the loop runs without problems.
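A sketch of the same read/compute/write loop, written so that it does not depend on the workers' working directory: file names are turned into absolute paths with `normalizePath()` before they are sent out, and each task returns its statistics instead of relying on `print()` (on PSOCK workers, printed output is discarded by default). This assumes, without having confirmed it for doSMP, that the workers may resolve relative paths against a different directory than the master session. It also swaps doSMP for the `parallel` package that ships with R, uses `colMeans()` as a stand-in for the real statistics, and the `process_file()` helper and the toy input files are hypothetical.

```r
library(parallel)

# Toy input files standing in for matrix1k.*.txt (hypothetical data)
set.seed(42)
chunk_dir <- file.path(tempdir(), "chunks")
dir.create(chunk_dir, showWarnings = FALSE)
for (k in 1:4) {
  chunk <- matrix(rnorm(20 * 5), nrow = 20,
                  dimnames = list(NULL, paste0("v", (k - 1) * 5 + 1:5)))
  write.table(chunk, file.path(chunk_dir, sprintf("matrix1k.%d.txt", k)),
              row.names = FALSE, quote = FALSE)
}

# Absolute paths: each worker can open them regardless of its own getwd().
# The pattern excludes the *.stats.txt outputs written below.
filelist <- normalizePath(list.files(chunk_dir,
                                     pattern = "^matrix1k\\.[0-9]+\\.txt$",
                                     full.names = TRUE))

# Hypothetical per-file task: read one chunk, compute per-column means,
# write them next to the input file, and return them to the master
process_file <- function(file) {
  thisfile <- read.table(file, header = TRUE)
  stats <- data.frame(variable = names(thisfile),
                      mean = colMeans(thisfile),
                      row.names = NULL)
  write.table(stats, sub("\\.txt$", ".stats.txt", file),
              row.names = FALSE, quote = FALSE)
  stats
}

cl <- makeCluster(2)   # PSOCK cluster with two local workers
results <- parLapply(cl, filelist, process_file)
stopCluster(cl)

names(results) <- basename(filelist)
```

With the foreach package and a registered backend such as doParallel, the same helper should work as `foreach(f = filelist) %dopar% process_file(f)`; the key point in either case is that each task receives a path that does not depend on the current directory.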