Parallelization on a supercomputer and then combining parallel results (R)

I have access to a large, powerful cluster. I am a halfway decent R programmer, but completely new to shell commands (and terminal commands in general, beyond the basics you need to use Ubuntu).

I want to use this cluster to start a bunch of parallel processes in R, and then combine their results. In particular, I have a problem similar to:

my.function <-function(data,otherdata,N){
    mod = lm(y~x, data=data)
    a = predict(mod,newdata = otherdata,se.fit=TRUE)
    b = rnorm(N,a$fit,a$se.fit)
    b
    }

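r1 = my.function(data, otherdata, N)
r2 = my.function(data, otherdata, N)
r3 = my.function(data, otherdata, N)
r4 = my.function(data, otherdata, N)
...
r1000 = my.function(data, otherdata, N)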

results = list(r1,r2,r3,r4, ... r1000)

The example above is contrived, but basically I want to do something 1000 times in parallel, and then do something with the results from all 1000 processes.

How do I submit 1000 tasks to the cluster simultaneously and then combine all the results, as in the last line of the code?

Links or references to tutorials, or a polite RTFM, are welcome too.


You can combine plyr with doMC (a parallel backend for foreach) as follows:

require(plyr)
require(doMC)
registerDoMC(20) # for 20 processors

llply(1:1000, function(idx) {
    out <- my.function(.)
}, .parallel = TRUE)
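For comparison, here is a minimal self-contained sketch of the same idea: run the function 1000 times in parallel on one machine and combine the results. It swaps in `mclapply` from base R's `parallel` package instead of plyr/doMC, so nothing extra needs installing. The toy `data` and `otherdata`, the choice of `N = 3`, and the two cores are assumptions for illustration only.

```r
library(parallel)

# Toy inputs standing in for the question's `data` and `otherdata` (assumed shapes).
set.seed(1)
data <- data.frame(x = 1:50)
data$y <- 2 * data$x + rnorm(50)
otherdata <- data.frame(x = c(60, 70, 80))

my.function <- function(data, otherdata, N) {
    mod <- lm(y ~ x, data = data)
    a <- predict(mod, newdata = otherdata, se.fit = TRUE)
    rnorm(N, a$fit, a$se.fit)
}

# L'Ecuyer streams give each forked worker its own reproducible random numbers.
RNGkind("L'Ecuyer-CMRG")
set.seed(42)

# One list element per task, like list(r1, r2, ..., r1000) in the question.
results <- mclapply(1:1000,
                    function(idx) my.function(data, otherdata, N = 3),
                    mc.cores = 2)

# Combine: one row per task, one column per simulated prediction.
combined <- do.call(rbind, results)
dim(combined)
```

`mclapply` relies on forking, so `mc.cores` greater than 1 works on Linux and macOS but not on Windows. Like `registerDoMC`, it only uses the cores of a single node.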

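Edit: If you mean submitting simultaneous jobs, does your cluster run LSF? If so, you can submit them with bsub.

Edit 2 (on LSF and bsub):

With LSF, you submit a job with bsub like this: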

bsub -m <nodes> -q <queue> -n <processors> -o <output.log> \
     -e <error.log> Rscript myscript.R

Submitted jobs can be paused, restarted, suspended, and so on. qsub is a similar concept.
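When every task is submitted as its own Rscript job, the jobs do not share memory, so one common pattern is for each job to save its result to its own file and for a final step to read all the files back. The sketch below illustrates that pattern locally. `run_task`, `combine_results`, the file naming scheme, and the `rnorm(5)` stand-in for the real computation are all hypothetical, not part of LSF or of the original script.

```r
# Hypothetical helper: run one task and save its result to its own file.
run_task <- function(idx, outdir) {
    set.seed(idx)                # a distinct seed per task
    res <- rnorm(5)              # stand-in for my.function(data, otherdata, N)
    saveRDS(res, file.path(outdir, sprintf("result_%04d.rds", idx)))
    invisible(res)
}

# Hypothetical helper: read every per-task file back into one list.
combine_results <- function(outdir) {
    files <- sort(list.files(outdir, pattern = "^result_[0-9]+\\.rds$",
                             full.names = TRUE))
    lapply(files, readRDS)
}

# Local demonstration with 10 tasks. On the cluster, each run_task() call would
# be a separate job, with idx taken from e.g. commandArgs(trailingOnly = TRUE).
outdir <- file.path(tempdir(), "results_demo")
dir.create(outdir, showWarnings = FALSE)
for (idx in 1:10) run_task(idx, outdir)

results <- combine_results(outdir)
length(results)
```

Writing one file per task also means a failed job can be rerun on its own without repeating the rest.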



The Message Passing Interface (MPI) does what you want, and it is fairly easy to use. After compiling your program, run:

mpirun -np [no.of.process] [executable]

You choose where it runs with a simple text file listing the hosts and their IP addresses, for example:

node01   192.168.0.1
node02   192.168.0.2
node03   192.168.0.3

Here are more examples of MPI.
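From inside R, a comparable scatter-and-gather pattern can be sketched without MPI by using a socket cluster from the base `parallel` package. This is a different mechanism from `mpirun`, shown only because it needs no extra packages. The two local workers and the `rnorm(3)` stand-in task are assumptions; `makeCluster` also accepts a character vector of host names, which assumes passwordless SSH and an R installation on every node.

```r
library(parallel)

# Two workers on the local machine. A vector of host names such as
# c("node01", "node02") could be passed instead (assumes SSH access and R there).
cl <- makeCluster(2)

# Give each worker an independent, reproducible random-number stream.
clusterSetRNGStream(cl, 123)

# Scatter 1000 tasks over the workers and gather the results as a list.
results <- parLapply(cl, 1:1000, function(idx) rnorm(3))

stopCluster(cl)

# Combine: one row per task.
combined <- do.call(rbind, results)
dim(combined)
```

Objects that the worker function needs, such as `data` and `otherdata`, would have to be sent to the workers first, for example with `clusterExport`.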
