Performing calculations on subsets of data in R

I want to perform calculations for each company number in the PERMNO column of my data frame, a summary of which can be seen here:

> summary(companydataRETS)
     PERMNO           RET           
 Min.   :10000   Min.   :-0.971698  
 1st Qu.:32716   1st Qu.:-0.011905  
 Median :61735   Median : 0.000000  
 Mean   :56788   Mean   : 0.000799  
 3rd Qu.:80280   3rd Qu.: 0.010989  
 Max.   :93436   Max.   :19.000000  

My solution so far has been to create a variable with all the company numbers:

compns <- companydataRETS[!duplicated(companydataRETS[,"PERMNO"]),"PERMNO"]

I then use a foreach loop with parallel computing, which calls my get.rho() function, which in turn does the required calculations:

rhos <- foreach (i=1:length(compns), .combine=rbind) %dopar% 
      get.rho(subset(companydataRETS[,"RET"],companydataRETS$PERMNO == compns[i]))

I tested it on a subset of my data and it all works. The problem is that I have 72 million observations, and even after leaving the computer running overnight, it still hadn't finished.

Is there a better (faster, less resource-intensive) way to do this in R, perhaps with apply or with?


As Joran suggested, I looked into data.table. The modified code is:

library(data.table) 
companydataRETS <- data.table(companydataRETS)
setkey(companydataRETS,PERMNO)

rhos <- foreach (i=1:length(compns), .combine=rbind) %do% 
      get.rho(companydataRETS[J(compns[i])]$RET)

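I ran the code both ways, as originally written (using subset) and using data.table, with compns containing only 30 of the 28659 companies. Here is the system.time() output for each: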

Using subset:

   user  system elapsed
 43.925  12.413  56.337

Using data.table:

   user  system elapsed
  0.229   0.047   0.276

(Note: for the original code I used %do% instead of %dopar%. The system.time() shown for subset is from the %do% run.)

, 5 , . 5 ( , 3 )!

There is an even simpler way to do this with data.table, without foreach, by replacing the last statement above with:

rhos <- companydataRETS[ , get.rho(RET), by=PERMNO]
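
To make the grouped call concrete, here is a minimal sketch on synthetic data. The function `get.rho.demo` is a hypothetical stand-in (a lag-1 autocorrelation), since the real get.rho is not shown, and the `toy` table is made up for illustration.

```r
library(data.table)

# Hypothetical stand-in for get.rho(): lag-1 autocorrelation of the returns.
get.rho.demo <- function(x) cor(x[-1], x[-length(x)])

# Made-up data: three companies with 50 returns each.
set.seed(1)
toy <- data.table(
  PERMNO = rep(c(10000L, 10001L, 10002L), each = 50),
  RET    = rnorm(150, sd = 0.02)
)

# j is evaluated once per PERMNO group; the result has one row per company.
rhos <- toy[, list(rho = get.rho.demo(RET)), by = PERMNO]
print(rhos)
```

Wrapping the result in `list(rho = ...)` names the output column; the one-line version above leaves it with a default name.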

- , foreach . , , ...

, , get.rho, . , , , , "R-".

, , .

The plyr package is built for exactly this kind of split-apply-combine problem.

To go from a data.frame to a data.frame, ddply is the function to use, for example:

library(plyr)
ddply(companydataRETS, .(PERMNO), summarise, get.rho(RET))

Or, to run it in parallel:

library(doMC)
registerDoMC()
ddply(companydataRETS, .(PERMNO), summarise, get.rho(RET), .parallel=TRUE)
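
As an illustration of the serial ddply call, here is a small sketch on synthetic data. `get.rho.demo` is a hypothetical stand-in (lag-1 autocorrelation) for the real get.rho, which is not shown, and the `toy` data frame is invented for the example.

```r
library(plyr)

# Hypothetical stand-in for get.rho(): lag-1 autocorrelation of the returns.
get.rho.demo <- function(x) cor(x[-1], x[-length(x)])

# Made-up data: three companies with 40 returns each.
set.seed(7)
toy <- data.frame(
  PERMNO = rep(c(10000L, 10001L, 10002L), each = 40),
  RET    = rnorm(120, sd = 0.02)
)

# Split by PERMNO, apply the function to each piece, combine into a data.frame.
rhos <- ddply(toy, .(PERMNO), summarise, rho = get.rho.demo(RET))
print(rhos)
```

Naming the summary (`rho = ...`) gives the result column a readable name.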

tapply is another option:

tapply(companydataRETS$RET, companydataRETS$PERMNO, get.rho)
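
A quick way to convince yourself that tapply does the same work as the subset-per-company loop is to compare them on a small synthetic example. This is only a sketch: `get.rho.demo` is a hypothetical stand-in (lag-1 autocorrelation) for the real get.rho, and the `toy` data is made up.

```r
# Hypothetical stand-in for get.rho(): lag-1 autocorrelation of the returns.
get.rho.demo <- function(x) cor(x[-1], x[-length(x)])

# Made-up data: three companies with 40 returns each.
set.seed(42)
toy <- data.frame(
  PERMNO = rep(c(10000L, 10001L, 10002L), each = 40),
  RET    = rnorm(120, sd = 0.02)
)

# Grouped calculation: RET is split by PERMNO once, then the function is applied.
by_tapply <- tapply(toy$RET, toy$PERMNO, get.rho.demo)

# Same calculation as a loop that rescans the whole column for every company.
compns  <- unique(toy$PERMNO)
by_loop <- sapply(compns, function(p) get.rho.demo(toy$RET[toy$PERMNO == p]))

print(by_tapply)
print(all.equal(as.vector(by_tapply), by_loop))
```

The loop version compares every row against the company number on each iteration, which is why it scales badly with many companies and many rows; tapply groups the data in a single pass.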

data.table, as already mentioned, is another option.

, , get.rho , , , .


:

, , , . , Google CRAN .

, , lm. sample . , - .
