I want to perform calculations for each company number in the PERMNO column of my data frame, a summary of which can be seen here:
> summary(companydataRETS)
PERMNO RET
Min. :10000 Min. :-0.971698
1st Qu.:32716 1st Qu.:-0.011905
Median :61735 Median : 0.000000
Mean :56788 Mean : 0.000799
3rd Qu.:80280 3rd Qu.: 0.010989
Max. :93436 Max. :19.000000
My solution so far has been to create a variable with all possible company numbers
compns <- companydataRETS[!duplicated(companydataRETS[,"PERMNO"]),"PERMNO"]
And then use the foreach loop using parallel computing, which calls my get.rho () function, which in turn does the required calculations
rhos <- foreach (i=1:length(compns), .combine=rbind) %dopar%
get.rho(subset(companydataRETS[,"RET"],companydataRETS$PERMNO == compns[i]))
I tested it for a subset of my data and it all works. The problem is that I have 72 million observations, and even after exiting the computer that was working overnight, it still wasn't over.
R, , , ( , -) (, apply with, ) ). ?