I need to calculate a weighted mean for every row (6M+ rows), but it takes a very long time. The column holding the weights is a character field, so weighted.mean cannot be applied to it directly.
Background data:
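```r
library(data.table)
library(stringr)

values  <- c(1, 2, 3, 4)
grp     <- c("a", "a", "b", "b")
# Each row stores one weight per element of `values`, as a character string
weights <- c("{10,0,0,0}", "{0,10,0,0}", "{10,10,0,0}", "{0,0,10,0}")

DF <- data.frame(grp, weights, stringsAsFactors = FALSE)
DT <- data.table(DF)

# Parse one weight string and return the weighted mean of `values`
string.weighted.mean <- function(weights.x) {
  # Splitting on non-digits leaves empty strings at both ends;
  # as.numeric() turns them into NA and na.omit() drops them
  w <- na.omit(as.numeric(unlist(str_split(string = weights.x, pattern = "[^0-9]+"))))
  weighted.mean(x = values, w = w)
}
```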
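For reference, this is the per-row quantity the function computes. The symbols are introduced here for illustration only: $v_j$ is the $j$-th entry of `values`, $w_{ij}$ the $j$-th weight parsed from row $i$, $k$ the number of weights per row (4 in the example), and $W$ the $n \times k$ matrix obtained by stacking all parsed weight rows. Written in matrix form, every row's result depends only on one matrix-vector product and one row sum, which suggests the computation can be vectorised rather than done row by row:

$$
\mathrm{wm}_i = \frac{\sum_{j=1}^{k} v_j\, w_{ij}}{\sum_{j=1}^{k} w_{ij}},
\qquad
\mathbf{wm} = (W\mathbf{v}) \oslash (W\mathbf{1})
$$

For example, row 3 has weights {10,10,0,0}, giving (1·10 + 2·10 + 3·0 + 4·0) / (10 + 10 + 0 + 0) = 30 / 20 = 1.5. The four sample rows give 1, 2, 1.5 and 3.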
Here is how I do it with a data.frame (too slow):

```r
DF$wm <- mapply(string.weighted.mean, DF$weights)
```
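A possible base-R direction for the data.frame version, sketched under the assumption that every weight string holds exactly `length(values)` non-negative comma-separated numbers inside braces: strip the braces, split all strings in a single `strsplit()` call, and reshape the result into a matrix, so that all weighted means come out of one matrix product instead of one regex call plus one `weighted.mean()` call per row. The helper `weights_to_matrix` is hypothetical, not part of any package.

```r
values  <- c(1, 2, 3, 4)
weights <- c("{10,0,0,0}", "{0,10,0,0}", "{10,10,0,0}", "{0,0,10,0}")
DF <- data.frame(grp = c("a", "a", "b", "b"), weights = weights,
                 stringsAsFactors = FALSE)

# Hypothetical helper: parse all weight strings at once into an n x k matrix.
# Assumes each string has exactly k comma-separated numbers wrapped in braces;
# rows with a different count would break the row alignment of the matrix.
weights_to_matrix <- function(x, k) {
  parts <- strsplit(gsub("[{}]", "", x), ",", fixed = TRUE)
  matrix(as.numeric(unlist(parts)), ncol = k, byrow = TRUE)
}

W <- weights_to_matrix(DF$weights, length(values))

# Row-wise weighted mean, sum(values * w) / sum(w), for all rows at once
DF$wm <- drop(W %*% values) / rowSums(W)
DF$wm
```

As with `weighted.mean()`, a row whose weights are all zero yields `NaN`.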
The data.table version also runs, but is still far too slow (hours):

```r
DT[, wm := mapply(string.weighted.mean, weights)]
```
How can I rephrase the last line to speed things up?
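One way the data.table line might be rephrased, sketched under the same assumption (exactly `length(values)` numeric weights in braces per string): `data.table::tstrsplit()` splits the whole column in one call and returns one vector per weight position, `type.convert = TRUE` converts the pieces to numbers, and `cbind` assembles them into a matrix. The weighted means are then assigned for all rows with a single vectorised expression inside `:=`. Whether this is fast enough on 6M+ rows would need to be measured.

```r
library(data.table)

values <- c(1, 2, 3, 4)
DT <- data.table(
  grp     = c("a", "a", "b", "b"),
  weights = c("{10,0,0,0}", "{0,10,0,0}", "{10,10,0,0}", "{0,0,10,0}")
)

# Split every weight string once: tstrsplit() returns a list with one
# vector per weight position; type.convert = TRUE turns "10" into 10
W <- do.call(cbind, tstrsplit(gsub("[{}]", "", DT$weights), ",",
                              fixed = TRUE, type.convert = TRUE))

# All row-wise weighted means in one vectorised step, added by reference
DT[, wm := drop(W %*% values) / rowSums(W)]
DT
```

Any speed-up should come from replacing millions of per-row R function calls (each running a regex split and `weighted.mean()`) with a handful of vectorised calls over the whole column.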