Suppose I have a data.frame:
a <- c(1,2,3,4,5)
b <- c(0,1,NA,3,4)
c <- c(9,10,11,NA,13)
df <- data.frame(a,b,c)
I managed to write a user-defined function that sums selected variables row by row, ignoring NA (here I sum all the variables, but imagine a large data.frame where I only need to add up a few of them):
sum.df.na.rm <- function(x) {
  rowSums(df[, x], na.rm = TRUE)
}
df$d <- sum.df.na.rm(c("a","b","c"))
> df
  a  b  c  d
1 1  0  9 10
2 2  1 10 13
3 3 NA 11 14
4 4  3 NA  7
5 5  4 13 22
Now suppose I want to subtract b from a and then add c, still ignoring NA. I can do:
df$bneg <- df$b * (-1)
df$e <- sum.df.na.rm(c("a","bneg","c"))
> df
  a  b  c  d bneg  e
1 1  0  9 10    0 10
2 2  1 10 13   -1 11
3 3 NA 11 14   NA 14
4 4  3 NA  7   -3  1
5 5  4 13 22   -4 14
But creating a copy of b multiplied by (-1), just so that it gets subtracted inside sum.df.na.rm, seems very inefficient to me.
How would you do this without the intermediate variable bneg?
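To make the goal concrete, here is a rough sketch of the kind of thing I have in mind. It assumes the signs can be passed as a vector of +1/-1, one per column. The helper name signed.row.sums and its signs argument are made up for illustration, and I am not sure this is the idiomatic way to do it:

```r
# Same toy data as above
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(0, 1, NA, 3, 4),
                 c = c(9, 10, 11, NA, 13))

# Hypothetical helper: 'signs' holds +1 or -1 for each column in 'cols'
signed.row.sums <- function(data, cols, signs = rep(1, length(cols))) {
  stopifnot(length(cols) == length(signs))
  # drop = FALSE keeps a single selected column as a data.frame
  m <- as.matrix(data[, cols, drop = FALSE])
  # sweep() multiplies each column by its sign; NA stays NA
  # and is then skipped by na.rm = TRUE
  rowSums(sweep(m, 2, signs, `*`), na.rm = TRUE)
}

# a - b + c, without creating bneg
df$e <- signed.row.sums(df, c("a", "b", "c"), c(1, -1, 1))
```

On the example data this should give the same column e as the bneg version, and the data.frame is passed in as an argument instead of being read from the global environment.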