I need to binarize some data in a data frame in R

I have a data frame and I would like to binarize every value in the first 56 columns: if the value is greater than 0 it becomes 1, otherwise it becomes 0. Is there an easy way to do this?

3 answers

Using the vectorized ifelse, you can write:

   m[, 1:56] <- ifelse(m[, 1:56] > 0, 1, 0)

For example, we can check this on a small matrix:

> m <- matrix(sample(c(-2, 2), 5 * 3, replace = TRUE), ncol = 5, nrow = 3, byrow = TRUE)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    2    2    2   -2
[2,]    2    2   -2    2   -2
[3,]    2    2    2    2    2
> m[,2:5] <- ifelse(m[,2:5] > 0,1,0)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    1    1    1    0
[2,]    2    1    0    1    0
[3,]    2    1    1    1    1
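
The same assignment also works when the object is a data frame rather than a matrix, because comparing a data frame with 0 yields a logical matrix. A minimal sketch, assuming a hypothetical data frame df (the names df, a, b and c are illustrative, not from the question) in which only the first two columns are binarized:

```r
# Hypothetical example data; the names df, a, b and c are illustrative only.
df <- data.frame(a = c(-1, 0, 3), b = c(2.5, -4, 0), c = c(-7, 8, 9))

# df[, 1:2] > 0 is a logical matrix, so ifelse() returns a numeric matrix
# of the same shape, which is then assigned back column by column.
df[, 1:2] <- ifelse(df[, 1:2] > 0, 1, 0)

# Column c is left untouched.
print(df)
```

Note that a value of exactly 0 maps to 0, as the question requires.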

You can use the fact that TRUE and FALSE are treated as 1 and 0 in arithmetic and do:

set.seed(1)
mydf <- data.frame(matrix(rnorm(100), nrow = 10))
mydf[, 1:5] <- (mydf[, 1:5] > 0) + 0
mydf
#    X1 X2 X3 X4 X5         X6          X7           X8         X9        X10
# 1   0  1  1  1  0  0.3981059  2.40161776  0.475509529 -0.5686687 -0.5425200
# 2   1  1  1  0  0 -0.6120264 -0.03924000 -0.709946431 -0.1351786  1.2078678
# 3   0  0  1  1  1  0.3411197  0.68973936  0.610726353  1.1780870  1.1604026
# 4   1  0  0  0  1 -1.1293631  0.02800216 -0.934097632 -1.5235668  0.7002136
# 5   1  1  1  0  0  1.4330237 -0.74327321 -1.253633400  0.5939462  1.5868335
# 6   0  0  0  0  0  1.9803999  0.18879230  0.291446236  0.3329504  0.5584864
# 7   1  0  0  0  1 -0.3672215 -1.80495863 -0.443291873  1.0630998 -1.2765922
# 8   1  1  0  0  1 -1.0441346  1.46555486  0.001105352 -0.3041839 -0.5732654
# 9   1  1  0  1  0  0.5697196  0.15325334  0.074341324  0.3700188 -1.2246126
# 10  0  1  1  1  1 -0.1350546  2.17261167 -0.589520946  0.2670988 -0.4734006

The + 0 simply converts the TRUE and FALSE values to numbers. You can also use as.numeric(mydf > 0), which has the same effect (thanks, @Dason).

mydf[, 1:5] <- as.numeric(mydf[, 1:5] > 0)
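
One difference between the two forms is worth seeing: + 0 keeps the matrix shape, while as.numeric() returns a plain vector. The assignment still works because the vector is in column-major order and has exactly as many elements as the target block. A small sketch of that, reusing the mydf construction from above (the names plus_zero, as_num, a and b are introduced here for illustration):

```r
set.seed(1)
mydf <- data.frame(matrix(rnorm(100), nrow = 10))

plus_zero <- (mydf[, 1:5] > 0) + 0        # numeric matrix, 10 rows by 5 columns
as_num    <- as.numeric(mydf[, 1:5] > 0)  # plain numeric vector of length 50

print(dim(plus_zero))  # 10 5
print(dim(as_num))     # NULL

# Assign each result into its own copy and compare the two data frames.
a <- mydf
a[, 1:5] <- plus_zero
b <- mydf
b[, 1:5] <- as_num
print(isTRUE(all.equal(a, b)))
```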

An approach using pmin and pmax (not recommended):

pmin(pmax(m[,2:5], 0),1)
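
One reason to avoid this form is that it clamps values to the interval [0, 1] rather than binarizing them, so it only matches the other answers when the input has no values strictly between 0 and 1. A short sketch with a hypothetical vector x that includes fractional values:

```r
# Hypothetical input containing fractional values.
x <- c(-2, -0.5, 0, 0.5, 2)

clamped   <- pmin(pmax(x, 0), 1)  # clamps to [0, 1]
binarized <- (x > 0) + 0          # true binarization

print(clamped)    # 0.0 0.0 0.0 0.5 1.0
print(binarized)  # 0 0 0 1 1
```

The value 0.5 survives the clamp unchanged, whereas the comparison-based form maps it to 1.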

But it does give an opportunity to add some benchmarks:

ag  <- function() ifelse(m[, 2:5] > 0, 1, 0)    # vectorized ifelse
mn  <- function() pmin(pmax(m[, 2:5], 0), 1)    # pmin/pmax clamp
am  <- function() (m[, 2:5] > 0) + 0            # logical + 0
am2 <- function() as.numeric(m[, 2:5] > 0)      # as.numeric (returns a plain vector)

library(microbenchmark)
microbenchmark(ag(),mn(), am(), am2())
## Unit: microseconds
##   expr    min     lq  median      uq     max neval
##   ag() 19.888 20.712 21.9375 22.6430  39.548   100
##   mn() 50.135 51.172 52.2530 53.1055 113.854   100
##   am()  3.076  3.406  4.1755  4.6030   7.912   100
##  am2()  2.623  2.989  3.4640  4.0135   6.995   100

@AnandaMahto's solutions are the clear winners, and the as.numeric approach is even faster!
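
The timings above were taken on a very small matrix, where fixed call overhead can dominate. A sketch for repeating the comparison on a larger input, assuming a hypothetical 1000 x 56 matrix named big (not part of the original answers); timings depend on the machine and R version, so no output is shown:

```r
set.seed(42)
# Hypothetical larger input: 1000 x 56 matrix of -2 and 2.
big <- matrix(sample(c(-2, 2), 1000 * 56, replace = TRUE), ncol = 56)

ag  <- function() ifelse(big > 0, 1, 0)
mn  <- function() pmin(pmax(big, 0), 1)
am  <- function() (big > 0) + 0
am2 <- function() as.numeric(big > 0)

# Check that all four agree on this integer-valued input before timing them.
# am2() returns a plain vector, so the comparison is element by element.
stopifnot(all(ag() == am()), all(mn() == am()), all(am2() == am()))

# Only run the benchmark if the microbenchmark package is installed.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(ag(), mn(), am(), am2(), times = 20))
}
```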
