How to handle list columns in data.table

When merging data, I often end up with list columns (for example, when a row in the left table has several matches in the right table).

Define:

DT = data.table(x=list(c(1,2),c(3,4,5)),y=list(c(T,T),c(T,F,T)),z=c(1,2),N=c(1L,2L))
#       x               y z N
#1:   1,2       TRUE,TRUE 1 1
#2: 3,4,5 TRUE,FALSE,TRUE 2 2
  • Is it possible to update x in place, replacing it with x[y]?

I can get the result, but not as an in-place update (and it looks ugly):

DT1 = DT[,list(x=list(unlist(x)[unlist(y)])),by=N]
DT = cbind(DT[,x:=NULL],DT1[,list(x)])
                 y z N   x
1:       TRUE,TRUE 1 1 1,2
2: TRUE,FALSE,TRUE 2 2 3,5

Now suppose I define mySet = c(1,5) and want to check, for each row, which values of x are in mySet (x %in% mySet).

  • How can I do this?

                     y z N   x isInMySet
    1:       TRUE,TRUE 1 1 1,2 TRUE,FALSE
    2: TRUE,FALSE,TRUE 2 2 3,5 FALSE,TRUE
    
2 answers

I wrote an answer to your previous question, only to find that you had deleted it. Here's how you can update x in place (the answer to your first part):

DT[, x := list(list(unlist(x)[unlist(y)])), by=N]

#      x               y z N
# 1: 1,2       TRUE,TRUE 1 1
# 2: 3,5 TRUE,FALSE,TRUE 2 2
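
Note that grouping with by=N only works row by row because N happens to be unique here. If several rows shared the same N, unlist(x) would concatenate their vectors within the group. A minimal sketch of a row-wise variant, assuming a hypothetical table in which N is duplicated and grouping by row number instead:

```r
library(data.table)

DT <- data.table(
  x = list(c(1, 2), c(3, 4, 5)),
  y = list(c(TRUE, TRUE), c(TRUE, FALSE, TRUE)),
  z = c(1, 2),
  N = c(1L, 1L)   # hypothetical: N is no longer unique per row
)

# Group by row number so that every group holds exactly one row.
# The inner list() makes the filtered vector a single list cell;
# the outer list() is the usual "list of columns" wrapper for j.
DT[, x := list(list(unlist(x)[unlist(y)])), by = seq_len(nrow(DT))]
```

Each cell of x is filtered by the matching cell of y, regardless of what N contains.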

And for your second part:

DT[, isInMySet := list(list(unlist(x) %in% mySet)), by=N]

#      x               y z N  isInMySet
# 1: 1,2       TRUE,TRUE 1 1 TRUE,FALSE
# 2: 3,5 TRUE,FALSE,TRUE 2 2 FALSE,TRUE

(or alternatively)

DT[, isInMySet := lapply(x, function(x) x %in% mySet)]

Another approach:

DT
       x               y z N
1:   1,2       TRUE,TRUE 1 1
2: 3,4,5 TRUE,FALSE,TRUE 2 2

DT[,x2:=mapply(`[`,x,y,SIMPLIFY=FALSE)]
DT
       x               y z N  x2
1:   1,2       TRUE,TRUE 1 1 1,2
2: 3,4,5 TRUE,FALSE,TRUE 2 2 3,5

DT[,isInMySet:=lapply(x2,`%in%`,c(1,5))]
DT
       x               y z N  x2  isInMySet
1:   1,2       TRUE,TRUE 1 1 1,2 TRUE,FALSE
2: 3,4,5 TRUE,FALSE,TRUE 2 2 3,5 FALSE,TRUE
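
If the original x does not need to be kept, the same idea can overwrite it directly instead of creating x2. A minimal, self-contained sketch, assuming the DT and mySet from the question; Map() is used here as shorthand for mapply(..., SIMPLIFY=FALSE):

```r
library(data.table)

DT <- data.table(
  x = list(c(1, 2), c(3, 4, 5)),
  y = list(c(TRUE, TRUE), c(TRUE, FALSE, TRUE)),
  z = c(1, 2),
  N = c(1L, 2L)
)
mySet <- c(1, 5)

# Subset each element of x by the matching element of y and
# assign the resulting list back to x by reference (no grouping needed).
DT[, x := Map(`[`, x, y)]

# For each element of x, flag which values occur in mySet.
DT[, isInMySet := lapply(x, `%in%`, mySet)]
```

Because no by= is involved, this does not depend on N (or any other column) being unique per row.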