Number of unique observations per variable in a data.table

I read a large data file into R using the following command:

data <- as.data.set(spss.system.file(paste(path, file, sep = '/')))

The resulting dataset contains columns that should not be there and that contain only spaces. The problem arises because R creates new variables based on the variable labels attached to the SPSS file (source).

Unfortunately, I could not work out which parameters would solve the problem. I tried foreign::read.spss, memisc::spss.system.file and Hmisc::spss.get, all without luck.

Instead, I would like to read in the entire dataset (ghost columns included) and delete the unnecessary variables manually. Since the ghost columns contain only spaces, I would like to remove every variable in my data table whose number of unique observations equals one.

My data is big, so it is stored in data.table format. I would like to define an easy way to check the number of unique cases in each column and discard columns containing only one unique case.

require(data.table)

### Create a data.table
dt <- data.table(a = 1:10,
                 b = letters[1:10],
                 c = rep(1, times = 10))

### Create a comparable data.frame
df <- data.frame(dt)

### Expected result
unique(dt$a)

### Expected result
length(unique(dt$a))

However, I want to calculate the number of unique obs for a large data file, so referring to each column by name is not practical. I am not a fan of eval(parse()).

### I want to determine the number of unique obs in
  # each variable, for a large list of vars
lapply(names(df), function(x) {
    length(unique(df[, x]))
})

### Unexpected result
length(unique(dt[, 'a', with = F]))  # Returns 1

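I think the problem is that

dt[, 'a', with = F]

returns an object of class "data.table". It makes sense that the length of this object is 1, since it is a data.table containing 1 variable. data.frames are really just lists of variables, so in this case the length of the list is 1.

Here is roughly how I would solve this the data.frame way: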

for (x in names(data)) {
  unique.obs <- length(unique(data[, x]))
  if (unique.obs == 1) {
    data[, x] <- NULL
  }
}

However, I assume there is a more efficient way to do this with data.table.


Update: uniqueN

As of version 1.9.6, data.table has a built-in function for this, uniqueN. It is now as simple as:

dt[ , lapply(.SD, uniqueN)]

Before uniqueN was available, the equivalent was:

 dt[, lapply(.SD, function(x) length(unique(x)))]
##     a  b c
## 1: 10 10 1
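The counts above can also be used to drop the constant columns directly. A minimal sketch, assuming data.table 1.9.6 or later is installed; the names n_unique and constant_cols are illustrative and not part of the original answer:

```r
library(data.table)

# Same example table as in the question
dt <- data.table(a = 1:10,
                 b = letters[1:10],
                 c = rep(1, times = 10))

# Number of distinct values in each column, as a named integer vector
n_unique <- sapply(dt, uniqueN)

# Names of columns holding a single distinct value (hypothetical helper name)
constant_cols <- names(n_unique)[n_unique == 1]

# Delete those columns by reference; skip the call if there is nothing to drop
if (length(constant_cols) > 0) {
  dt[, (constant_cols) := NULL]
}

names(dt)
```

Collecting the names first and deleting once keeps the scan over columns separate from the modification of the table.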

To avoid with=FALSE in [.data.table, use [[ instead (see also fortune(312)...). That is,

lapply(names(df), function(x) length(unique(dt[, x, with = FALSE])))

becomes

lapply(names(df), function(x) length(unique(dt[[x]])))
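The difference between the two forms can be checked directly. A small sketch on the question's example table (the variable names sub and col are illustrative):

```r
library(data.table)

dt <- data.table(a = 1:10,
                 b = letters[1:10],
                 c = rep(1, times = 10))

# Subsetting with with = FALSE keeps the data.table class
sub <- dt[, "a", with = FALSE]
class(sub)            # a data.table (which is also a data.frame)
length(sub)           # number of columns, i.e. 1
length(unique(sub))   # unique() drops duplicate rows, so still 1 column

# [[ extracts the column itself as a plain vector
col <- dt[["a"]]
length(unique(col))   # number of distinct values in column a
```

Because length() of a data.table counts columns, only the [[ form gives the number of distinct values.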

dt[, names(dt) := lapply(.SD, function(x) if (length(unique(x)) == 1) NULL else x)]

# or to avoid calling `.SD`

dt[, Filter(names(dt), f = function(x) length(unique(dt[[x]])) == 1) := NULL]

Another way, using a plain loop:

for (i in names(DT)) if (length(unique(DT[[i]]))==1) DT[,(i):=NULL]

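Or by column position, going in reverse so that deleting a column does not shift the positions still to be visited: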

for (i in ncol(DT):1) if (length(unique(DT[[i]]))==1) DT[,(i):=NULL]

NB: wrapping i in parentheses, (i), on the LHS of := is a trick to use the value of i rather than a column named "i".
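The effect of the parentheses is easy to see on a toy table. A minimal sketch; DT2 and target are made-up names used only for this illustration:

```r
library(data.table)

DT2 <- data.table(a = 1:3, c = 1)
target <- "c"   # holds the name of the column we want to delete

# Without parentheses, the symbol itself is taken as the column name,
# so this adds a new column literally called "target"
DT2[, target := 0]
names_after_bare <- names(DT2)

# With parentheses, the value of target ("c") is used,
# so this deletes column c
DT2[, (target) := NULL]
names_after_paren <- names(DT2)

names_after_bare
names_after_paren
```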


require(data.table)

### Create a data.table
dt <- data.table(a = 1:10,
                 b = letters[1:10],
                 d1 = "",
                 c = rep(1, times = 10),
                 d2 = "")
dt
     a b d1 c d2
 1:  1 a    1   
 2:  2 b    1   
 3:  3 c    1   
 4:  4 d    1   
 5:  5 e    1   
 6:  6 f    1   
 7:  7 g    1   
 8:  8 h    1   
 9:  9 i    1   
10: 10 j    1   

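Columns d1 and d2 are the ones that contain only empty strings. If I understand correctly, those are the columns you want to remove from dt?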

only_space <- function(x) {
  length(unique(x))==1 && x[1]==""
}
bolCols <- apply(dt, 2, only_space)
dt[, (1:ncol(dt))[!bolCols], with=FALSE]

Result:

     a b c
 1:  1 a 1
 2:  2 b 1
 3:  3 c 1
 4:  4 d 1
 5:  5 e 1
 6:  6 f 1
 7:  7 g 1
 8:  8 h 1
 9:  9 i 1
10: 10 j 1
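A variant of the same idea that works column by column, without converting the whole table to a character matrix as apply does. This is a sketch under assumptions: is_blank_col and blank_cols are illustrative names, d2 is filled with spaces here to mimic the ghost columns from the question, and missing values are treated as non-blank:

```r
library(data.table)

dt <- data.table(a = 1:10,
                 b = letters[1:10],
                 d1 = "",
                 c = rep(1, times = 10),
                 d2 = "   ")

# TRUE only for character columns whose values are all empty
# after stripping leading and trailing whitespace
is_blank_col <- function(x) {
  is.character(x) && !any(nzchar(trimws(x)))
}

# One logical flag per column, checked on the original column types
blank_cols <- names(dt)[vapply(dt, is_blank_col, logical(1))]

# Remove the blank columns by reference, if there are any
if (length(blank_cols) > 0) {
  dt[, (blank_cols) := NULL]
}

names(dt)
```

Unlike a check based only on the number of unique values, this keeps a genuinely constant column such as c.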

With dplyr, you can use select:

library(dplyr)

# col1 and col2 are placeholders for the columns you want to keep
newdata <- select(old_data, col1, col2)
