I read a large data file into R using the following command:
data <- as.data.set(spss.system.file(paste(path, file, sep = '/')))
The resulting data set contains columns that do not belong there and that contain only spaces. The problem arises because R creates new variables based on the variable labels attached to the SPSS file (Source).
Unfortunately, I could not work out which parameters are needed to solve the problem. I tried foreign::read.spss, memisc::spss.system.file and Hmisc::spss.get, with no luck.
Instead, I would like to read in the entire data set (ghost columns included) and delete the unnecessary variables manually. Since the ghost columns contain only spaces, I would like to remove every variable from my data.table for which the number of unique observations equals one.
My data is big, so it is stored in data.table format. I would like an easy way to count the unique values in each column and drop the columns that contain only one unique value.
require(data.table)
#
dt <- data.table(a = 1:10,
b = letters[1:10],
c = rep(1, times = 10))
#
df <- data.frame(dt)
#
unique(dt$a)
#
length(unique(dt$a))
However, I want to count the unique observations for a large data file, so referring to each column by name like this is not practical. I am not a fan of eval(parse()).
lapply(names(df), function(x) {
length(unique(df[, x]))
})
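This works for the data.frame, but the analogous call on the data.table does not give the number of unique values:
length(unique(dt[, 'a', with = FALSE]))  # returns 1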
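I think the problem is that
dt[, 'a', with = FALSE]
returns an object of class "data.table". It makes sense that the length of this object is 1, since it is a data.table with 1 column. Like data.frames, it is a list of columns, so here the length of the list is 1.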
This is how I would do it with a data.frame:
for (x in names(data)) {
unique.obs <- length(unique(data[, x]))
if (unique.obs == 1) {
data[, x] <- NULL
}
}
How can I do the same thing with a data.table?
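For illustration, here is a minimal sketch of one possible data.table approach, using the example table from above. It assumes a data.table version that provides uniqueN(); the names n_unique and ghost_cols are made up for this example. It is not necessarily the fastest option for very large files.

```r
library(data.table)

dt <- data.table(a = 1:10,
                 b = letters[1:10],
                 c = rep(1, times = 10))

# dt[, 'a', with = FALSE] is a one-column data.table, so length() counts columns.
# dt[["a"]] extracts the column as a plain vector instead.
n_cols_a <- length(dt[, "a", with = FALSE])
n_vals_a <- length(unique(dt[["a"]]))

# Count the distinct values in every column without naming the columns by hand.
n_unique <- sapply(dt, uniqueN)

# Columns with exactly one distinct value (hypothetical name: ghost_cols).
ghost_cols <- names(n_unique)[n_unique == 1]

# Drop those columns by reference, i.e. without copying the table.
if (length(ghost_cols) > 0) {
  dt[, (ghost_cols) := NULL]
}
```

Note that this removes every constant column, so a genuine variable that happens to take a single value in the data would be dropped as well.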