How do I analyze multiple-response ("select all that apply") survey questions in R?

I am trying to figure out how to analyze multiple-selection / multiple-response questions (i.e., "select all that apply") in a survey I recently conducted.

SPSS handles online survey data and this type of question well, so I assume R can do the same and more. Working with this kind of data in Excel is awkward. For example, I would like to see a histogram / distribution, by age, of everyone who likes both strawberry and chocolate ice cream.

How should I structure the data set, and which commands produce basic frequency tables, Pareto charts, and logical AND selections?

3 answers

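This is not packaged as conveniently as in SPSS. The counts can be computed with apply(), though. Here is one way using adply() from plyr: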

library(plyr)
set.seed(1)
# Fake data with three "like" questions. 0 = not selected, 1 = selected
dat <- data.frame(resp = 1:10,
                  like1 = sample(0:1, 10, TRUE),
                  like2 = sample(0:1, 10, TRUE),
                  like3 = sample(0:1, 10, TRUE)
                  )

adply(dat[grepl("like", colnames(dat))], 2, function(x)
  data.frame(Count = as.data.frame(table(x))[2,2], 
        Perc = as.data.frame(prop.table(table(x)))[2,2]))
#-----
     X1 Count Perc
1 like1     6  0.6
2 like2     5  0.5
3 like3     3  0.3

Starting from the same sample data:

set.seed(1)
dat <- data.frame(resp = 1:10,
                  like1 = sample(0:1, 10, TRUE),
                  like2 = sample(0:1, 10, TRUE),
                  like3 = sample(0:1, 10, TRUE))

Here is a function that tabulates every combination of responses:

multi.freq.table = function(data, sep="", dropzero=FALSE, clean=TRUE) {
  # Takes boolean multiple-response data and tabulates it according
  #   to the possible combinations of each variable.
  #
  # See: http://stackoverflow.com/q/11348391/1270695

  counts = data.frame(table(data))
  N = ncol(counts)
  counts$Combn = apply(counts[-N] == 1, 1, 
                       function(x) paste(names(counts[-N])[x],
                                         collapse=sep))
  if (isTRUE(dropzero)) {
    counts = counts[counts$Freq != 0, ]
  }
  if (isTRUE(clean)) {
    counts = data.frame(Combn = counts$Combn, Freq = counts$Freq)
  } 
  counts
}

Applied to the sample data:

multi.freq.table(dat[-1], sep="-")
#               Combn Freq
# 1                      1
# 2             like1    2
# 3             like2    2
# 4       like1-like2    2
# 5             like3    1
# 6       like1-like3    1
# 7       like2-like3    0
# 8 like1-like2-like3    1


Update

For a summary closer to the SPSS multiple-response table (frequency, percent of responses, and percent of cases):

data.frame(Freq = colSums(dat[-1]),
           Pct.of.Resp = (colSums(dat[-1])/sum(dat[-1]))*100,
           Pct.of.Cases = (colSums(dat[-1])/nrow(dat[-1]))*100)
#       Freq Pct.of.Resp Pct.of.Cases
# like1    6    42.85714           60
# like2    5    35.71429           50
# like3    3    21.42857           30
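
The 0/1 layout used above also covers the logical AND part of the question. What follows is only a minimal sketch, assuming a hypothetical data frame `survey` with an `age` column and one 0/1 column per flavour (none of these names or values come from the original post):

```r
# Hypothetical survey data: one row per respondent, one 0/1 column per option.
survey <- data.frame(
  resp       = 1:8,
  age        = c(19, 23, 31, 35, 42, 47, 58, 64),
  strawberry = c(1, 1, 0, 1, 1, 0, 1, 0),
  chocolate  = c(1, 0, 1, 1, 1, 1, 0, 0),
  vanilla    = c(0, 1, 1, 0, 1, 0, 0, 1)
)

# Logical AND: respondents who selected both strawberry and chocolate.
both <- survey[survey$strawberry == 1 & survey$chocolate == 1, ]

# Age distribution of that subgroup in 20-year bins.
# plot = FALSE returns the bin counts without drawing; plot(h) draws the chart.
h <- hist(both$age, breaks = seq(10, 70, by = 20), plot = FALSE)
h$counts
```

Because each option is its own 0/1 column, any combination of conditions reduces to an ordinary row filter with `&`, `|`, and `!`.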
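
Another option is the function below, which takes the data and a vector of question prefixes:

multfreqtable = function(data, question.prefix) {
  # For each prefix, tabulate the matching columns: frequency,
  # percent of responses, and percent of cases, plus a "Total" row.
  z = length(question.prefix)
  temp = vector("list", z)

  for (i in seq_len(z)) {
    a = grep(question.prefix[i], names(data))
    cols = data[, a, drop = FALSE]      # stay a data frame even for one column
    b = sum(cols != 0)                  # total number of responses
    d = colSums(cols != 0)              # responses per item
    e = sum(rowSums(cols) != 0)         # cases with at least one response
    f = as.numeric(c(d, b))
    temp[[i]] = data.frame(question = c(sub(question.prefix[i], 
                                            "", names(d)), "Total"),
                           freq = f,
                           percent_response = (f/b)*100,
                           percent_cases = (f/e)*100 )
    names(temp)[i] = question.prefix[i]
  }
  temp
}

Called as:

multfreqtable(data_set, "Banner")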

This function does a good job of reporting, for each item, the count, the percent of cases, and the percent of responses, which makes it well suited to multiple-answer questions.
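
For the Pareto part of the question, the per-item frequencies only need to be sorted and accumulated. A minimal sketch, assuming hypothetical 0/1 data in a data frame called `items` (made-up values, not from the original post):

```r
# Hypothetical 0/1 multiple-response data, one column per option.
items <- data.frame(
  like1 = c(1, 0, 1, 1, 0, 1),
  like2 = c(0, 0, 1, 0, 0, 1),
  like3 = c(1, 1, 1, 1, 1, 0)
)

# Sort the per-option counts so the most-selected option comes first.
freq <- sort(colSums(items), decreasing = TRUE)

# Pareto table: count and cumulative percent of all responses.
pareto <- data.frame(
  option  = names(freq),
  freq    = as.numeric(freq),
  cum_pct = unname(cumsum(freq) / sum(freq) * 100)
)
pareto
```

A bar chart of `pareto$freq` in this order, for example with `barplot(pareto$freq, names.arg = pareto$option)`, gives the bars of a Pareto chart; `cum_pct` supplies the cumulative line.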
