I have a wide-format survey dataset. For each survey question, the raw data contain a separate variable for every month in which the question was asked.
I want to create a new set of variables with month-invariant names; the value of each new variable should be the value of the month-specific version of the question for the month observed in that row.
See the example (dummy) dataset:
require(data.table)
data <- data.table(month = rep(c('may', 'jun', 'jul'), each = 5),
                   may.q1 = rep(c('yes', 'no', 'yes'), each = 5),
                   jun.q1 = rep(c('breakfast', 'lunch', 'dinner'), each = 5),
                   jul.q1 = rep(c('oranges', 'apples', 'oranges'), each = 5),
                   may.q2 = rep(c('econ', 'math', 'science'), each = 5),
                   jun.q2 = rep(c('sunny', 'foggy', 'cloudy'), each = 5),
                   jul.q2 = rep(c('no rain', 'light mist', 'heavy rain'), each = 5))
There are only two questions in this survey: "q1" and "q2". Each of these questions was asked repeatedly over several months. However, an observation contains a valid answer only where the month observed in the data matches the month-specific version of the survey question.
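For example: "may.q1" is observed as "yes" for any observation in "may". I would like a new "Q1" variable to stand in for "may.q1", "jun.q1" and "jul.q1". "Q1" should take the value of "may.q1" when the month is "may", the value of "jun.q1" when the month is "jun", and so on.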
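For the dummy data above, the result I am after would look like the table below. It is built by hand here, only to show the target shape; the name "expected" is a placeholder, and the new columns are written in lower case to match the "q1"/"q2" suffixes of the source columns.

```r
library(data.table)

# Hand-built target for the dummy data: one row per original row,
# with month-invariant question columns.
expected <- data.table(
  month = rep(c('may', 'jun', 'jul'), each = 5),
  # may.q1 for the may rows, jun.q1 for the jun rows, jul.q1 for the jul rows
  q1 = rep(c('yes', 'lunch', 'oranges'), each = 5),
  # may.q2 for the may rows, jun.q2 for the jun rows, jul.q2 for the jul rows
  q2 = rep(c('econ', 'foggy', 'heavy rain'), each = 5)
)
```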
If I were to do this by hand with data.table, one month at a time, I would write something like:
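# keep only the May rows and the May versions of the questions
mdata <- data[month == 'may', c('month', 'may.q1', 'may.q2'), with = FALSE]
# drop the 'may.' prefix so the names are month-invariant
setnames(mdata, names(mdata), gsub('may\\.', '', names(mdata)))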
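and I would want this repeated for every month, i.e. "by = month".
With the "plyr" package and a data frame, I can solve it with the following approach: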
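require(plyr)
data <- data.frame(data)
mdata <- ddply(data, .(month), function(dfmo) {
  # month observed in this group; as.character() in case it is a factor
  mo <- as.character(dfmo$month[1])
  prefix <- paste0('^', mo, '\\.')
  # keep the month column plus this month's question columns
  dfmo <- dfmo[, c(1, grep(prefix, names(dfmo)))]
  # strip the month prefix so the names are month-invariant
  names(dfmo) <- sub(prefix, '', names(dfmo))
  return(dfmo)
})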
Any help with a data.table approach would be much appreciated. Thank you.
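For reference, here is a minimal sketch of one direction that might work with data.table. It assumes the question names are known in advance and that every month has a column named "<month>.<question>". The names "questions", "mo", "qn" and "src" are placeholders introduced for this sketch. It loops over months and questions and fills each month-invariant column by reference with ":="; whether that is fast enough on a large dataset is not something the sketch demonstrates.

```r
library(data.table)

# Same dummy data as in the example above.
data <- data.table(month = rep(c('may', 'jun', 'jul'), each = 5),
                   may.q1 = rep(c('yes', 'no', 'yes'), each = 5),
                   jun.q1 = rep(c('breakfast', 'lunch', 'dinner'), each = 5),
                   jul.q1 = rep(c('oranges', 'apples', 'oranges'), each = 5),
                   may.q2 = rep(c('econ', 'math', 'science'), each = 5),
                   jun.q2 = rep(c('sunny', 'foggy', 'cloudy'), each = 5),
                   jul.q2 = rep(c('no rain', 'light mist', 'heavy rain'), each = 5))

# Assumption: the month-invariant question names are known up front.
questions <- c('q1', 'q2')

# Work on a copy, because := modifies a data.table by reference.
mdata <- copy(data)

for (mo in unique(mdata$month)) {
  for (qn in questions) {
    # Name of the month-specific source column, e.g. 'may.q1'.
    src <- paste(mo, qn, sep = '.')
    # Fill the month-invariant column only for rows observed in month `mo`;
    # rows of other months stay NA until their own iteration.
    mdata[month == mo, (qn) := get(src)]
  }
}

# Keep only the month and the month-invariant columns.
mdata <- mdata[, c('month', questions), with = FALSE]
```

The loop runs once per month and question rather than once per row, so the number of iterations depends on the number of columns, not on the number of observations.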