I like plyr's ability to split a data frame into multiple sets of data and then perform the same operations on each set. The best part is when it shows the result as a neat, compact, well-labeled table. I like to throw a bunch of calculations onto one line using each(). However, I do not understand why using summarise in the ddply call disrupts the output and makes it long and unlabeled. Take a look below to see what I mean. Can you tell me what I am doing wrong? I would prefer to keep using summarise.
Let's first set up an example data frame. Imagine you had 60 study participants: 20 of them were funny, 20 were clever and 20 were nice. Each participant then received a score.
library(plyr)
type <- rep(c("funny", "clever", "nice"), 20)
score <- rnorm(60) + 10
data <- data.frame(type, score)
Now I want a table showing the mean score, median score, minimum score and maximum score for each of the three types of people.
ddply(data, .(type), summarise, each(mean, median, min, max)(score))
The line above should produce a nice table (3 rows, one for each type, and 4 data columns). Alas, it gives one long table with a single column of numbers, none of which are labeled.
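As a quick check of what each() returns on its own, here is a minimal sketch that calls it directly on the whole score vector, outside of ddply. The seed and the name `stats` are assumptions added for illustration and are not part of the example above.

```r
library(plyr)

set.seed(1)                      # assumed seed, only so the numbers are reproducible
score <- rnorm(60) + 10

# each() builds one function out of several; calling it on a vector
# returns a single named numeric vector, one element per statistic
stats <- each(mean, median, min, max)(score)

names(stats)    # "mean" "median" "min" "max"
length(stats)   # 4
```

So each() itself does carry the labels; the question is what happens to that named vector once it is passed through summarise.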
ddply(data, .(type), function(jjkk) each(mean, median, min, max)(jjkk$score))
The line above, which wraps each() in an anonymous function, gives me what I want. Can you explain what I am misunderstanding about the ddply syntax?
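For comparison, here is a sketch of the more verbose route with summarise, where every statistic is written out as its own named column. This should give the labeled table with one row per type, but it loses the compactness of each(). The seed and the name `wide` are assumptions for illustration only.

```r
library(plyr)

set.seed(1)                      # assumed seed, only so the numbers are reproducible
type  <- rep(c("funny", "clever", "nice"), 20)
score <- rnorm(60) + 10
data  <- data.frame(type, score)

# One named argument per statistic: summarise turns each into a column,
# so every group contributes exactly one row
wide <- ddply(data, .(type), summarise,
              mean   = mean(score),
              median = median(score),
              min    = min(score),
              max    = max(score))

dim(wide)     # 3 rows, 5 columns (type plus the four statistics)
names(wide)
```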