R: aggregate with column-specific functions

I would like to aggregate a data frame by time interval, applying a different function to each column. I think I almost have aggregate down, and I divided my data into intervals with the chron package, which was quite simple.

But I'm not sure how to process the subsets. All the mapping functions (*apply, *ply) take a single function (I was hoping for something that takes a vector of functions to apply per column or per variable, but didn't find one), so I'm writing a function that takes each subset of the data frame and gives me the mean of every variable except "time", which is the index, and "Runoff", which should be the sum.

I tried this:

aggregate(d., list(Time=trunc(d.$time, "00:10:00")), function (dat) with(dat, 
list(Time=time[1], mean(Port.1), mean(Port.1.1), mean(Port.2), mean(Port.2.1), 
mean(Port.3), mean(Port.3.1), mean(Port.4), mean(Port.4.1), Runoff=sum(Port.5))))

which would be ugly enough even if it didn't also give me this error:

Error in eval(substitute(expr), data, enclos = parent.frame()) : 
  not that many frames on the stack

which tells me I'm really doing something wrong. From what I've seen of R, I think there must be an elegant way to do this, but what is it?

dput:

d. <- structure(list(time = structure(c(15030.5520833333, 15030.5555555556, 
15030.5590277778, 15030.5625, 15030.5659722222), format = structure(c("m/d/y", 
"h:m:s"), .Names = c("dates", "times")), origin = structure(c(1, 
1, 1970), .Names = c("month", "day", "year")), class = c("chron", 
"dates", "times")), Port.1 = c(0.359747, 0.418139, 0.417459, 
0.418139, 0.417459), Port.1.1 = c(1.3, 11.8, 11.9, 12, 12.1), 
    Port.2 = c(0.288837, 0.335544, 0.335544, 0.335544, 0.335544
    ), Port.2.1 = c(2.3, 13, 13.2, 13.3, 13.4), Port.3 = c(0.253942, 
    0.358257, 0.358257, 0.358257, 0.359002), Port.3.1 = c(2, 
    12.6, 12.7, 12.9, 13.1), Port.4 = c(0.352269, 0.410609, 0.410609, 
    0.410609, 0.410609), Port.4.1 = c(5.9, 17.5, 17.6, 17.7, 
    17.9), Port.5 = c(0L, 0L, 0L, 0L, 0L)), .Names = c("time", 
"Port.1", "Port.1.1", "Port.2", "Port.2.1", "Port.3", "Port.3.1", 
"Port.4", "Port.4.1", "Port.5"), row.names = c(NA, 5L), class = "data.frame")

There are a lot of things wrong with your approach. The general advice is not to jump straight to what you think the final statement should look like, but to work in incremental steps; otherwise debugging (understanding and fixing errors) becomes quite difficult.

For example, you could start with:

aggregate(d., list(Time=trunc(d.$time, "00:10:00")), identity)

to notice that something is wrong with your splitting variable: apparently aggregate does not like working with this class of data. You can fix this by converting Time to numeric:

aggregate(d., list(Time=as.numeric(trunc(d.$time, "00:10:00"))), identity)

Then you can try:

aggregate(d., list(Time=as.numeric(trunc(d.$time, "00:10:00"))), apply.fun)

where apply.fun is your function from above. This still fails, but running

aggregate(d., list(Time=as.numeric(trunc(d.$time, "00:10:00"))), print)

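shows that the FUN passed to aggregate is not called once per group with a data.frame, but once per column of each group with a plain vector, so you cannot get the result you want with aggregate.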
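A minimal base-R check of this behaviour, using a small made-up data frame rather than d. (toy and seen are illustrative names, not from the question). The FUN below records the class of whatever aggregate hands it; it ends up being called four times (two columns for each of two groups), and each time it receives a plain numeric vector, never a data.frame.

```r
# Made-up data: two groups (g = 1, 2) and two value columns
toy <- data.frame(g = c(1, 1, 2), x = c(1, 2, 3), y = c(10, 20, 30))

# Record what aggregate() passes to FUN on each call
seen <- character(0)
res <- aggregate(toy[c("x", "y")], list(g = toy$g), function(v) {
  seen <<- c(seen, class(v))  # class of the argument FUN received
  length(v)                   # number of rows of this column in this group
})

print(seen)  # one entry per column per group
print(res)
```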

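Instead, you can use ddply from the plyr package. There, the function receives each piece as a data.frame, so you can do something like this: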

apply.fun <- function(dat) with(dat, data.frame(Time=time[1],
                                                mean(Port.1),
                                                mean(Port.1.1),
                                                mean(Port.2),
                                                mean(Port.2.1),
                                                mean(Port.3),
                                                mean(Port.3.1),
                                                mean(Port.4),
                                                mean(Port.4.1),
                                                Runoff=sum(Port.5)))

d.$Time <- as.numeric(trunc(d.$time, "00:10:00"))
library(plyr)
ddply(d., "Time", apply.fun)

#            Time mean.Port.1. mean.Port.1.1. mean.Port.2. mean.Port.2.1.
# 1 15030.5520833    0.4061886           9.82    0.3262026          11.04
#   mean.Port.3. mean.Port.3.1. mean.Port.4. mean.Port.4.1. Runoff
# 1     0.337543          10.66     0.398941          15.32      0

Edit: following up on @roysc's comment, a version that does not list every column by hand:

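apply.fun <- function(dat) {
  out <- as.data.frame(lapply(dat, mean))  # mean of every column in this piece
  out$Time <- dat$time[1]                  # overwrite with the piece's first time stamp
  out$Runoff <- sum(dat$Port.5)            # Runoff is a sum, not a mean
  return(out)
}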
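The same idea can be generalised to the "vector of functions per column" the question asks for. This is only a sketch: summarise_cols is a hypothetical helper (not part of base R or plyr) that looks up a function for each column in a named list and falls back to a default (mean); the pieces are produced with base split() instead of ddply, and the data are made up.

```r
# Hypothetical helper: summarise each column with its own function,
# using `default` for any column not named in `funs`.
summarise_cols <- function(dat, funs, default = mean) {
  out <- lapply(names(dat), function(col) {
    f <- if (col %in% names(funs)) funs[[col]] else default
    f(dat[[col]])
  })
  names(out) <- names(dat)
  as.data.frame(out)  # one-row data frame per piece
}

# Made-up data: `bucket` plays the role of the truncated time
toy <- data.frame(bucket = c(1, 1, 2, 2, 2),
                  Port.1 = c(0.2, 0.4, 0.3, 0.5, 0.7),
                  Port.5 = c(0, 1, 2, 0, 3))

# First value for the index column, sum for the runoff column, mean otherwise
funs <- list(bucket = function(v) v[1], Port.5 = sum)

pieces <- lapply(split(toy, toy$bucket), summarise_cols, funs = funs)
result <- do.call(rbind, pieces)  # stack the one-row summaries
rownames(result) <- NULL
print(result)
```

With the question's d. you would split on the truncated time and pass something like funs = list(time = function(v) v[1], Port.5 = sum); that combination with chron objects has not been checked here.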

Use by instead of aggregate.

If f is your anonymous function with the list inside it replaced by data.frame, i.e. f <- function(dat) with(dat, data.frame(...whatever...)), then:

d.by <- by(d., list(Time = trunc(d.$time, "00:10:00")), f)
d.rbind <- do.call("rbind", d.by) # bind rows together

# fix up row and column names
rownames(d.rbind) <- NULL
colnames(d.rbind) <- colnames(d.)

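The colnames() step is only needed because f leaves most of its columns unnamed; if f names every column itself (Time, the means, Runoff), that line can be dropped.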
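A self-contained run of the same by() plus do.call("rbind", ...) pattern on a made-up data frame (toy, bucket, a and runoff are illustrative names, not from the question). Here f names all of its output columns, so no colnames() fix-up is needed afterwards.

```r
# Made-up data: `bucket` stands in for the truncated time
toy <- data.frame(bucket = c(1, 1, 2, 2, 2),
                  a      = c(1, 3, 2, 4, 6),
                  runoff = c(0, 1, 2, 0, 3))

# by() hands f each group as a data.frame, so with() works inside it
f <- function(dat) with(dat, data.frame(Time   = bucket[1],
                                        a      = mean(a),
                                        Runoff = sum(runoff)))

pieces <- by(toy, list(Time = toy$bucket), f)
result <- do.call("rbind", pieces)  # bind the one-row results together
rownames(result) <- NULL
print(result)
```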


How about this?

library(plyr)
ddply(d., .(time), colMeans)