Grouping rows of a data frame into ranges

Question

I have a data frame like this:

product_id view_count purchase_count
1           11         1   
2           20         3
3           5          2
...

I would like to convert this into a table that groups rows by ranges of view_count and sums purchase_count within each range, for example:

view_count_range total_purchase_count
0-10                 45
10-20                65

The view_count ranges will all have the same fixed width. I would appreciate any suggestions on how to group rows into such ranges.

+3

r dataframe data.table

Jeff May 15, '12 at 1:35

2 answers

Expanding on Tyler's answer and starting with his example dat, you may find it easier and faster to write such queries in data.table:

> require(data.table)
> DT = as.data.table(dat)

> DT[, sum(purchase_count), by=cut(view_count,c(0,10,20))]
         cut V1
[1,] (10,20] 31
[2,]  (0,10] 39

That's it. Just one line. Easy to write, easy to read.

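Note that (10,20] comes first because by returns groups in order of first appearance in the data (the first view_count is 11). To get the groups sorted, change by to keyby: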

> DT[, sum(purchase_count), keyby=cut(view_count,c(0,10,20))]
         cut V1
[1,]  (0,10] 39
[2,] (10,20] 31

And to name the result columns:

> DT[,list( purchase_count = sum(purchase_count) ),
     keyby=list( view_count_range = cut(view_count,c(0,10,20) ))]
     view_count_range purchase_count
[1,]           (0,10]             39
[2,]          (10,20]             31
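
Since the question asks for ranges of a fixed width, one possible variation is to compute the lower edge of each range arithmetically instead of listing the breaks for cut. The following is only a sketch under stated assumptions: the toy data, the width of 10, and the column names range_start and total_purchase_count are made up for illustration and are not part of the example above.

```r
library(data.table)

# Hypothetical toy data, not the dat used above
DT <- data.table(
  view_count     = c(11, 20, 5, 0, 37, 10),
  purchase_count = c(1, 3, 2, 4, 5, 6)
)

width <- 10  # assumed fixed range width

# %/% is integer division, so each row maps to the lower edge of its range:
# 0-9 -> 0, 10-19 -> 10, 20-29 -> 20, ...
res <- DT[, .(total_purchase_count = sum(purchase_count)),
          keyby = .(range_start = (view_count %/% width) * width)]
res
```

Unlike cut(view_count, c(0,10,20)), which builds right-closed intervals such as (0,10], this puts a view_count of 10 into the range starting at 10 and keeps a view_count of 0 rather than turning it into NA. Which convention is right depends on how you want the boundaries treated.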

+2

Matt Dowle 15 '12 8:49

Tyler Rinker · Accepted Answer · 2012-05-15T01:51:41+0000

cut is a convenient tool for this kind of thing. Here is one way:

#First make some data to work with 
#I suggest you do this in the future as it makes it 
#easier to provide you with assistance.
set.seed(10)
dat <- data.frame(product_id=1:15, view_count=sample(1:20, 15, replace=TRUE), 
    purchase_count=sample(1:8, 15, replace=TRUE))
dat   #look at the data

#now we can use cut and aggregate by this new variable we just created
dat$view_count_range <- with(dat, cut(view_count, c(0, 10, 20)))
aggregate(purchase_count~view_count_range, dat, sum)

Which gives:

  view_count_range purchase_count
1           (0,10]             39
2          (10,20]             31
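
If the ranges should cover the whole data at a fixed width and be labelled like the 0-10, 10-20 rows in the question, the breaks and labels can be generated rather than typed out. This is a minimal sketch, not the method above: the toy data, the width of 10, and the helper variables width, upper, breaks and labels are assumptions made for illustration.

```r
# Hypothetical toy data (not the dat generated above)
dat2 <- data.frame(
  product_id     = 1:6,
  view_count     = c(11, 20, 5, 0, 37, 10),
  purchase_count = c(1, 3, 2, 4, 5, 6)
)

width  <- 10                                            # assumed fixed range width
upper  <- ceiling((max(dat2$view_count) + 1) / width) * width
breaks <- seq(0, upper, by = width)                     # 0, 10, 20, 30, 40
labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# right = FALSE makes the intervals left-closed: [0,10), [10,20), ...
# so a view_count of 0 falls in "0-10" instead of becoming NA
dat2$view_count_range <- cut(dat2$view_count, breaks = breaks,
                             labels = labels, right = FALSE)

res <- aggregate(purchase_count ~ view_count_range, data = dat2, FUN = sum)
res
```

Note that with right = FALSE a view_count of exactly 10 is counted in 10-20, whereas the default used above would place it in (0,10]. aggregate also drops ranges that contain no rows.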
