Empirical cumulative function update

I have the following problem:

Given a stream of observations, find, each time a new observation arrives, the number of observations seen so far that are less than or equal to it. For example, if the streaming observations are

8, 1, 10, 3, 9, 7, 4, 5, 6, 2

then the updates are as follows:

  • Observations so far: 8. There is 1 observation less than or equal to 8.
  • Observations so far: 8, 1. There is 1 observation less than or equal to 1.
  • Observations so far: 8, 1, 10. There are 3 observations less than or equal to 10.
  • ...

The result would be the following values:

1, 1, 3, 2, 4, 3, 3, 4, 5, 2

The solution needs to be fast, as I am working with a huge dataset.

3 answers

Use a for loop, but iterate in the opposite direction. I have not benchmarked it, but I think it is faster.

xx <- c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2)
res = vector('integer',length=length(xx))
for (i in rev(seq_along(xx))) {
  res[i] <- sum(xx[i]>=xx)
  xx <- xx[-i]
}
res
# [1] 1 1 3 2 4 3 3 4 5 2
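
The benchmark in a later answer calls agstudy(vec), which is not defined anywhere in this thread. A minimal sketch, assuming it is simply the reverse loop above wrapped in a function (the name is taken from that benchmark call; the wrapper itself is an assumption):

```r
# Hypothetical wrapper: same reverse loop as above, packaged as a function.
agstudy <- function(xx) {
  res <- vector("integer", length = length(xx))
  for (i in rev(seq_along(xx))) {
    # xx currently holds only observations 1..i, so this counts
    # how many of them are <= the i-th one
    res[i] <- sum(xx[i] >= xx)
    # drop the i-th observation before moving one step back
    xx <- xx[-i]
  }
  res
}

agstudy(c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2))
# [1] 1 1 3 2 4 3 3 4 5 2
```

Shrinking xx on every pass means each comparison only touches the observations that arrived up to that point, which is where any speed-up over a full-length comparison would come from.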

Here is an approach with sapply:

vec <- c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2)

sapply(seq_along(vec), function(x) sum(vec[seq(x)] <= vec[x]))
# [1] 1 1 3 2 4 3 3 4 5 2

Alternatively, with vapply, which also checks that each result is a single integer:

vapply(seq_along(vec), function(x) sum(vec[seq(x)] <= vec[x]), integer(1))
# [1] 1 1 3 2 4 3 3 4 5 2

I couldn't leave well enough alone, so I created this kludge monster:

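carl <- function(vec) {
  newct <- vector('integer', length = length(vec))
  vlen <- length(vec)
  for (j in seq_along(vec)) {
    # positions from j onward whose value is >= vec[j]
    wins <- which(vec[j:vlen] >= vec[j]) + j - 1
    # each of those observations counts vec[j] as "less than or equal"
    newct[wins] <- newct[wins] + 1L
  }
  newct  # return the counts; without this the function returns NULL
}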

It works, but ...

Rgames> set.seed(20)
Rgames> vec<-runif(2000)

Rgames> microbenchmark(carl(vec),agstudy(vec),times=10)
Unit: milliseconds
         expr      min       lq   median       uq      max neval
    carl(vec) 86.75314 87.55323 88.16816 88.80831 89.65117    10
 agstudy(vec) 70.26213 70.83771 71.06158 71.72247 71.93800    10

Still not as good as agstudy's code. Maybe someone can tighten up my loop?
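
One direction that might help, offered as a sketch rather than a tested drop-in: every version in this thread compares each observation against all earlier ones, which is O(n^2) comparisons overall. A Fenwick (binary indexed) tree over the ranks of the values needs only O(n log n) steps. The function name ecdf_counts is made up for illustration, and the sketch assumes the whole vector is available up front (so that ranks can be computed) and contains no NA values. I have not benchmarked it against the code above.

```r
# Sketch: for each i, count how many of vec[1..i] are <= vec[i],
# using a Fenwick (binary indexed) tree indexed by the rank of each value.
ecdf_counts <- function(vec) {
  n <- length(vec)
  if (n == 0L) return(integer(0))
  # Dense ranks: equal values share a rank, so ties count as "<=".
  r <- match(vec, sort(unique(vec)))
  m <- max(r)
  tree <- integer(m)   # tree[k] holds a partial count of inserted ranks
  res <- integer(n)
  for (i in seq_len(n)) {
    # Insert rank r[i]: add 1 to every tree node that covers it.
    k <- r[i]
    while (k <= m) {
      tree[k] <- tree[k] + 1L
      k <- k + bitwAnd(k, -k)   # move to the next covering node
    }
    # Prefix sum: number of inserted ranks <= r[i].
    k <- r[i]
    s <- 0L
    while (k > 0L) {
      s <- s + tree[k]
      k <- k - bitwAnd(k, -k)   # strip the lowest set bit
    }
    res[i] <- s
  }
  res
}

ecdf_counts(c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2))
# [1] 1 1 3 2 4 3 3 4 5 2
```

Because the loops run in interpreted R, the asymptotic advantage may only show up for fairly long vectors; for a truly huge dataset the same idea could be moved into compiled code.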
