Attaching one data frame to another by membership in a range

I have two data frames that look something like this:

df1 <- data.frame(time=seq(0.0, by = 0.003, length.out = 1000))

   time
1 0.000
2 0.003
3 0.006
4 0.009
5 0.012
6 0.015
...

df2 <- data.frame(onset=c(0.0, 0.8, 1.9, 2.4), offset=c(0.799, 1.899, 2.399, 3.0))

  onset offset   A   B
1   0.0  0.799 ... ...
2   0.8  1.899 ... ...
3   1.9  2.399 ... ...
4   2.4  3.000 ... ...

In reality, each data frame has more columns. The first data frame contains many times, and they are not regularly spaced; the second data frame has only a few rows. I want to combine the two so that each row of the first data frame gains the additional columns of the range in the second data frame that its time falls into, and I want to do this efficiently because hundreds of thousands of rows are involved.

3 answers

You can use findInterval to match each time to the corresponding onset, and then merge your two data frames:

df1$onset <- df2$onset[findInterval(df1$time, df2$onset)]
df3 <- merge(df1, df2, by = "onset")

head(df3)
#   onset  time offset
# 1     0 0.000  0.799
# 2     0 0.003  0.799
# 3     0 0.006  0.799
# 4     0 0.009  0.799
# 5     0 0.012  0.799
# 6     0 0.015  0.799

tail(df3)
#      onset  time offset
# 995    2.4 2.982      3
# 996    2.4 2.985      3
# 997    2.4 2.988      3
# 998    2.4 2.991      3
# 999    2.4 2.994      3
# 1000   2.4 2.997      3
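
A caveat that may matter with irregular times: findInterval only compares against the onsets, so a time that falls in a gap between one offset and the next onset, or past the last offset, is still assigned to the preceding range, and a time below the first onset gets index 0. The following is a minimal sketch of one way to guard against that, not part of the answer above; the example times are made up for illustration, and it indexes df2 directly instead of calling merge.

```r
# Same ranges as in the question; the times are hypothetical test values.
df2 <- data.frame(onset  = c(0.0, 0.8, 1.9, 2.4),
                  offset = c(0.799, 1.899, 2.399, 3.0))
df1 <- data.frame(time = c(-0.5, 0.003, 0.7995, 1.2, 3.5))

# findInterval() needs the onsets sorted in increasing order. It returns 0
# for times below the first onset; turning 0 into NA keeps the index vector
# the same length as df1.
idx <- findInterval(df1$time, df2$onset)
idx[idx == 0] <- NA

# A time larger than the matched offset lies in a gap or beyond the last
# range, so it is marked as unmatched as well.
idx[!is.na(idx) & df1$time > df2$offset[idx]] <- NA

# Attach the matching df2 row to each time; unmatched times get NA columns.
df3 <- cbind(df1, df2[idx, , drop = FALSE])
rownames(df3) <- NULL
df3
```

Because this only indexes into df2, the rows stay in the original order of df1, which merge does not guarantee.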

Another option is to use cut together with plyr::join:

library(plyr)  # provides join()

# breaks for 'cut': the first onset followed by every offset
times <- c(df2$onset[1], df2$offset)

# modified df1 to shorten the list
df1 <- data.frame(time = seq(0.0, by = 0.03, length.out = 100))

# Add a few columns to df2
df2 <- data.frame(onset = c(0.0, 0.8, 1.9, 2.4), offset = c(0.799, 1.899, 2.399, 3.0), A = c(1, 2, 3, 4), B = c(5, 6, 7, 8))

# label each onset and each time with the range it falls into;
# both calls use the same breaks, so the factor levels agree
df2$ranges <- cut(df2$onset, times, include.lowest = TRUE)
df1$ranges <- cut(df1$time, times, include.lowest = TRUE)

join(df1, df2, by = 'ranges')

head(join(df1,df2,by='ranges')[-2])
  time onset offset A B
1 0.00     0  0.799 1 5
2 0.03     0  0.799 1 5
3 0.06     0  0.799 1 5
4 0.09     0  0.799 1 5
5 0.12     0  0.799 1 5
6 0.15     0  0.799 1 5

A third option is to use sqldf to perform a conditional join:

library(sqldf)

head(sqldf("select *
            from df1 inner join df2
                     on (df1.time between df2.onset and df2.offset)"))

Head output:

   time onset offset
1 0.000     0  0.799
2 0.003     0  0.799
3 0.006     0  0.799
4 0.009     0  0.799
5 0.012     0  0.799
6 0.015     0  0.799

The inner join drops every time that does not fall within one of the ranges in df2. If you want to keep those times, with missing values (NA) in the columns that come from df2, use left join instead of inner join in the SQL passed to sqldf above.
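
The left join variant might look like the following sketch. It assumes the sqldf package is installed; the example times are made up so that two of them fall outside every range.

```r
library(sqldf)

# Hypothetical times: 0.7995 lies in the gap between the first offset and
# the second onset, and 3.5 lies past the last offset.
df1 <- data.frame(time = c(0.003, 0.7995, 1.2, 3.5))
df2 <- data.frame(onset  = c(0.0, 0.8, 1.9, 2.4),
                  offset = c(0.799, 1.899, 2.399, 3.0))

# Same query as above, with "left join" in place of "inner join":
# every row of df1 is kept, and unmatched rows get NA for onset and offset.
res <- sqldf("select *
              from df1 left join df2
                on (df1.time between df2.onset and df2.offset)")
res
```

Here all four times appear in the result, whereas the inner join would return only the rows for 0.003 and 1.2.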
