I am an R user and I cannot figure out the pandas equivalent of match(). I need this function to iterate over a bunch of files, grab a key piece of information from each, and merge it back into the current data structure on the "url" column. In R, I would do something like this:
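logActions <- read.csv("data/logactions.csv")
logActions$class <- NA
# full.names = TRUE so that read.csv() receives the complete path
files <- dir("data/textContentClassified/", full.names = TRUE)
for (f in files) {
  tmp <- read.csv(f)
  # position of each logActions url in tmp (NA where there is no match)
  idx <- match(logActions$url, tmp$url)
  found <- !is.na(idx)
  logActions$class[found] <- tmp$class[idx[found]]
}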
I don't think I can use merge() or join(), since each call would overwrite logActions$class. I can't use update() or combine_first() either, since neither has the indexing capabilities I need. I also tried to write a match() function based on this SO post, but I can't figure out how to make it work with DataFrame objects. Apologies if I am missing something obvious.
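For comparison, a rough positional analogue of R's match() can be sketched with `pd.Index.get_indexer`. The helper name `match` and the sample data below are hypothetical. The sketch assumes the values in `table` are unique, and it returns 0-based positions with -1 (rather than NA) where there is no match.

```python
import numpy as np
import pandas as pd

def match(x, table):
    """Hypothetical helper: position of each element of x in table, -1 if absent.

    Positions are 0-based (R's are 1-based) and `table` must hold unique values.
    """
    return pd.Index(table).get_indexer(x)

# Illustrative data, not taken from the real files.
left_url = pd.Series(['foo.com', 'foo.com', 'bar.com'])
tmp = pd.DataFrame({'url': ['bar.com', 'foo.com'], 'class': [1, 0]})

idx = match(left_url, tmp['url'])        # positions into tmp
found = idx >= 0                         # True where a match exists
cls = np.full(len(left_url), np.nan)     # start with every class missing
cls[found] = tmp['class'].to_numpy()[idx[found]]
```

Filtering on `found` before indexing plays the same role as dropping the NA positions in R.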
Here is some Python code summarizing my attempts at something like match() in pandas:
import numpy as np
import pandas as pd

left = pd.DataFrame({'url': ['foo.com', 'foo.com', 'bar.com'], 'action': [0, 1, 0]})
left["class"] = np.nan
right1 = pd.DataFrame({'url': ['foo.com'], 'class': [0]})
right2 = pd.DataFrame({'url': ['bar.com'], 'class': [1]})

# Attempt 1: join and merge
left.join(right1, on='url')
pd.merge(left, right1, on='url')

# Attempt 2: combine_first on the default integer index
left = left.combine_first(right1)
left = left.combine_first(right2)
left

# Attempt 3: combine_first after indexing every frame by url
left = left.set_index('url', drop=False)
right1 = right1.set_index('url', drop=False)
right2 = right2.set_index('url', drop=False)
left = left.combine_first(right1)
left = left.combine_first(right2)
left
Desired output:
       url  action  class
0  foo.com       0      0
1  foo.com       1      0
2  bar.com       0      1
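A minimal sketch of one approach that yields the values in the table above, assuming each url appears at most once within any single right-hand frame. It builds a url-to-class lookup from each frame and applies it with `Series.map`. The loop variable `right` and the name `lookup` are illustrative, and later frames overwrite earlier matches, as they would in the R loop.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'url': ['foo.com', 'foo.com', 'bar.com'], 'action': [0, 1, 0]})
left['class'] = np.nan
right1 = pd.DataFrame({'url': ['foo.com'], 'class': [0]})
right2 = pd.DataFrame({'url': ['bar.com'], 'class': [1]})

for right in (right1, right2):
    # url -> class lookup; assumes urls are unique within `right`.
    lookup = right.set_index('url')['class']
    # Map each left url through the lookup (NaN where absent), then fall
    # back to the class already stored for rows this frame does not cover.
    left['class'] = left['url'].map(lookup).fillna(left['class'])

print(left)
```

Because the column starts out as NaN, `class` ends up with a float dtype. Casting it with `astype(int)` is only safe once every row has been matched.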