How to replace and add DataFrame elements using another DataFrame in Python pandas?

Suppose I have two DataFrames, 'df_a' and 'df_b'. Both have the same index structure and columns, but some of their data elements differ:

>>> df_a
           sales cogs
STK_ID QT           
000876 1   100  100
       2   100  100
       3   100  100
       4   100  100
       5   100  100
       6   100  100
       7   100  100

>>> df_b
           sales cogs
STK_ID QT           
000876 5    50   50
       6    50   50
       7    50   50
       8    50   50
       9    50   50
       10   50   50

Now I want to replace each element of df_a with the element of df_b that has the same (index, column) coordinate, and also add the elements of df_b whose (index, column) coordinates fall outside df_a. In other words, apply 'df_b' as a patch to 'df_a':

>>> df_c = patch(df_a,df_b)
           sales cogs
STK_ID QT           
000876 1   100  100
       2   100  100
       3   100  100
       4   100  100
       5    50   50
       6    50   50
       7    50   50
       8    50   50
       9    50   50
       10   50   50

How do I write the function 'patch(df_a, df_b)'?

4 answers

Similar to BrenBarn's answer, but with more flexibility:

import numpy as np

# reindex both to the union of their indices
idx = df_a.index.union(df_b.index)
df_ar = df_a.reindex(idx)
df_br = df_b.reindex(idx)

# replacement criteria can be put in this lambda function;
# here df_b wins where df_a is missing or where df_b is smaller
combiner = lambda x, y: np.where(x.isna() | (y < x), y, x)
df_c = df_ar.combine(df_br, combiner)

Try the following:

df_c = df_a.reindex(df_a.index.union(df_b.index))
# .ix was removed in pandas 1.0; .loc is the label-based equivalent
df_c.loc[df_b.index] = df_b

Alternatively, use df.combine_first():

In [34]: df_b.combine_first(df_a)
Out[34]: 
           sales  cogs
STK_ID QT             
000876 1     100   100
       2     100   100
       3     100   100
       4     100   100
       5      50    50
       6      50    50
       7      50    50
       8      50    50
       9      50    50
       10     50    50
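
The call above can be wrapped into the `patch` function the question asks for. This is only a minimal sketch: `make_frame` is a hypothetical helper that rebuilds the question's sample data, and the approach assumes that a NaN in df_b should not overwrite a value in df_a, because combine_first treats NaN as "missing".

```python
import pandas as pd

def make_frame(quarters, value):
    # Hypothetical helper: rebuilds frames shaped like the question's data,
    # with a (STK_ID, QT) MultiIndex and constant sales/cogs values.
    index = pd.MultiIndex.from_product(
        [["000876"], quarters], names=["STK_ID", "QT"]
    )
    return pd.DataFrame({"sales": value, "cogs": value}, index=index)

def patch(base, override):
    # Values from `override` win; `base` fills whatever `override` lacks.
    return override.combine_first(base)

df_a = make_frame(range(1, 8), 100)
df_b = make_frame(range(5, 11), 50)
df_c = patch(df_a, df_b)
print(df_c)
```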

I struggled with the same problem; the code in the previous answers did not work on my DataFrames. They have two index columns, and the reindex operation produced NaN values in strange places (I can post the contents of the DataFrames if someone wants to debug it).

I found an alternative solution. I am reviving this thread in the hope that it may be useful to others:

import pandas as pd

# concatenate df_a and df_b
df_c = pd.concat([df_a, df_b])

# clears the indexes (turns the index columns into regular dataframe columns)
df_c.reset_index(inplace=True)

# removes duplicates keeping the last occurrence (hence updating df_a with values from df_b);
# subset lists the former index columns, here STK_ID and QT from the question's example
df_c.drop_duplicates(subset=['STK_ID', 'QT'], keep='last', inplace=True)

Not a very elegant solution, but it seems to work.
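
A variant of the same concatenate-and-deduplicate idea can leave the index in place by filtering on `index.duplicated`. The sketch below assumes whole rows should be replaced (not individual cells) and that each frame's own index has no repeated labels; `patch_by_concat` and `make_frame` are hypothetical names used only for illustration.

```python
import pandas as pd

def make_frame(quarters, value):
    # Hypothetical helper: rebuilds frames shaped like the question's data.
    index = pd.MultiIndex.from_product(
        [["000876"], quarters], names=["STK_ID", "QT"]
    )
    return pd.DataFrame({"sales": value, "cogs": value}, index=index)

def patch_by_concat(base, override):
    # Stack both frames; rows from `override` come last.
    stacked = pd.concat([base, override])
    # For each repeated index label, keep only the last row (the one from `override`).
    deduped = stacked[~stacked.index.duplicated(keep="last")]
    return deduped.sort_index()

df_a = make_frame(range(1, 8), 100)
df_b = make_frame(range(5, 11), 50)
df_c = patch_by_concat(df_a, df_b)
print(df_c)
```

Unlike combine_first, this keeps a NaN from df_b if one appears in an overlapping row, since the entire row is taken from df_b.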

Hopefully df.update will get a join='outer' option soon...
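
Until then, the effect of an outer-join update can be approximated by reindexing to the union of both indices before calling update. This is a rough sketch, not a built-in pandas feature; `patch_with_update` and `make_frame` are hypothetical names, and the result columns become float because reindexing introduces NaN placeholders.

```python
import pandas as pd

def make_frame(quarters, value):
    # Hypothetical helper: rebuilds frames shaped like the question's data.
    index = pd.MultiIndex.from_product(
        [["000876"], quarters], names=["STK_ID", "QT"]
    )
    return pd.DataFrame({"sales": value, "cogs": value}, index=index)

def patch_with_update(base, override):
    # DataFrame.update only aligns on the caller's index (a left join),
    # so first extend `base` to the union of both indices.
    result = base.reindex(base.index.union(override.index))
    # update() modifies `result` in place and skips NaN values in `override`.
    result.update(override)
    return result

df_a = make_frame(range(1, 8), 100)
df_b = make_frame(range(5, 11), 50)
df_c = patch_with_update(df_a, df_b)
print(df_c)
```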
