Create Excel-like SUMIFS in Pandas

Question

I recently found out about pandas and was glad to see its analytics functionality. I am trying to convert Excel array formulas to their pandas equivalents to automate the tables I created to report performance attributes. In this example, I created a new column in Excel based on conditions in other columns:

={SUMIFS($F$10:$F$4518,$A$10:$A$4518,$C$4,$B$10:$B$4518,0,$C$10:$C$4518," ",$D$10:$D$4518,$D10,$E$10:$E$4518,$E10)}

The formula sums the values in array "F" (security weights) based on certain conditions: array "A" (portfolio identifier) equals a specific number, array "B" (security identifier) is zero, array "C" (group description) is " ", array "D" (start date) equals the start date of the row I am on, and array "E" (end date) equals the end date of the row I am on.

In pandas, I use a DataFrame. Creating a new column in the DataFrame with the first three conditions is straightforward, but I am having difficulty with the last two conditions.

reportAggregateDF['PORT_WEIGHT'] = reportAggregateDF['SEC_WEIGHT_RATE'][
           (reportAggregateDF['PORT_ID'] == portID) &
           (reportAggregateDF['SEC_ID'] == 0) &
           (reportAggregateDF['GROUP_LIST'] == " ") &
           (reportAggregateDF['START_DATE'] == reportAggregateDF['START_DATE'].ix[:]) &
           (reportAggregateDF['END_DATE'] == reportAggregateDF['END_DATE'].ix[:])].sum()

Obviously the .ix[:] in the last two conditions does nothing for me, but is there a way to make the sum conditional on the row I am on without a loop? My goal is to avoid loops entirely and use purely vectorized operations instead.

python pandas

Julio Guzman 13 . '12 10:29

guyrt · Answer 1 · 2013-08-20T03:31:48+0000

You can use the apply method with a lambda:

>>> df
     A    B    C    D     E
0  mitfx  0  200  300  0.25
1     gs  1  150  320  0.35
2    duk  1    5    2  0.45
3    bmo  1  145   65  0.65

Suppose you want to sum the product of columns C and E, but only for rows where B == 1 and D is greater than 5:

df['matches'] = df.apply(lambda x: x['C'] * x['E'] if x['B'] == 1 and x['D'] > 5 else 0, axis=1)
df.matches.sum()

Alternatively, filter the rows first and then apply the lambda to the subset:

df_subset = df[(df.B == 1) & (df.D > 5)]
df_subset.apply(lambda x: x.C * x.E, axis=1).sum()

Or skip apply altogether and multiply the columns directly:

df_subset = df[(df.B == 1) & (df.D > 5)]
# Multiply the two columns element-wise, then sum the products.
print((df_subset.C * df_subset.E).sum())
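
As a quick sanity check, the row-wise apply and the filtered column arithmetic can be run side by side on the sample frame shown earlier. This is only a sketch: the names mask, total_apply and total_vectorized are made up for illustration, and the frame is rebuilt by hand from the printed output.

```python
import pandas as pd

# Sample frame rebuilt from the printed output in this answer.
df = pd.DataFrame({
    'A': ['mitfx', 'gs', 'duk', 'bmo'],
    'B': [0, 1, 1, 1],
    'C': [200, 150, 5, 145],
    'D': [300, 320, 2, 65],
    'E': [0.25, 0.35, 0.45, 0.65],
})

# Boolean mask for the two conditions (illustrative name).
mask = (df['B'] == 1) & (df['D'] > 5)

# Row-wise version: evaluate the conditions and the product per row.
total_apply = df.apply(
    lambda x: x['C'] * x['E'] if x['B'] == 1 and x['D'] > 5 else 0,
    axis=1,
).sum()

# Vectorized version: filter once, multiply whole columns, then sum.
total_vectorized = (df.loc[mask, 'C'] * df.loc[mask, 'E']).sum()

print(total_apply, total_vectorized)
```

Both totals should agree (about 146.75 for this frame, from the gs and bmo rows). On larger frames the vectorized form is usually the faster of the two, since apply with axis=1 calls the lambda once per row.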

Julio Guzman · Answer 2 · 2012-06-15T19:18:50+0000

For now, the only way I have found is a loop over the rows:

for idx in reportAggregateDF.index:
    # Sum the weights of all rows that meet the fixed conditions and share this row's dates.
    reportAggregateDF.loc[idx, 'PORT_WEIGHT'] = reportAggregateDF['SEC_WEIGHT_RATE'][
        (reportAggregateDF['PORT_ID'] == portID) &
        (reportAggregateDF['SEC_ID'] == 0) &
        (reportAggregateDF['GROUP_LIST'] == " ") &
        (reportAggregateDF['START_DATE'] == reportAggregateDF.loc[idx, 'START_DATE']) &
        (reportAggregateDF['END_DATE'] == reportAggregateDF.loc[idx, 'END_DATE'])].sum()
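
For what it is worth, the loop can probably be avoided. The last two conditions just say "same START_DATE and END_DATE as the current row", which is what a groupby on those two columns expresses. The sketch below is one possible approach, not something taken from the answers here: it zeroes out the weights of rows that fail the three fixed conditions, then sums within each date pair and broadcasts the sum back with transform. The column names come from the question; the sample values and the name mask are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are invented.
reportAggregateDF = pd.DataFrame({
    'PORT_ID':         [1, 1, 1, 1, 2, 1],
    'SEC_ID':          [0, 0, 0, 5, 0, 0],
    'GROUP_LIST':      [" ", " ", " ", " ", " ", "X"],
    'START_DATE':      pd.to_datetime(['2012-01-31', '2012-01-31', '2012-02-29',
                                       '2012-01-31', '2012-01-31', '2012-02-29']),
    'END_DATE':        pd.to_datetime(['2012-02-29', '2012-02-29', '2012-03-31',
                                       '2012-02-29', '2012-02-29', '2012-03-31']),
    'SEC_WEIGHT_RATE': [0.25, 0.5, 0.125, 1.0, 2.0, 4.0],
})
portID = 1

# The three fixed conditions do not depend on the current row.
mask = ((reportAggregateDF['PORT_ID'] == portID) &
        (reportAggregateDF['SEC_ID'] == 0) &
        (reportAggregateDF['GROUP_LIST'] == " "))

# Replace the weights of non-matching rows with 0, sum within each
# (START_DATE, END_DATE) pair, and broadcast that sum back to every row.
reportAggregateDF['PORT_WEIGHT'] = (
    reportAggregateDF['SEC_WEIGHT_RATE']
    .where(mask, 0.0)
    .groupby([reportAggregateDF['START_DATE'], reportAggregateDF['END_DATE']])
    .transform('sum')
)

print(reportAggregateDF[['START_DATE', 'END_DATE', 'PORT_WEIGHT']])
```

Like the Excel formula and the loop, this fills PORT_WEIGHT on every row, including rows that do not meet the fixed conditions themselves. One caveat: groupby drops missing keys by default, so rows with a missing START_DATE or END_DATE would likely need separate handling.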
