I recently discovered pandas and was glad to see its analytics functionality. I am trying to convert Excel array formulas to their pandas equivalents to automate the tables I created to report performance attributes. In this example, I created a new column in Excel based on conditions in other columns:
={SUMIFS($F$10:$F$4518,$A$10:$A$4518,$C$4,$B$10:$B$4518,0,$C$10:$C$4518," ",$D$10:$D$4518,$D10,$E$10:$E$4518,$E10)}
The formula sums the values in array "F" (security weights) subject to several conditions: array "A" (portfolio identifier) equals a specific number, array "B" (security identifier) equals zero, array "C" (group description) equals " ", array "D" (start date) equals the start date of the row I am on, and array "E" (end date) equals the end date of the row I am on.
In pandas, I am using a DataFrame. Creating a new column with the first three conditions is straightforward, but I am having difficulty with the last two.
reportAggregateDF['PORT_WEIGHT'] = reportAggregateDF['SEC_WEIGHT_RATE'][
    (reportAggregateDF['PORT_ID'] == portID) &
    (reportAggregateDF['SEC_ID'] == 0) &
    (reportAggregateDF['GROUP_LIST'] == " ") &
    (reportAggregateDF['START_DATE'] == reportAggregateDF['START_DATE'].ix[:]) &
    (reportAggregateDF['END_DATE'] == reportAggregateDF['END_DATE'].ix[:])
].sum()
Obviously the .ix[:] in the last two conditions does nothing for me, but is there a way to make the sum conditional on the row that I am on without looping? My goal is to avoid loops entirely and use purely vectorized operations instead.
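To make the intended result concrete, here is a minimal sketch of one direction that might work without a loop. It treats the last two conditions as a grouping key: rows that fail the three fixed conditions have their weight replaced by zero, and the remaining weights are summed per (START_DATE, END_DATE) pair and broadcast back to every row with groupby and transform. The sample values and the intermediate names (fixed_mask, contrib) are made up for illustration; only the column names and portID come from the code above.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are invented.
reportAggregateDF = pd.DataFrame({
    'PORT_ID': [1, 1, 1, 1, 2],
    'SEC_ID': [0, 0, 0, 5, 0],
    'GROUP_LIST': [' ', ' ', ' ', ' ', ' '],
    'START_DATE': pd.to_datetime(['2012-01-01', '2012-01-01', '2012-02-01',
                                  '2012-01-01', '2012-01-01']),
    'END_DATE': pd.to_datetime(['2012-01-31', '2012-01-31', '2012-02-29',
                                '2012-01-31', '2012-01-31']),
    'SEC_WEIGHT_RATE': [0.25, 0.5, 1.0, 2.0, 4.0],
})
portID = 1  # assumed value for the portfolio identifier

# The three fixed conditions, identical for every row.
fixed_mask = ((reportAggregateDF['PORT_ID'] == portID) &
              (reportAggregateDF['SEC_ID'] == 0) &
              (reportAggregateDF['GROUP_LIST'] == ' '))

# Rows that fail the fixed conditions contribute nothing to any sum.
contrib = reportAggregateDF['SEC_WEIGHT_RATE'].where(fixed_mask, 0.0)

# The two row-dependent conditions become the grouping key; transform('sum')
# returns one value per original row, aligned on the original index.
reportAggregateDF['PORT_WEIGHT'] = contrib.groupby(
    [reportAggregateDF['START_DATE'], reportAggregateDF['END_DATE']]
).transform('sum')

print(reportAggregateDF[['START_DATE', 'END_DATE', 'PORT_WEIGHT']])
```

As with the Excel formula, every row receives the sum for its own date pair, including rows that do not themselves satisfy the first three conditions. One thing to watch: by default groupby drops rows whose key is missing, so rows with a missing START_DATE or END_DATE would end up with NaN here.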