How to apply linregress in a Pandas groupby

I would like to apply scipy.stats.linregress within a Pandas GroupBy. I looked through the documentation, but all I could find was how to apply a function to a single column, like

grouped.agg(np.sum)

or a dict mapping a column to a function, like

grouped.agg({'D': lambda x: np.std(x, ddof=1)})

But how do I apply linregress, which has two inputs, X and Y?

1 answer

The function linregress, like many other scipy / numpy functions, accepts "array-like" X and Y; both Series and DataFrame qualify.

For instance:

import numpy as np
import pandas as pd
from scipy.stats import linregress
X = pd.Series(np.arange(10))
Y = pd.Series(np.arange(10))

In [4]: linregress(X, Y)
Out[4]: (1.0, 0.0, 1.0, 4.3749999999999517e-80, 0.0)
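
The tuple above is slope, intercept, r-value, p-value and standard error, in that order. As a minimal sketch (the variable names here are only illustrative), the same values can also be read by name from the returned result, which tends to be easier to follow than positional indexing:

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

X = pd.Series(np.arange(10))
Y = pd.Series(np.arange(10))

result = linregress(X, Y)

# The result behaves like a named tuple, so fields can be read by name
slope = result.slope          # fitted slope
intercept = result.intercept  # fitted intercept
r_value = result.rvalue       # correlation coefficient
```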

In fact, being able to use scipy (and numpy) functions directly is one of pandas' killer features!

So, if you have a DataFrame, you can use linregress on your columns (which are Series):

linregress(df['col_X'], df['col_Y'])

and within a groupby you can similarly use apply (on each group):

grouped.apply(lambda x: linregress(x['col_X'], x['col_Y']))
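
For a fuller picture, here is a small sketch of the groupby case. The data, the grouping column "key" and the helper fit are made up for illustration; only col_X and col_Y come from the snippets above. Returning a Series from the helper is one way to get a table with one row per group, rather than a Series of result tuples:

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Hypothetical data: two groups, each following a known straight line
df = pd.DataFrame({
    "key": ["a"] * 5 + ["b"] * 5,
    "col_X": np.tile(np.arange(5.0), 2),
})
# Group "a": y = 2x + 1, group "b": y = -x + 3
df["col_Y"] = np.where(df["key"] == "a",
                       2.0 * df["col_X"] + 1.0,
                       -1.0 * df["col_X"] + 3.0)

def fit(group):
    # Hypothetical helper: regress col_Y on col_X within one group
    res = linregress(group["col_X"], group["col_Y"])
    return pd.Series({"slope": res.slope,
                      "intercept": res.intercept,
                      "rvalue": res.rvalue})

# Select the two data columns so the helper only sees what it needs
fits = df.groupby("key")[["col_X", "col_Y"]].apply(fit)
```

Here fits is a DataFrame indexed by key, with one column per field returned by the helper.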