I am trying to index the data by their probability (estimated using a simple histogram). The goal is to select items in a series with a probability that is less than a threshold value.
I have a number of integer values, for example:
import pandas as pnd
import numpy as np
series = pnd.Series(np.random.poisson(5, size = 100))
Then I computed their histogram as follows:
tmp = {"series" : series, "count" : np.ones(len(series))}
hist = pnd.DataFrame(tmp).groupby("series").sum()
freq = hist / hist.sum()
So now I have the frequencies of each result indexed by the result, and a series of results. I have two questions:
- Is there a way to index series by mapping each result to its frequency as given by freq?
- If I manage to do this, how can I select only the results with a frequency greater than some value?
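To make the two questions concrete, here is a minimal sketch of the kind of result I am after. It assumes that `value_counts(normalize=True)` yields the same frequencies as the groupby above and that `Series.map` can use another Series as a lookup table. The `threshold` value and the names `prob`, `rare`, and `common` are made up for illustration, and a seeded generator is used only to make the run repeatable.

```python
import numpy as np
import pandas as pnd

# Seeded generator so the example is repeatable (an assumption, not in the original).
rng = np.random.default_rng(0)
series = pnd.Series(rng.poisson(5, size=100))

# Relative frequency of each distinct value, indexed by the value itself.
freq = series.value_counts(normalize=True)

# Question 1: look up the estimated probability of every element of the series.
prob = series.map(freq)

# Question 2: boolean indexing against a hypothetical threshold.
threshold = 0.1
rare = series[prob < threshold]     # items whose value is infrequent
common = series[prob >= threshold]  # items whose value is frequent
```

With the `freq` DataFrame built above instead of `value_counts`, the lookup would presumably be `series.map(freq["count"])`, since `map` needs a Series rather than a DataFrame.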
Thanks.