Pandas: index data by histogram frequency

I am trying to index the data by their probability (estimated using a simple histogram). The goal is to select items in a series with a probability that is less than a threshold value.

I have a number of integer values, for example:

import pandas as pnd
import numpy as np

# 100 random integers drawn from a Poisson distribution with mean 5
series = pnd.Series(np.random.poisson(5, size=100))

Then I calculated their histogram as follows:

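# Histogram: sum a column of ones for each distinct value
tmp  = {"series": series, "count": np.ones(len(series))}
hist = pnd.DataFrame(tmp).groupby("series").sum()
# Divide by the total count to turn counts into relative frequencies
freq = hist / hist.sum()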
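A more compact way to get the same relative frequencies is Series.value_counts(normalize=True), which counts each distinct value and divides by the total. A minimal sketch, assuming a fixed random seed (not in the original) so the comparison is reproducible, and using the usual pd alias:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
series = pd.Series(rng.poisson(5, size=100))

# Original approach: sum a column of ones per distinct value.
tmp = {"series": series, "count": np.ones(len(series))}
hist = pd.DataFrame(tmp).groupby("series").sum()
freq = hist / hist.sum()

# Shortcut: relative frequency of each distinct value,
# sorted by value so it lines up with the groupby result.
freq_vc = series.value_counts(normalize=True).sort_index()

# Both give the same numbers, indexed by the observed values.
print(np.allclose(freq["count"].to_numpy(), freq_vc.to_numpy()))
```

Note that value_counts returns a Series directly, so it can be passed to map without selecting a "count" column first.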

So now I have the frequency of each result, indexed by the result, as well as the series of results itself. I have two questions:

  • Is there a way to index series by mapping each result to its frequency, as determined by freq?
  • If I manage to do this, how can I select only results with a frequency greater than some value?

Thanks.

1 answer

Yes, use the Series map method:

In [16]: series.map(freq['count'])
Out[16]: 
0     0.12
1     0.06
2     0.20
3     0.11
4     0.02
5     0.13
6     0.14
7     0.11
8     0.12
9     0.16
10    0.20
<snip>
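When Series.map is given another Series, it treats it as a lookup table: each value of series is matched against the index of the mapper, and the corresponding value is returned; values with no match become NaN. A small deterministic sketch with made-up data (the numbers below are illustrative, not from the question):

```python
import pandas as pd

# Made-up data, purely illustrative.
series = pd.Series([4, 5, 4, 7])
# Lookup table: index = observed value, values = relative frequency.
freq = pd.Series({4: 0.5, 5: 0.25, 7: 0.25})

# Each element of series is replaced by freq[element];
# the result keeps series' own index.
per_item = series.map(freq)
print(per_item.tolist())  # [0.5, 0.25, 0.5, 0.25]

# A value missing from the lookup table maps to NaN.
print(pd.Series([4, 9]).map(freq).isna().tolist())  # [False, True]
```

Because the result is aligned with the original series, it can be used directly as a boolean mask.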

Then you can use those frequencies as a boolean mask to keep only the results above a threshold:

In [22]: series[series.map(freq['count']) > 0.16]
Out[22]: 
2     4
10    4
11    4
22    4
27    4
31    4
34    4
56    4
64    4
71    4
73    4
76    4
77    4
79    4
80    4
86    4
88    4
89    4
91    4
99    4
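The question's stated goal was to select items whose probability is below a threshold; the same mask works with < instead of >. The sketch below also shows an alternative to map that computes each item's frequency with groupby(...).transform("count"), broadcasting the group size back to every row. It uses made-up data and a hypothetical threshold of 0.3:

```python
import pandas as pd

# Made-up data: value 4 appears 3 times, 5 twice, 7 once (6 items total).
series = pd.Series([4, 5, 4, 7, 4, 5])

# Per-item relative frequency, aligned with series' own index.
# Alternative to map: count each group's size and broadcast it back to each row.
item_freq = series.groupby(series).transform("count") / len(series)

threshold = 0.3  # hypothetical cut-off
rare = series[item_freq < threshold]    # items less frequent than the threshold
common = series[item_freq > threshold]  # items more frequent than the threshold

print(rare.tolist())    # [7]
print(common.tolist())  # [4, 5, 4, 4, 5]
```

Either way the mask keeps the original index labels, so the selected items can be traced back to their positions in series.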