Row counter by the number of unique values in pandas

Question

I have a data set from which I want to plot the number of keys per unique-id count (x = unique_id_count, y = key_count), and I'm trying to learn how to do this with pandas.

In this case:

unique_id_count 1 = key_count 2 (keys "a" and "c" each have one unique id)

unique_id_count 2 = key_count 1 (key "b" has two unique ids)

import pandas as pd
from pandas import DataFrame
key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")

df = DataFrame({'keys': key_items, 'ids': id_data})

I managed to map the data to what I want by pulling the data out of the DataFrame, restructuring it, and building a new DataFrame. In this case, it's probably best to just do it all in Python without pandas ...

from collections import defaultdict

# collect the ids seen for each key (columns are read by name, not by position)
unique_values = defaultdict(list)
for key, v in zip(df["keys"], df["ids"]):
    unique_values[key].append(v)

# count the distinct ids per key
unique_values_count = {}
for k, values in unique_values.items():
    unique_values_count[k] = [len(set(values))]

# reformat for plotting
key_col = ("a", "b", "c")
id_col = [unique_values_count[k][0] for k in key_col]

df2 = DataFrame({"keys": key_col, "unique_id_count": id_col})
df2.groupby("unique_id_count").size().plot(kind="bar")

Is there a better way to do this more directly, using the original DataFrame?

+5

python pandas plot

monkut Feb 28 '13 at 3:00

2 answers

How about using value_counts() directly:

df['ids'].value_counts().plot.bar()
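
For reference, a small sketch of what value_counts() returns when applied to the ids column of the question's data. It tallies rows per id, which is not the same quantity as the number of unique ids per key that the question describes. The name rows_per_id is only illustrative.

```python
import pandas as pd

# Data copied from the question.
key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")
df = pd.DataFrame({"keys": key_items, "ids": id_data})

# value_counts() on a column counts how many rows carry each value.
# Here that is the number of rows per id, regardless of key.
rows_per_id = df["ids"].value_counts()
print(rows_per_id)
```

On this data the result has one entry per id ("X" and "Y"), so a bar chart of it shows row totals per id rather than key counts.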

+4

Aziz alto Jul 12 '17 at 17:00

Hyry · Accepted Answer · 2013-02-28T03:25:20+0000

s = df.groupby("keys").ids.agg(lambda x: len(x.unique()))
s.value_counts().plot(kind="bar")
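
A possible variant of the same idea, sketched with the built-in nunique() aggregation instead of the lambda. The variable names unique_id_count and key_count follow the axis labels in the question and are otherwise my own choice; sort_index() is added only so the bars come out in order of the unique-id count.

```python
import pandas as pd

# Data copied from the question.
key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")
df = pd.DataFrame({"keys": key_items, "ids": id_data})

# Step 1: number of distinct ids for each key.
unique_id_count = df.groupby("keys")["ids"].nunique()

# Step 2: number of keys sharing each distinct-id count,
# ordered by that count and labelled for the x axis.
key_count = (
    unique_id_count.value_counts()
    .sort_index()
    .rename_axis("unique_id_count")
)
print(key_count)
```

Calling key_count.plot(kind="bar") on the result should then draw the chart from the question, assuming matplotlib is installed.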
