I have a data set from which I want to plot the number of keys for each unique-identifier count (x = unique_id_count, y = key_count), and I'm trying to learn how to do this with pandas.
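For the example data below, the expected result is:
unique_id_count 1 = key_count 2 (keys "a" and "c" each have one unique id)
unique_id_count 2 = key_count 1 (key "b" has two unique ids)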
from pandas import DataFrame
key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")
df = DataFrame({'keys': key_items, 'ids': id_data})
I managed to get the data into the shape I want by pulling it out of the DataFrame, restructuring it, and building a new DataFrame. Done this way, I might as well do it all in plain Python without pandas ...
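from collections import defaultdict
# Collect the ids seen for each key; columns are read by name,
# so the result does not depend on the DataFrame's column order
unique_values = defaultdict(list)
for key, v in zip(df["keys"], df["ids"]):
    unique_values[key].append(v)
# Count the distinct ids per key
unique_values_count = {}
for k, values in unique_values.items():
    unique_values_count[k] = [len(set(values))]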
key_col = ("a", "b", "c")
id_col = [unique_values_count[k][0] for k in key_col]
df2 = DataFrame({"keys":key_col, "unique_id_count": id_col})
df2.groupby("unique_id_count").size().plot(kind="bar")
Is there a better way to do this more directly using the original frame?
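One possible more direct route is sketched below. It assumes that `groupby("keys")["ids"].nunique()` gives the number of distinct ids per key and that `value_counts()` then counts how many keys share each of those numbers. The names `unique_id_count` and `key_count` are only illustrative.

```python
import pandas as pd

key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")
df = pd.DataFrame({"keys": key_items, "ids": id_data})

# Number of distinct ids for each key
unique_id_count = df.groupby("keys")["ids"].nunique()

# Number of keys that share each distinct-id count, ordered by that count
key_count = unique_id_count.value_counts().sort_index()
key_count.index.name = "unique_id_count"
```

Calling `key_count.plot(kind="bar")` on the result should then draw the same kind of bar chart as the manual version, assuming matplotlib is installed.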