I came across this problem in a recent interview:
You have a stream of incoming numbers in a range 0 to 60000, and you have a function that will take a number from this range and return the counter of the appearance of this number up to this point. Give a suitable data structure / algorithm for implementing this system.
My decision:
Make an array of size 60001 pointing to the bit vectors. These bit vectors will contain the number of incoming numbers, and the incoming numbers will also be used to index into the array for the corresponding number. Bit vectors will dynamically increase as the counter becomes too large to hold them.
So, if the numbers go with speed 100numbers/sec, then in 1 million years the total number will be = (100*3600*24)*365*1000000 = 3.2*10^15. In the worst case, when all the numbers in the stream are the same, it will take a value ceil((log(3.2*10^15) / log 2) )= 52bits, and if the numbers are evenly distributed, we will have the (3.2*10^15) / 60001 = 5.33*10^10number of occurrences for each number, which will require everything 36 bitsfor each number. So, if we assume that 4 byte pointers, we need (60001 * 4)/1024 = 234 KBmemory for the array, and for the case with the same numbers, we need a bit vector of size = 52/8 = 7.5 bytes, which is still about 234 KB. And for another case, we need (60001 * 36 / 8)/1024 = 263.7 KBfor the bit vector, only about 500 KB. Thus, it is very possible to do with a regular PC and memory.
, , , , , , , , .. , . , .
? ( , )?