Consider a list of x / y coordinates and a byte counter. x / y will have a range from 0 to 5000, which is 25 million cells.
However, the data will be quite sparsely populated, there will be no more than a few thousand records, and most coordinates will have zero records.
The structure will sometimes be viewed / added (for example, if there is something in x = 5 and y = 10, then ++), but more often it will be converted to the list x / y / count (sorting is not important)
The fastest data structure to look up is obviously a 2d array, but you are looking at 24 MB of memory or so, and iterating to list out can be expensive. For disk storage, you can implement gif-style compression, where 0 bytes followed by another byte indicates x empty cells, and everything else is the cell value, but this does not help the memory situation.
Dictionary dictionary is likely to be a good balance between search and iteration speed and memory usage.
Are there any other suitable data structures that I should consider (either Python built-in, existing libraries, or more general data structures?
source
share