Python data structure for sparse list of x / y coordinates

Consider a list of x / y coordinates and a byte counter. x / y will have a range from 0 to 5000, which is 25 million cells.

However, the data will be quite sparsely populated, there will be no more than a few thousand records, and most coordinates will have zero records.

The structure will sometimes be viewed / added (for example, if there is something in x = 5 and y = 10, then ++), but more often it will be converted to the list x / y / count (sorting is not important)

The fastest data structure to look up is obviously a 2d array, but you are looking at 24 MB of memory or so, and iterating to list out can be expensive. For disk storage, you can implement gif-style compression, where 0 bytes followed by another byte indicates x empty cells, and everything else is the cell value, but this does not help the memory situation.

Dictionary dictionary is likely to be a good balance between search and iteration speed and memory usage.

Are there any other suitable data structures that I should consider (either Python built-in, existing libraries, or more general data structures?

+3
source share
3 answers

, (.. 2-). O (1), , . , .

# increment
p = (x, y)
counts[p] = counts.get(p, 0) + 1

# list
for (p, count) in counts.iteritems():
    x, y = p
    print x, y, count
+5

scipy

:
csc_matrix:
csr_matrix:
bsr_matrix: | lil_matrix:
dok_matrix: | coo_matrix: COOrdinate ( IJV, )
dia_matrix: DIAgonal

+4

All Articles