Python: best data structure for an incredibly large matrix

I need to create about 2 million vectors with 1000 slots each (each slot just contains an integer).

What would be the best data structure for working with this amount of data? Maybe I'm overestimating the amount of processing/memory required.

I need to iterate over a set of files (about 34.5 GB in total) and update the vectors every time one of the two million elements (each corresponds to a vector) occurs in a line.

I could easily write code for this, but I know it would not be efficient enough to handle this amount of data, so I'm asking you experts. :)

Best, Georgina

+3
3 answers

One option is a dense NumPy array:

import numpy
a = numpy.zeros((1000000, 1000), dtype=int)
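To get a feel for whether an allocation like the one above fits in RAM, here is a rough sizing sketch. It only multiplies shapes by item sizes and allocates nothing. The helper name `matrix_bytes` is made up for illustration, and it assumes `dtype=int` means 64-bit integers, which is typical on 64-bit Linux and macOS but depends on the platform and NumPy version.

```python
import numpy as np


def matrix_bytes(n_rows, n_cols, dtype):
    """Bytes needed for a dense n_rows x n_cols array of the given dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize


GB = 10**9  # decimal gigabytes, to keep the arithmetic easy to follow

# The shape from the snippet above, assuming 64-bit integers.
one_million_int64 = matrix_bytes(1_000_000, 1000, np.int64)

# The question mentions about 2 million vectors, so double the rows.
two_million_int64 = matrix_bytes(2_000_000, 1000, np.int64)

# Narrower integer types shrink the footprint if the values stay small.
two_million_int32 = matrix_bytes(2_000_000, 1000, np.int32)
two_million_int16 = matrix_bytes(2_000_000, 1000, np.int16)

for label, size in [
    ("1M x 1000, int64", one_million_int64),
    ("2M x 1000, int64", two_million_int64),
    ("2M x 1000, int32", two_million_int32),
    ("2M x 1000, int16", two_million_int16),
]:
    print(f"{label}: {size / GB:.0f} GB")
```

If the counts are known to stay below about 2 billion, `int32` halves the memory compared with `int64`; whether `int16` is safe depends on the data.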

. , , , , numpy (scipy ).

Otherwise, consider HDF5 (via h5py or pytables) or netCDF4 (via netcdf4-python).
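As a dependency-light illustration of the same on-disk idea, here is a minimal sketch that swaps in `numpy.memmap` for the HDF5/netCDF4 libraries named above; it is not their API. The element names, the toy sizes, and the `slot_for` rule are all assumptions, since the question does not say which slot gets updated when an element occurs in a line.

```python
import os
import tempfile

import numpy as np

# Toy sizes so the sketch runs anywhere; the question's scale would be
# roughly 2_000_000 rows by 1000 slots.
N_ROWS, N_SLOTS = 5, 4

# Hypothetical element names; each one owns a row of the matrix.
row_of = {"apple": 0, "pear": 1, "plum": 2, "fig": 3, "kiwi": 4}


def slot_for(line_number):
    # Placeholder rule (an assumption): the line number picks the slot.
    return line_number % N_SLOTS


def update_counts(counts, lines):
    """Add 1 to counts[row, slot] for every known element seen in a line."""
    for line_number, line in enumerate(lines):
        slot = slot_for(line_number)
        for token in line.split():
            row = row_of.get(token)
            if row is not None:
                counts[row, slot] += 1


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "counts.dat")

# mode="w+" creates a zero-filled file of the requested shape on disk.
counts = np.memmap(path, dtype=np.int32, mode="w+", shape=(N_ROWS, N_SLOTS))
update_counts(counts, ["apple pear apple", "fig", "unknown pear"])
counts.flush()  # push pending changes to the file
del counts

# Reopen read-only to confirm the updates were persisted.
saved = np.memmap(path, dtype=np.int32, mode="r", shape=(N_ROWS, N_SLOTS))
apple_row = saved[row_of["apple"]].tolist()
pear_row = saved[row_of["pear"]].tolist()
total = int(saved.sum())
del saved
```

The dictionary lookup keeps the per-token cost constant, and the operating system pages the array in and out as needed, so the matrix does not have to fit in RAM all at once. Access patterns that jump between random rows will still be slow on disk.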

+5

, , : 0.

+1
+1
