I need to create about 2 million vectors w / 1000 slots in each (each slot just contains an integer).
What would be the best data structure to work with this amount of data? Maybe I overestimate the amount of processed / memory.
I need to iterate over a set of files (about 34.5 GB in total) and update the vectors every time one of the two million elements (each corresponds to a vector) occurs in a line.
I could easily write code for this, but I know that it will not be optimal enough to handle the amount of data, so I ask you experts. :)
Best, Georgina
. :
a = numpy.zeros((1000000,1000),dtype=int)
. , , , , numpy (scipy ).
numpy
scipy
, hdf5 h5py pytables netcdf4 netcdf4-python , .
hdf5
h5py
pytables
netcdf4
netcdf4-python
, , : 0.
, scipy.sparse matrix. .