Python: best data structure for an incredibly large matrix

Question

I need to create about 2 million vectors with 1000 slots each (each slot just contains an integer).

What would be the best data structure to work with this amount of data? Maybe I am overestimating the amount of processing/memory required.

I need to iterate over a set of files (about 34.5 GB in total) and update the vectors every time one of the two million elements (each corresponds to a vector) occurs in a line.

I could easily write code for this, but I know that it will not be optimal enough to handle the amount of data, so I ask you experts. :)

Best, Georgina

+3

python data-structures vector matrix

Georgina Mar 22 '11 at 21:04

3 answers

, , : 0.

+1

Jeroen Dirks Mar 22 '11 at 21:13

Consider using a scipy.sparse matrix.
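A minimal sketch of what the scipy.sparse suggestion could look like, assuming most of the 2 million x 1000 cells stay zero (the question does not say so). The function name `build_sparse_counts` and the `(row, col)` event format are hypothetical, chosen only for illustration:

```python
from collections import Counter

import numpy as np
from scipy import sparse


def build_sparse_counts(events, n_rows, n_cols):
    """Count (row, col) hits and return them as a CSR sparse matrix.

    `events` is an iterable of (row, col) tuples (a hypothetical format).
    Only cells that are actually hit take up memory.
    """
    hits = Counter(events)
    if hits:
        rows, cols = zip(*hits.keys())
    else:
        rows, cols = (), ()
    data = np.array(list(hits.values()), dtype=np.int32)
    coo = sparse.coo_matrix(
        (data, (np.array(rows, dtype=np.int64), np.array(cols, dtype=np.int64))),
        shape=(n_rows, n_cols),
        dtype=np.int32,
    )
    return coo.tocsr()  # CSR gives fast row slicing afterwards


counts = build_sparse_counts([(0, 1), (0, 1), (5, 2)], n_rows=2_000_000, n_cols=1000)
```

If the data turns out to be dense, this brings no benefit over a plain numpy array.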

+1

samplebias Mar 22 '11 at 21:20

JoshAdel · Accepted Answer · 2011-03-22T21:07:45+0000

One option is a numpy array:

import numpy
a = numpy.zeros((1000000, 1000), dtype=int)
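To judge whether such an array fits in RAM, the size can be estimated before allocating anything. The following is a rough sketch; the helper name `dense_size_gib` is hypothetical, and the 2,000,000 x 1000 shape is taken from the question rather than from the line above:

```python
import numpy as np


def dense_size_gib(n_rows, n_cols, dtype):
    """Size in GiB of a dense n_rows x n_cols array of the given dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize / 2**30


# Compare a few integer widths for the shape described in the question.
for dtype in (np.int64, np.int32, np.int16):
    size = dense_size_gib(2_000_000, 1000, dtype)
    print(f"{np.dtype(dtype).name}: {size:.2f} GiB")
```

Note that `dtype=int` usually means 8-byte integers on 64-bit Linux and macOS, so choosing a narrower dtype such as `numpy.int32` or `numpy.int16` may halve or quarter the footprint if the counts are known to stay small.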

. , , , , numpy (scipy ).

Alternatively, look at hdf5 (via h5py or pytables) or netcdf4 (via netcdf4-python), which keep the data on disk.
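As a hedged illustration of the on-disk approach, here is a small sketch using h5py. The dataset name `"counts"`, the helper names, and the read-modify-write update pattern are assumptions made for this example, not something the answer specifies:

```python
import os
import tempfile

import h5py
import numpy as np


def create_counts_file(path, n_rows, n_cols):
    """Create an HDF5 file holding a zero-filled integer matrix on disk."""
    with h5py.File(path, "w") as f:
        # chunks=True lets h5py pick a chunk layout automatically.
        f.create_dataset("counts", shape=(n_rows, n_cols),
                         dtype="int32", chunks=True, fillvalue=0)


def add_hits(path, row, cols):
    """Increment the given columns of one row, reading and writing only that row."""
    with h5py.File(path, "r+") as f:
        dset = f["counts"]
        vec = dset[row, :]          # load a single row into memory
        np.add.at(vec, cols, 1)     # handles repeated column indices correctly
        dset[row, :] = vec          # write the row back to disk


# Small demonstration with a tiny matrix in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "counts.h5")
create_counts_file(demo_path, n_rows=100, n_cols=10)
add_hits(demo_path, row=3, cols=[1, 1, 4])
```

Opening the file once per update, as done here for brevity, would be slow over 34.5 GB of input; in practice the file would more likely be kept open and updates batched per row.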
