The fastest way to create and populate a huge two-dimensional array?

Question

I need to create and fill a huge array (e.g. 96 GB, 72000 rows * 72000 columns) with a float in each cell, where each value comes from a mathematical formula. The array will be used in further calculations afterwards.

import itertools, operator, time, copy, os, sys
import numpy
from multiprocessing import Pool


def f2(x):  # stands in for more complex formulas that depend on the values in *i* and *x*
    temp = []
    for i in combine:
        temp.append(0.2*x[1]*i[1]/64.23)
    return temp

def combinations_with_replacement_counts(n, r):  # yield all ways to put r balls in n boxes
    size = n + r - 1
    for indices in itertools.combinations(range(size), n-1):
        starts = [0] + [index+1 for index in indices]
        stops = indices + (size,)
        yield tuple(map(operator.sub, stops, starts))

# module-level, so the worker processes can see it inside f2
combine = list(combinations_with_replacement_counts(3, 60))  # 60 here, but 350 is needed
print(len(combine))
if __name__ == '__main__':
    t1 = time.time()
    pool = Pool()              # start worker processes
    results = [pool.apply_async(f2, (x,)) for x in combine]
    roots = [r.get() for r in results]
    print(roots[0:3])
    pool.close()
    pool.join()
    print(time.time() - t1)

What is the fastest way to create and populate such a huge numpy array? Should I fill lists, then aggregate them and convert the result to a numpy array?
Since the cells, columns, and rows of the 2D array are independent, is it possible to parallelize the calculations to speed up filling the array? Any pointers for optimizing such calculations with multiprocessing?

python numpy matrix multidimensional-array multiprocessing

sol Apr 22 '13 at 16:22

2 answers

sega_sai · Answer 1 · 2013-04-22T18:44:30+0000

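One option is to create a shared numpy array and fill it in parallel from several processes (so the data is shared rather than copied). The code below is based on another answer on stackoverflow: fooobar.com/questions/82981/...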

import ctypes
import multiprocessing as mp
import numpy as np

def shared_zeros(n1, n2):
    # create a 2D numpy array backed by shared memory, so that different processes can modify it
    shared_array_base = mp.Array(ctypes.c_double, n1 * n2)
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(n1, n2)
    return shared_array

class singleton:
    arr = None

def dosomething(i):
    # do something with singleton.arr
    singleton.arr[i, :] = i
    return i

def main():
    # create the array before the pool so that forked workers inherit it
    singleton.arr = shared_zeros(1000, 1000)
    pool = mp.Pool(16)
    pool.map(dosomething, range(1000))

if __name__ == '__main__':
    main()
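
The code above relies on the worker processes inheriting singleton.arr from the parent, which is what happens when processes are created by fork (the default on Linux). As a minimal sketch, assuming the processes may instead be spawned (as on Windows), the shared buffer can be handed to each worker explicitly through the Pool initializer. The names init_worker, fill_row, populate and _shared are illustrative and not part of the original answer.

```python
import ctypes
import multiprocessing as mp

import numpy as np

# per-process storage for the numpy view of the shared buffer (illustrative name)
_shared = {}


def init_worker(base, shape):
    # runs once in every worker: wrap the shared buffer in a numpy view, without copying
    _shared["arr"] = np.frombuffer(base.get_obj(), dtype=np.float64).reshape(shape)


def fill_row(i):
    # placeholder computation: the real formula would go here
    _shared["arr"][i, :] = i
    return i


def populate(n1, n2, processes=4):
    # shared, zero-initialised buffer of n1 * n2 doubles
    base = mp.Array(ctypes.c_double, n1 * n2)
    pool = mp.Pool(processes, initializer=init_worker, initargs=(base, (n1, n2)))
    try:
        pool.map(fill_row, range(n1))
    finally:
        pool.close()
        pool.join()
    # the parent sees the values written by the workers through the same buffer
    return np.frombuffer(base.get_obj(), dtype=np.float64).reshape(n1, n2)


if __name__ == "__main__":
    result = populate(8, 5)
    print(result[3])
```

Each worker writes only to its own row, so the writes do not overlap and the lock that mp.Array provides is not taken here.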

shx2 · Answer 2 · 2013-04-22T17:32:14+0000

You can create an empty numpy.memmap array of the desired shape, and then use multiprocessing.Pool to populate its values.
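
A minimal sketch of how numpy.memmap and multiprocessing.Pool could be combined, assuming the array is filled in blocks of rows and that every worker reopens the same file. The function names fill_block and populate, the block size, the file name, and the formula are placeholders, not taken from the answer.

```python
import os
import tempfile
from multiprocessing import Pool

import numpy as np


def fill_block(args):
    # worker: reopen the existing file and fill rows [start, stop)
    path, shape, start, stop = args
    arr = np.memmap(path, dtype=np.float64, mode="r+", shape=shape)
    rows = np.arange(start, stop, dtype=np.float64)[:, None]
    cols = np.arange(shape[1], dtype=np.float64)[None, :]
    # placeholder formula standing in for the real per-cell computation
    arr[start:stop, :] = 0.2 * rows * cols / 64.23
    arr.flush()  # write this block back to the file
    return start, stop


def populate(path, shape, block=100, processes=4):
    # mode="w+" creates (or overwrites) the file with the requested shape
    arr = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)
    del arr  # the workers open the file themselves
    tasks = [(path, shape, s, min(s + block, shape[0]))
             for s in range(0, shape[0], block)]
    pool = Pool(processes)
    try:
        pool.map(fill_block, tasks)
    finally:
        pool.close()
        pool.join()
    # reopen read-only to use the filled array
    return np.memmap(path, dtype=np.float64, mode="r", shape=shape)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "big_array.dat")
    result = populate(path, (1000, 1000))
    print(result[2, 3])
```

Only small tuples are passed to and from the workers, and each worker touches one block of rows at a time, so the full array never has to be pickled or held in a single process.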
