I need to create and fill a huge array (e.g. 96 GB, 72000 rows * 72000 columns) in which every cell holds a float computed from a mathematical formula. The array will be used in calculations afterwards.
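As a rough sanity check on the size, here is a minimal sketch that estimates the footprint of a dense array of that shape. It assumes one of NumPy's standard float dtypes; the helper name `dense_array_bytes` is made up for illustration, and the real figure also depends on any temporary copies made while filling.

```python
# Rough memory estimate for a dense 2-D array.
# Assumption: elements are stored as float32 (4 bytes) or float64 (8 bytes).
ROWS = 72000
COLS = 72000
ITEM_SIZES = {"float32": 4, "float64": 8}

def dense_array_bytes(rows, cols, itemsize):
    """Number of bytes needed to hold rows x cols elements of the given size."""
    return rows * cols * itemsize

for name in sorted(ITEM_SIZES):
    nbytes = dense_array_bytes(ROWS, COLS, ITEM_SIZES[name])
    print("%s: %.1f GB" % (name, nbytes / 1e9))
# float32: 20.7 GB
# float64: 41.5 GB
```

Building the same data first as nested Python lists would need considerably more memory than the final array, since every float in a list is a separate Python object.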
import itertools, operator, time, copy, os, sys
import numpy
from multiprocessing import Pool
def f2(x):
    # Compute one row of the array: x is one tuple from `combine`.
    temp = []
    for i in combine:
        temp.append(0.2*x[1]*i[1]/64.23)
    return temp

def combinations_with_replacement_counts(n, r):
    # Yield every tuple of n non-negative counts that sum to r.
    size = n + r - 1
    for indices in itertools.combinations(range(size), n-1):
        starts = [0] + [index+1 for index in indices]
        stops = indices + (size,)
        yield tuple(map(operator.sub, stops, starts))

# Defined at module level so that the worker processes can see it.
combine = list(combinations_with_replacement_counts(3, 60))
print(len(combine))

if __name__ == '__main__':
    t1 = time.time()
    pool = Pool()
    # One task per row; each task returns a full row as a Python list.
    results = [pool.apply_async(f2, (x,)) for x in combine]
    roots = [r.get() for r in results]
    print(roots[0:3])
    pool.close()
    pool.join()
    print(time.time() - t1)
- What is the fastest way to create and populate such a huge NumPy array? Should I fill lists, aggregate them, and then convert the result to a NumPy array?
- Since the cells, rows, and columns of the 2D array are independent of one another, is it possible to parallelize the calculations to speed up filling the array? Any hints or approaches for optimizing such calculations with multiprocessing?
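To make the two questions concrete, here is a minimal sketch of one possible direction, not a definitive answer: preallocate the array once and fill it in independent row blocks with NumPy broadcasting instead of building Python lists. It assumes the formula stays `0.2*x[1]*i[1]/64.23` as in `f2`; the function name `fill_blockwise`, the `block_rows` parameter, and the small `values` array (standing in for the second element of each tuple in `combine`) are hypothetical.

```python
import numpy as np

def fill_blockwise(values, out, block_rows=1024):
    """Fill out[i, j] = 0.2 * values[i] * values[j] / 64.23, one row block at a time."""
    n = values.shape[0]
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        # Broadcasting a column slice against the full row computes the whole
        # block without a Python-level inner loop.
        out[start:stop, :] = 0.2 * values[start:stop, None] * values[None, :] / 64.23
    return out

# Small stand-in data; in the question this would be the second element
# of every tuple in `combine`.
values = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
result = np.empty((values.size, values.size), dtype=np.float64)
fill_blockwise(values, result, block_rows=2)
print(result.shape)
```

Because each block only writes its own rows, the blocks are independent and could in principle be handed to separate worker processes. If the full array does not fit in RAM, `out` could be a `numpy.memmap` opened in `'w+'` mode instead of an in-memory array. Whether either is actually faster than the single-process version would need to be measured.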