NumPy lacks built-in support for 3-byte and 6-byte types, aka uint24 and uint48. I have a large dataset that uses these types and want to load it into NumPy. Here is what I do now (for uint24):
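import numpy as np
# Each 8-byte record: a 2-byte header, then three little-endian 16-bit words
# that together hold two packed uint24 values.
dt = np.dtype([('head', '<u2'), ('data', '<u2', (3,))])
a = np.memmap("filename", mode='r', dtype=dt)
w1, w2, w3 = a['data'].swapaxes(0, 1)
a2 = np.empty((2, a.size), dtype='u4')
# First value: w1 is the low 16 bits, the low byte of w2 is the top 8 bits.
a2[0] = w2 % 256
a2[0] <<= 16
a2[0] += w1
# Second value: the high byte of w2 is the low 8 bits, w3 is the top 16 bits.
a2[1] = w3
a2[1] <<= 8
a2[1] += w2 >> 8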
While this works for a 100 MB input, it looks inefficient (think of 100 GB of data). Is there a more efficient way? For example, a special kind of read-only view that masks part of the data would be useful (something like a "uint64 whose two most significant bytes are always zero" type). I only need read-only access to the data.
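To make the "view" idea concrete, here is a minimal sketch of one possible direction, not a tested solution for the real file. It assumes the same 8-byte record layout as the dtype above and that the file is opened as a flat, contiguous uint8 array. The name uint24_pairs is hypothetical. It builds two overlapping, unaligned '<u4' views with the np.ndarray constructor and then masks or shifts them:

```python
import numpy as np

RECORD_SIZE = 8  # assumed layout: 2-byte head + two 3-byte little-endian values


def uint24_pairs(raw):
    """Decode both uint24 columns from a flat, contiguous uint8 array.

    Hypothetical helper: `raw` must hold a whole number of 8-byte records.
    """
    n = raw.size // RECORD_SIZE
    # Bytes 2..5 of every record viewed as little-endian uint32 (no copy yet).
    lo = np.ndarray((n,), dtype='<u4', buffer=raw, offset=2,
                    strides=(RECORD_SIZE,))
    # Bytes 4..7 of every record viewed as little-endian uint32 (no copy yet).
    hi = np.ndarray((n,), dtype='<u4', buffer=raw, offset=4,
                    strides=(RECORD_SIZE,))
    # Keep the low three bytes of `lo` and the high three bytes of `hi`.
    # Each of these two operations allocates one uint32 result array.
    return lo & 0xFFFFFF, hi >> 8
```

With the real file this would be called as uint24_pairs(np.memmap("filename", mode='r', dtype=np.uint8)). The two strided views themselves copy nothing, but the mask and the shift still materialize their results, so for 100 GB the work would presumably have to be done on slices of the views rather than on the whole file at once.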