Fast “Update Writing” to Binary Files?

I have 3,000 binary files (each 40 MB in size) of a known format: 5,000,000 records each, where every record is an (int32, float32) pair. They were created using the numpy tofile() method.

A method that I use, WhichShouldBeUpdated(), determines which file (out of the 3,000) should be updated, as well as which records in that file should be changed. The method's output is as follows:

(1) path_to_file_name_to_update

(2) a numpy array of N records (N being the number of records to update) in the following format: [(recordID1, newIntValue1, newFloatValue1), (recordID2, newIntValue2, newFloatValue2), ...]

As you can see:

(1) the file to update is known only at runtime

(2) the records to update are also known only at runtime

What would be the most efficient approach to updating a file with the new values for these records?

+3
2 answers

Since the records have a fixed length, you can simply open the file and seek to the byte offset of each record, which is the record index multiplied by the record size (8 bytes here: 4 for the int32 plus 4 for the float32). To encode the int and the float as binary, you can use struct.pack.

Update: given that the files were originally generated with numpy, the fastest way might be numpy.memmap.
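Both suggestions can be sketched roughly as follows. This is only an illustration under some assumptions that the question does not confirm: the files have no header, the values are little-endian, and recordID is the zero-based position of the record in the file. The names RECORD_DTYPE, RECORD_STRUCT, update_with_seek and update_with_memmap are made up for this example.

```python
import struct
import numpy as np

# Assumed on-disk layout: one int32 followed by one float32 per record,
# little-endian, no header (tofile() writes raw bytes with no metadata).
RECORD_DTYPE = np.dtype([("i", "<i4"), ("f", "<f4")])
RECORD_STRUCT = struct.Struct("<if")  # same 8-byte layout for struct packing


def update_with_seek(path, updates):
    """Overwrite records in place using seek() and struct packing.

    `updates` is an iterable of (record_id, new_int, new_float), where
    record_id is assumed to be the zero-based index of the record.
    """
    with open(path, "r+b") as fh:  # r+b: read/write without truncating
        for record_id, new_int, new_float in updates:
            fh.seek(record_id * RECORD_STRUCT.size)
            fh.write(RECORD_STRUCT.pack(new_int, new_float))


def update_with_memmap(path, updates):
    """Overwrite records in place through numpy.memmap."""
    mm = np.memmap(path, dtype=RECORD_DTYPE, mode="r+")
    ids = np.array([u[0] for u in updates], dtype=np.int64)
    # Field access returns a view, so these assignments modify the mapping.
    mm["i"][ids] = [u[1] for u in updates]
    mm["f"][ids] = [u[2] for u in updates]
    mm.flush()  # write the modified pages back to the file
    del mm
```

For example, update_with_memmap(path_to_file_name_to_update, [(0, 11, 2.0), (2, 8, -4.5)]) would rewrite records 0 and 2 and leave the rest of the file untouched. Which variant is faster likely depends on how many records change per file, so it is worth timing both on real data.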

+7

Alternatively, consider storing the data in HDF5, for example via pytables.

+1