Current process:
- I have a tar.gz file. (In fact, I have about 2,000 of them, but that is another story.)
- I create a temporary directory and extract the tar.gz file, which yields 100,000 tiny files (about 600 bytes each).
- For each file, I feed it into a processing program, pipe the output of that loop into another analysis program, and save the result.
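As a rough sketch of the steps above (not the actual script): the function below extracts everything into a scratch directory and then processes each file in turn. `process_one` and `run_via_tmpdir` are hypothetical names, and `process_one` is only a stand-in that prints a byte count in place of the real per-file program.

```shell
#!/bin/sh
# Sketch of the extract-to-scratch workflow. All names here are placeholders.

# Hypothetical stand-in for the real per-file program:
# reads one file on stdin and prints its size in bytes.
process_one() {
    wc -c | tr -d ' '
}

run_via_tmpdir() {
    archive=$1
    workdir=$(mktemp -d) || return 1
    # Every member is written to disk here; this is the step that
    # consumes the temporary space.
    tar -xzf "$archive" -C "$workdir" || { rm -rf "$workdir"; return 1; }
    # Process the extracted files one at a time, in name order.
    find "$workdir" -type f | sort | while IFS= read -r f; do
        process_one < "$f"
    done
    rm -rf "$workdir"
}
```

Calling `run_via_tmpdir some.tar.gz | analysis_program > result.txt` would mirror the three steps, with `analysis_program` standing in for the second program.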
The temporary space on the machines I use can handle only one of these processes at a time, never mind the 16 (hyperthreaded dual quad-core) that they are given by default. I am looking for a way to do this processing without saving to disk. I believe the performance penalty for pulling files out individually with tar -xf $file -O <targetname> would be prohibitive, but that might be what I'm stuck with.
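For concreteness, the per-member approach mentioned above might look like the sketch below. It needs no scratch directory, but it runs tar once per member, and because a gzip-compressed archive can only be read sequentially, each run has to scan the archive from the beginning. `process_one` and `run_per_member` are hypothetical names, and `process_one` again only prints a byte count in place of the real program.

```shell
#!/bin/sh
# Sketch of per-member extraction to stdout. All names here are placeholders.

# Hypothetical stand-in for the real per-file program.
process_one() {
    wc -c | tr -d ' '
}

run_per_member() {
    archive=$1
    # List the member names, skipping directory entries (names ending in "/").
    tar -tzf "$archive" | grep -v '/$' | while IFS= read -r member; do
        # -O writes the member's contents to stdout instead of creating a file.
        # Each call is a separate pass over the compressed archive.
        tar -xzf "$archive" -O "$member" | process_one
    done
}
```

With 100,000 members per archive, that is 100,000 separate passes over the same file, which is where the expected penalty comes from.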
Is there any way to do this?
EDIT: Since two people have already made this mistake, I will clarify:
- Each file represents one point in time.
- Each file is processed separately.
- After processing (in this case, a variant of Fourier analysis), each file yields one line of output.
- These output lines can then be combined to do things like autocorrelation in time.
EDIT2: Actual Code:
for f in posns/*; do
    ~/data_analysis/intermediate_scattering_function < "$f"
done | ~/data_analysis/complex_autocorrelation.awk limit=1000 > inter_autocorr.txt