Locking the output file for a shell script invoked many times in parallel

I have about a million files that I want to run a shell script on, appending the results to a single file.

For example, suppose I just want to run wc on the files. To make it fast, I can parallelize it with xargs, but I do not want the scripts to step on each other when writing the output. It is probably better to write to several separate files rather than one and then cat them later, but I still want the number of such temporary output files to be significantly smaller than the number of input files. Is there a way to get the kind of locking I want, or is it always provided by default?

Is there a utility that will recursively cat two files in parallel?

I could write a script for this, but then I would have to manage the temporary files and clean them up afterwards, so I am wondering whether a utility already does this.

1 answer

GNU parallel claims that it:

ensures that the output from the commands is the same output as you would get had you run the commands sequentially

If so, then I believe it should be safe to simply pipe the output to your file and let parallel handle the intermediate data.
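As a rough sketch of that idea, the snippet below runs one wc job per file and sends everything through a single redirect. It assumes the parallel on your PATH is GNU parallel (not the unrelated moreutils tool of the same name). The scratch directory, the sample files, and the results.txt name are made up for the demo.

```shell
# Demo setup (hypothetical paths): a scratch directory with five small files.
workdir=$(mktemp -d)
mkdir "$workdir/input"
for i in 1 2 3 4 5; do
    seq 1 "$i" > "$workdir/input/file$i.txt"
done

# One wc job per input file, run in parallel. GNU parallel buffers each
# job's output and prints it as a whole, so the single redirect below
# receives complete, non-interleaved lines.
find "$workdir/input" -type f -print0 \
    | parallel -0 wc -l > "$workdir/results.txt"

# Sorted only to make the demo output stable; job order is not guaranteed.
sort "$workdir/results.txt"
```

No temporary output files need to be managed by hand here; the only file you deal with is the final results.txt.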

Use the -k option to preserve the output order.
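A small illustration of -k, again assuming GNU parallel is installed: the three jobs below finish in the reverse of their input order, yet the output file keeps the input order. The sleep durations and the ordered file are arbitrary choices for the demo.

```shell
# Hypothetical scratch file for the demo output.
ordered=$(mktemp)

# Run three jobs at once. The job given "2" finishes last and the job
# given "0" finishes first, but -k emits results in input order.
parallel -j 3 -k 'sleep {}; echo {}' ::: 2 1 0 > "$ordered"

cat "$ordered"
# prints:
# 2
# 1
# 0
```

Without -k, the same command would print the results in the order the jobs finish.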

Update: (non-Perl solution)

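Another option is prll, which relies on C rather than Perl and is an alternative to GNU parallel. [...]

Note that prll writes its own status information to STDERR, which makes the STDERR output of the jobs harder to use. [...]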
