Split file into unequal chunks on Linux

I want to split a large file (about 17M lines) into several files, with a different number of lines in each fragment. Is it possible to pass an array to the 'split -l' command, like this:

[
 1=>1000000,
 2=>1000537,
 ...
]

so that each fragment gets the corresponding number of lines?

+5
3 answers

Use a compound command, so that every head reads from the same open file and continues where the previous one stopped:

{
  head -n 10000 > output1
  head -n   200 > output2
  head -n  1234 > output3
  cat > remainder
} < yourbigfile

This also works with loops:

{
  i=1
  for n in 10000 200 1234
  do
      head -n "$n" > "output$i"
      i=$((i + 1))
  done
  cat > remainder
} < yourbigfile
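
To come closer to the "array" the question asks about, the loop can be wrapped in a function that takes the line counts as arguments. This is only a sketch: the name split_by_counts, the prefix argument, and the "remainder" suffix are made up for illustration, and it relies on the same head behaviour as the loop above.

```shell
# split_by_counts FILE PREFIX COUNT...
# Hypothetical helper: writes PREFIX1, PREFIX2, ... with COUNT lines each,
# and whatever is left over to PREFIXremainder.
split_by_counts() {
  local file=$1 prefix=$2
  shift 2
  local i=1 n
  {
    for n in "$@"; do
      # each head continues where the previous one stopped
      head -n "$n" > "${prefix}${i}"
      i=$((i + 1))
    done
    cat > "${prefix}remainder"
  } < "$file"
}
```

It would be called as, for example, split_by_counts yourbigfile output 1000000 1000537. In bash the counts can also come from a real array: split_by_counts yourbigfile output "${counts[@]}".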

This does not work on OS X, where head reads ahead and discards the additional input.
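
Whether a particular head behaves this way can be checked on a small file before trusting it with a large one. The following is a rough sketch; the function name head_is_exact is hypothetical, and it only tests input redirected from a regular file, which is the case used here.

```shell
# head_is_exact: succeeds if head leaves the input positioned right after
# the lines it printed, so a following command sees the rest of the file.
head_is_exact() {
  local tmp rest
  tmp=$(mktemp) || return 2
  printf '%s\n' a b c d > "$tmp"
  # head takes two lines; cat should then see exactly "c" and "d"
  rest=$( { head -n 2 > /dev/null; cat; } < "$tmp" )
  rm -f "$tmp"
  [ "$rest" = "c
d" ]
}
```

For example: head_is_exact && echo "compound command approach should work here".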

+6

The split command has no such option, so you will have to use another tool or write your own.
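
As one possible "write your own", here is a single-pass awk sketch wrapped in a shell function. The names split_awk, the prefix argument, and the "remainder" suffix are assumptions made for this example. Unlike a group of head commands, it does not depend on how head handles its input, but it creates no file for a chunk that receives no lines.

```shell
# split_awk FILE PREFIX COUNT...
# Hypothetical helper: one awk pass that writes PREFIX1, PREFIX2, ...
# with COUNT lines each, then the leftover lines to PREFIXremainder.
split_awk() {
  local file=$1 prefix=$2
  shift 2
  awk -v counts="$*" -v prefix="$prefix" '
    BEGIN {
      n = split(counts, limit, " ")   # limit[1..n] holds the chunk sizes
      part = 1; written = 0
      out = prefix part
    }
    {
      # move to the next output file once the current chunk is full
      while (part <= n && written >= limit[part] + 0) {
        close(out)
        part++
        written = 0
        out = (part <= n) ? prefix part : prefix "remainder"
      }
      print > out
      written++
    }
  ' "$file"
}
```

Usage would look like split_awk yourbigfile output 10000 200 1234.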

+1

You can use sed, with a script that generates the sed commands for you.

# split_gen.py (a Perl script, despite the .py extension)
use strict;
use warnings;
# last line number of each chunk
my @limits = (100, 250, 340, 999);
my $filename = "joker";

my $start = 1;
foreach my $end (@limits) {
    print qq{sed -n '$start,${end}p;${end}q' $filename > $filename.$start-$end\n};
    $start = $end + 1;
}

Save this as split_gen.py and run perl split_gen.py; it prints:

sed -n '1,100p;100q' joker > joker.1-100
sed -n '101,250p;250q' joker > joker.101-250
sed -n '251,340p;340q' joker > joker.251-340
sed -n '341,999p;999q' joker > joker.341-999

To execute these commands, pipe the output to a shell:

perl split_gen.py | sh 


+1
