Combine 2 lines into one

I have a text file in which every record starts with a 9-digit college code and ends with a 5-digit number.

512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,
Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017

Some entries contain a line break, as in the last record of the example above. I need to join the 3rd and 4th lines into a single line, like the 1st and 2nd lines, so that I can easily process the file with grep, awk, and similar tools.

Update:

Kevin's answer does not seem to work.

cat todel.txt
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha Jagdambha College of,
Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531

cat todel.txt | perl -ne 'chomp; if (/^\d{9}/) { print "\n$_" } else { print "$_\n" }' 
Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531ege of,
+5
8 answers

Regarding split lines: this sed script assumes that there is at least one space after the leading number (on the first line of the split), one space before the final number (on the last line of the split), and only one break per split record.

It also handles both Windows CRLF and *nix LF line endings; the output always uses the *nix \n.

sed -nr 's/\r?$// # allow for \r\n (CRLF) newlines
         /^([0-9]{9}) .* ([0-9]{5})$/{p;b}
         /^([0-9]{9}) /{h;b}
         / ([0-9]{5})$/{x;G; s/\n//; p}' 

The same logic, condensed into a one-liner:

sed -nr 's/\r?$//; /^([0-9]{9}) /{/ ([0-9]{5})$/{p;b};h;b};/ ([0-9]{5})$/{x;G; s/\n//; p}' 

Here is the output, using GNU sed 4.2.1:

512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
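
The logic of the sed script can also be sketched in Python for readers who would rather not decode the hold-space commands. This is only an illustration under the same assumptions as the answer (a record starts with nine digits and a space, ends with a space and five digits, and is split at most once); the name `join_split_records` is made up for this example.

```python
import re

START = re.compile(r"^[0-9]{9} ")   # nine-digit college code, then a space
END = re.compile(r" [0-9]{5}$")     # a space, then the final five digits


def join_split_records(lines):
    """Yield complete records, gluing the two halves of a split record together."""
    held = ""  # plays the role of sed's hold space
    for raw in lines:
        line = raw.rstrip("\r\n")  # accept both CRLF and LF input
        if START.search(line) and END.search(line):
            yield line             # already a complete record
        elif START.search(line):
            held = line            # first half: remember it
        elif END.search(line):
            yield held + line      # second half: emit the joined record
            held = ""


sample = [
    "617561234 EN6175 abc xyz College of Engineering,\r\n",
    "Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017\r\n",
]
for record in join_split_records(sample):
    print(record)
```

Like the sed script, this sketch silently drops lines that match neither pattern, so it is only suitable for input that follows the stated layout.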
+1

If your file is called "file.txt", try this:

cat file.txt | perl -ne 'chomp; if (/^\d{9}/) { print "\n$_" } else { print "$_\n" }'

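This starts a new output line whenever an input line begins with a 9-digit number. The "chomp" strips the newline from each line, so a line that does not begin with such a number is appended to the line before it.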
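
The rule the Perl one-liner follows (a line that begins with nine digits opens a new record, anything else continues the previous one) can be sketched in Python as well. This is an illustrative sketch, not the answer's code: `merge_continuations` is a hypothetical name, and stripping `\r` is an assumption added here, since "chomp" with the default record separator removes only the `\n` and leaves any Windows carriage return in place.

```python
import re

RECORD_START = re.compile(r"^\d{9}")  # a record opens with a nine-digit code


def merge_continuations(lines):
    """Return records, appending continuation lines to the record before them."""
    records = []
    for raw in lines:
        # Drop the LF and any stray CR (assumption: CRLF input is possible).
        line = raw.rstrip("\r\n")
        if RECORD_START.match(line) or not records:
            records.append(line)   # start a new record
        else:
            records[-1] += line    # continuation: glue onto the previous record
    return records


lines = [
    "517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555\n",
    "112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha Jagdambha College of,\r\n",
    "Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531\r\n",
]
print("\n".join(merge_continuations(lines)))
```

Because the test looks only at the start of each line, this variant also copes with a record that is broken across more than two lines.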

+1

This might work for you:

sed ':a;$!N;/ [0-9]\{5\}\n[0-9]\{9\} /!s/\n//;ta;P;D' file

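Explanation:

  • Append the next line to the pattern space. Unless the newline is preceded by a space and five digits and followed by nine digits and a space (that is, unless it separates two complete records), delete the newline and repeat.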
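
The boundary test used by this sed command can be illustrated with a small Python sketch. It is an approximation written for this explanation, not a translation of the sed loop: `drop_spurious_newlines` is a hypothetical name, and normalising CRLF to LF first is an added assumption.

```python
import re

# Matches a newline that is NOT a record boundary, i.e. one that is not
# both preceded by " ddddd" and followed by "ddddddddd ".
SPURIOUS_NEWLINE = re.compile(r"(?<! [0-9]{5})\n|\n(?![0-9]{9} )")


def drop_spurious_newlines(text):
    """Remove every newline that does not separate two complete records."""
    # Normalise CRLF to LF and set the final newline aside so it is not deleted.
    body = text.replace("\r\n", "\n").rstrip("\n")
    return SPURIOUS_NEWLINE.sub("", body) + "\n"


text = (
    "517261123 EN5172 Shirpur 61220 Mechanical Engineering YOPENH 1 100 29555\n"
    "617561234 EN6175 abc xyz College of Engineering,\n"
    "Pune 61220 Mechanical Engineering ZOPENH 2 105 25017\n"
)
print(drop_spurious_newlines(text), end="")
```

Testing both sides of the newline is what lets this approach join a record that has been broken in more than one place.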

EDIT:

Test data:

cat <<\! >/tmp/codel.txt
> 112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha Jagdambha College of,
> Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
> !
sed ':a;$!N;/\s[0-9]\{5\}\n[0-9]\{9\}\s/!s/\n//;ta;P;D' /tmp/codel.txt 
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
sed ':a;$!N;/\s[0-9]\{5\}\n[0-9]\{9\}\s/!s/\n//;ta;P;D' /tmp/{codel.txt,codel.txt,codel.txt} 
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
+1

Perhaps try to remove all line breaks that appear after commas, for example:

perl -i -pe 's/,\n/,/g' file.txt

Or, if you want to allow whitespace after the commas:

perl -i -pe 's/(,\s*)\n/$1/g' file.txt
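
For comparison, here is a minimal Python sketch of the same comma-based idea. The function name `join_after_commas` is hypothetical, and the optional `\r` is an assumption added to cope with Windows line endings. The approach only helps when every unwanted break directly follows a comma, as it does in the sample data.

```python
import re


def join_after_commas(text):
    """Delete any line break that directly follows a comma (plus optional blanks)."""
    # [ \t]* keeps spaces or tabs after the comma; \r? also swallows a Windows CR.
    return re.sub(r"(,[ \t]*)\r?\n", r"\1", text)


broken = (
    "112724510 EN1127 Jagdambha College of,\r\n"
    "Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531\r\n"
)
print(join_after_commas(broken), end="")
```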
0

Try this:

sed '/^[0-9]\{9\}/{h;};/^[0-9]\{9\}/!{x;G;s/\n//g;}' test | grep -E '[0-9]{5}$'
0
awk '! ($1 ~ /^[[:digit:]]/) {$0 = save " " $0} $1 ~ /^[[:digit:]]/ {save = $0} $NF ~ /[[:digit:]]$/ {print}' inputfile
0
cat todel.txt |awk 'BEGIN {i=0} {first[i]=$1; lines[i++] = $0;} END {for (x=0; x<i; x++) { if ( x==(i - 1) || (first[x + 1] ~ /^[0-9]+$/ && length(first[x + 1])==9) ) {printf("%s: %s\n", x, lines[x]);} else {printf("%s: %s%s\n", x, lines[x], lines[x + 1]); x++;} } }'
0

This works with the included data set, assuming that valid entries end with five digits:

use Modern::Perl;

my $data = do{local $/; <DATA>};
$data =~ s/([^\d]{5})\n/$1 /sg;
say $data;


__DATA__
512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,
Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha Jagdambha College of,
Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531

Output:

512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering, Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha Jagdambha College of, Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
0
