How to remove newline characters until each line has a certain number of instances of a certain character?

I have a real mess of a pipe-delimited file that I need to load into a database. The file has 35 fields and therefore 34 pipes. One of the fields contains HTML code, which for some records includes several line breaks. Unfortunately, there is no pattern to where the line breaks fall.

The solution I came up with is to count the number of pipes in each line and, until that number reaches 34, remove the newline character from that line. I am not incredibly good at Perl, but I think I'm close to achieving what I'm looking for. Any suggestions?

#!/usr/local/bin/perl

use strict;

open (FILE, 'test.txt');

while (<FILE>) {
    chomp;
    my $line = $_;
    #remove null characters that are included in file
    $line =~ tr/\x00//;
    #count number of pipes
    my $count = ($line =~ tr/|//);
    #each line should have 34 pipes
    while ($count < 34) {
        #remove new lines until line has 34 pipes
        $line =~ tr/\r\n//;
        $count = ($line =~ tr/|//);
        print "$line\n";
    }
}

#!/usr/bin/perl

use strict;

open(FILE, '<', 'test.txt') or die "Cannot open test.txt: $!";

my ($num_pipes, $line_num) = (0, 0);
my $tmp = "";
while (<FILE>)
{
    $line_num++;
    chomp;
    my $line = $_;
    $line =~ tr/\x00//d; #remove null characters that are included in file (the /d flag is what actually deletes them)
    $num_pipes += ($line =~ tr/|//); #count number of pipes
    if ($num_pipes == 34 && length($tmp))
    {
            $tmp .= $line;
            print "$tmp\n";
            # Reset values.
            $tmp = "";
            $num_pipes = 0;
    }
    elsif ($num_pipes == 34 && length($tmp) == 0)
    {
            print "$line\n";
            $num_pipes = 0;
    }
    elsif ($num_pipes < 34)
    {
            $tmp .= $line;
    }
    elsif ($num_pipes > 34)
    {
            print STDERR "Error before line $line_num. Too many pipes ($num_pipes)\n";
            $num_pipes = 0;
            $tmp = "";
    }
}
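
For anyone who wants to try the accumulate-and-count idea without a 35-field file, here is a minimal sketch of the same approach wrapped in a subroutine. The name join_records and its parameters are made up for illustration, and the demo uses 2 pipes per record instead of 34; treat it as a sketch under those assumptions, not a drop-in replacement.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): merge physical lines
# into logical records that each contain exactly $want pipes.
# Returns (\@records, \@errors).
sub join_records {
    my ($want, @lines) = @_;
    my (@records, @errors);
    my ($buf, $pipes, $line_num) = ('', 0, 0);
    for my $line (@lines) {
        $line_num++;
        $line =~ tr/\x00\r\n//d;        # delete NULs and line endings
        $buf   .= $line;
        $pipes += ($line =~ tr/|//);    # tr without /d only counts
        if ($pipes == $want) {
            push @records, $buf;        # record is complete
            ($buf, $pipes) = ('', 0);
        }
        elsif ($pipes > $want) {
            push @errors, "Too many pipes ($pipes) before line $line_num";
            ($buf, $pipes) = ('', 0);
        }
    }
    push @errors, "Incomplete record at end of input" if length $buf;
    return (\@records, \@errors);
}

# Demo data: the second record is split across two physical lines.
my ($records, $errors) = join_records(2, "a|b|c\n", "d|<p>\n", "text</p>|e\n");
print "$_\n" for @$records;
print STDERR "$_\n" for @$errors;
```

Note that this only works if the field containing line breaks is not the last one: once the 34th pipe has been seen, the record is treated as complete.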

Twiddle with $/, the input record separator?

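while (!eof(FILE)) {

    # assemble a row of data: 35 pipe-separated fields, possibly over many lines
    my @fields = ();
    {
        # read the first 34 fields from FILE, each terminated by a pipe:
        local $/ = '|';
        for (1..34) {
            my $field = <FILE>;
            last unless defined $field;  # file ended in the middle of a row
            chomp $field;                # chomp strips the current $/, i.e. the trailing '|'
            push @fields, $field;
        }
    }   # $/ is set back to its previous value ("\n" by default) at the end of this block

    my $last = <FILE>;  # read last field, which ends with newline
    if (defined $last) { chomp $last; push @fields, $last; }
    tr/\r\n//d for @fields;  # delete any line breaks embedded in a field
    my $line = join '|', @fields;
    # ... now you can process $line, and you already have the @fields ...
}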
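
To see how $/ and chomp interact here, a small self-contained sketch may help. It reads from an in-memory filehandle with made-up sample data and 3 fields per row (2 pipes) instead of 35; the variable names are assumptions for the demo only.

```perl
use strict;
use warnings;

# Assumed sample data: the second row has a line break inside its middle field.
my $data = "a|b|c\nd|<p>\ntext</p>|e\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my @rows;
while (!eof($fh)) {
    my @fields;
    {
        local $/ = '|';              # inside this block a "line" ends at a pipe
        for (1..2) {
            my $field = <$fh>;
            last unless defined $field;
            chomp $field;            # chomp removes the current $/, here '|'
            push @fields, $field;
        }
    }
    my $last = <$fh>;                # $/ is "\n" again, so this reads to end of line
    if (defined $last) {
        chomp $last;
        push @fields, $last;
    }
    tr/\r\n//d for @fields;          # delete line breaks embedded in a field
    push @rows, join('|', @fields);
}
close $fh;
print "$_\n" for @rows;
```

The point to notice is that each value read with $/ set to '|' still carries its trailing pipe until it is chomped, so joining the raw values with '|' would double every delimiter.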