Parsing Text Files in Perl

Question

I am new to Perl programming and would like to learn about parsing text files with Perl. I have a text file with irregular formatting, and I would like to parse each record into three parts.

Basically the file contains text similar to the following:

;out;asoljefsaiouerfas'pozsirt'z
mysql_query("SELECT * FROM Table WHERE (value='true') OR (value2='true') OR (value3='true') ");
1234 434 3454

4if[9put[e]9sd=09q]024s-q]3-=04i
select ta.somefield, tc.somefield 
from TableA ta INNER JOIN TableC tc on tc.somefield=ta.somefield 
INNER JOIN TableB tb on tb.somefield=ta.somefield 
ORDER by tb.somefield
234 4536 234

and the list continues in this format.

So I need to split each record into three parts: the first line at the top is a hash check, the second part is a MySQL query, and the third is a line of three numbers. For some reason, I don't understand how to do this. I use Perl's "open" function to read the data from the text file, and then I tried the "split" function on line breaks, but the queries are not always on a single line and do not follow a fixed pattern, so I cannot use it in the way I understand it.

+3

text perl parsing

Tofu May 19, '11 at 21:10

3 answers

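One approach is to read the file a record at a time rather than a line at a time, by changing the input record separator: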

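use strict;
use warnings;
use English qw<$RS $OS_ERROR>;

# Assumption: the file name is passed as the first command-line argument.
my $path_to_file = shift @ARGV;

# Records are separated by a blank line, so read one whole record per iteration.
local $RS = "\n\n";

open( my $fh, '<', $path_to_file )
    or die "Could not open $path_to_file! - $OS_ERROR"
    ;
while ( <$fh> ) {
    chomp;    # strip the trailing record separator, if present
    my ( $hash_check_line
       , @inner_lines
       )
       = split /\n/
       ;
    # The last line of the record holds the numbers; the rest is the query.
    my @numbers = split /\D+/, pop @inner_lines;
    my $sql     = join( "\n", @inner_lines );

    ...
}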

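$RS (also known as $/ or $INPUT_RECORD_SEPARATOR) is Perl's input record separator. Setting it to "\n\n" makes each read return one blank-line-delimited record instead of a single line.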
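One caveat with a separator of exactly "\n\n": if records are ever separated by more than one blank line, the extra newlines end up at the start of the next record. Perl's paragraph mode (setting $/ to the empty string) treats any run of blank lines as a single separator. The following is only a sketch of that variation; the sample data, the parse_records name, and the in-memory filehandle are assumptions made for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical sample data: two records separated by several blank lines.
my $sample = "hashA\nSELECT 1\nFROM t\n1 2 3\n\n\n\nhashB\nSELECT 2\n4 5 6\n";

# parse_records is an illustrative helper name, not from the original answer.
sub parse_records {
    my ($text) = @_;
    open( my $fh, '<', \$text ) or die "Could not open in-memory file: $!";
    local $/ = "";    # paragraph mode: one or more blank lines end a record
    my @records;
    while ( my $record = <$fh> ) {
        chomp $record;    # in paragraph mode this strips all trailing newlines
        my ( $hash_check_line, @inner_lines ) = split /\n/, $record;
        my @numbers = split /\D+/, pop @inner_lines;
        push @records, {
            hash    => $hash_check_line,
            sql     => join( "\n", @inner_lines ),
            numbers => \@numbers,
        };
    }
    close $fh;
    return @records;
}

my @records = parse_records($sample);
printf "%s | %s | %s\n", $_->{hash}, $_->{sql}, join( ',', @{ $_->{numbers} } )
    for @records;
```

Returning each record as a hash reference keeps the three parts together, so the caller can decide what to do with them.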

+6

Axeman May 19, '11 at 21:27

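Making a few assumptions about the format:

Records are separated by an empty line.
The first line of a record will be the checksum.
The "last" line will be the numbers.
Everything else will be the query.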

With that in mind, I present the following code:

open my $fh, '<', $path_to_file
    or die "Can't open $path_to_file: $!";
while (my ($checksum, $query, $numbers) = read_record($fh) ) {
    # do something with record
}
close $fh or warn "$!";

sub read_record {
    my $fh = shift;
    my @lines;
    LINE: while (my $line = <$fh>) {
        chomp $line;
        last LINE if $line eq q{}; # if empty, we're done with the record!
        push @lines, $line;        # store it :)
    }
    return unless @lines;          # if we didn't get anything, eof!
    my $checksum = shift @lines;   # first was checksum.
    my $numbers = pop @lines;      # last thing read was numbers.
    my $query = join ' ', @lines;  # everything else, query.
    return ($checksum, $query, $numbers);
}

Adjust it, of course, to suit your boundary conditions.
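One such boundary condition: as written, read_record returns an empty list when it meets a blank line before any content, and the calling loop treats that as end of file, so a leading blank line or two consecutive blank lines would stop processing early. A possible variant is sketched below, assuming that blank (or whitespace-only) lines never occur inside a record. The function name and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Variant of read_record that tolerates extra blank lines between records.
sub read_record_skipping_blanks {
    my $fh = shift;
    my @lines;
    while ( defined( my $line = <$fh> ) ) {
        chomp $line;
        if ( $line =~ /^\s*$/ ) {
            next unless @lines;    # still before the record: skip the blank line
            last;                  # blank line after content: record is complete
        }
        push @lines, $line;
    }
    return unless @lines;          # nothing read: end of file
    my $checksum = shift @lines;   # first line is the checksum
    my $numbers  = pop @lines;     # last line is the numbers
    my $query    = join ' ', @lines;    # everything in between is the query
    return ( $checksum, $query, $numbers );
}

# Hypothetical sample input, read through an in-memory filehandle.
my $sample = "\n\nhashA\nSELECT 1\nFROM t\n1 2 3\n\n\n\nhashB\nSELECT 2\n4 5 6\n";
open( my $fh, '<', \$sample ) or die "Could not open in-memory file: $!";
while ( my ( $checksum, $query, $numbers ) = read_record_skipping_blanks($fh) ) {
    print "$checksum | $query | $numbers\n";
}
close $fh;
```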

+3

Robert P May 19, '11 at 21:24

Andrew Clark · Accepted Answer · 2011-05-19T21:45:52+0000

The following seems to work:

while ($file_content =~ /\s*^(.+?)^(.+?)^(\d+\s+\d+\s+\d+)$/smg) {
    my $checksum = $1;
    my $query = $2;
    my $numbers = $3;
    # do stuff
}

Here is the explanation for the regex:

\s*                   # eat up empty lines
^(.+?)                # save the checksum line to group 1
^(.+?)                # save one or multiple query lines to group 2
^(\d+\s+\d+\s+\d+)$   # save number line to group 3
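The loop assumes the whole file is already in $file_content. A minimal sketch of how that might be arranged is shown here: the file is slurped by leaving $/ undefined, and a pattern along the lines of the one explained above is applied to the result. The sample data, the in-memory filehandle, and the extract_records name are assumptions for illustration. Note that groups 1 and 2 keep their trailing newline, so the sketch chomps them.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the real file contents.
my $sample = "hashA\nSELECT 1\nFROM t\n1 2 3\n\nhashB\nSELECT 2\n4 5 6\n";

# Slurp: with $/ undefined, a single read returns the entire file.
open( my $fh, '<', \$sample ) or die "Could not open in-memory file: $!";
my $file_content = do { local $/; <$fh> };
close $fh;

# extract_records is an illustrative helper name, not from the original answer.
sub extract_records {
    my ($content) = @_;
    my @records;
    while ( $content =~ /\s*^(.+?)^(.+?)^(\d+\s+\d+\s+\d+)$/smg ) {
        my ( $checksum, $query, $numbers ) = ( $1, $2, $3 );
        chomp( $checksum, $query );    # groups 1 and 2 keep their final newline
        push @records, [ $checksum, $query, $numbers ];
    }
    return @records;
}

print join( ' | ', @$_ ), "\n" for extract_records($file_content);
```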
