How do I build a regex that splits a line on commas but ignores commas inside double quotes?

Example line:

2011-03-09,4919 1281 0410 9930,55107,SAZB2314,"John, Doe" ,1-888-888-4452 ext 1813

Every comma should be treated as a separator, except the ones enclosed in double quotation marks.

+3
7 answers

You can use Text::CSV from CPAN.

+17

Or use Text::CSV_XS, which does the same thing but faster.

+10
+1

, , , @eugene y, . .

(?:(?:([^"]*?|".*?"),)*([^"]*?|".*?"))?
0

Try:

use strict;
use warnings;
use Text::ParseWords;

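while (<DATA>) {
    chomp;
    # Split on commas; commas inside double quotes are kept as data.
    # The second argument (0) strips the quotes from the returned fields.
    my @f = quotewords(',', 0, $_);
    for (@f) {
        s/^\s+|\s+$//g;          # trim leading and trailing whitespace
        $_ = qq{"$_"} if /,/;    # re-quote fields that contain a comma
    }
    print join(",", @f), "\n";
}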

__DATA__
2011-03-09,4919 1281 0410 9930,55107,SAZB2314,"John, Doe" ,1-888-888-4452 ext 1813
"ashish", "kumar", "test,1", "test2"
"foo", "b,ar", "msg1", "msg2"
0

, csv .

("([^"]*)",?)|(([^",]*),?)
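A pattern of this shape (a quoted field or an unquoted field, followed by a comma) can be applied one field at a time with `Matcher.find()`. The following is a minimal Java sketch under a few assumptions, not the exact expression above: the class name `QuotedCsvSplitter` and its `split` method are hypothetical, the pattern is anchored with `\G` and tolerates whitespace around fields, and escaped quotes (`""`) inside a quoted field are not supported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class QuotedCsvSplitter {
    // One field per match:
    //   group 1 = contents of a double-quoted field (may contain commas)
    //   group 2 = an unquoted field (no commas or quotes), without surrounding whitespace
    //   group 3 = the separator: "," or the empty string at end of input
    // \G forces each match to start where the previous one ended.
    private static final Pattern FIELD = Pattern.compile(
            "\\G\\s*(?:\"([^\"]*)\"|([^\",]*?))\\s*(,|$)");

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.add(m.group(1) != null ? m.group(1) : m.group(2));
            if (m.group(3).isEmpty()) {
                break; // reached the end of the input
            }
        }
        // Malformed input (e.g. an unbalanced quote) yields only the fields parsed so far.
        return fields;
    }

    public static void main(String[] args) {
        String line = "2011-03-09,4919 1281 0410 9930,55107,SAZB2314,"
                + "\"John, Doe\" ,1-888-888-4452 ext 1813";
        for (String field : split(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

For the example line from the question this yields six fields, with `John, Doe` kept as a single field. For anything beyond simple input, a dedicated CSV parser is likely the safer choice.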

, . , .

0

, Java. PERL, . .

// 1) select any quoted text before comma
// if it fails then
// 2) select any text before comma
// if it also fails then
// 3) select any text before end of the input

final String OR           = "|";
final String QUOTE        = "\"\\s*"; // a double quote with trailing whitespace
final String NON_QUOTES   = "[^\"]*";
final String COMMA        = ",";
final String NON_COMMA    = "[^,]*";
final String NON_END      = ".+";     // any remaining text; "[^$]" would only exclude a literal '$'
final String END          = "$";

final Pattern p = Pattern.compile(
QUOTE+NON_QUOTES+QUOTE+COMMA+
OR+
NON_COMMA+COMMA+
OR+
NON_END+END);
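To see how the three alternatives behave, the pattern can be exercised with a small self-contained program. This is only a sketch: the class name `ThreeWayTokenizer` and the `tokens` method are hypothetical, `NON_END` is written as `.+` here, and the post-processing (dropping the trailing comma, trimming, removing the surrounding quotes) is an assumption about how the matches would be used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper, for illustration only.
class ThreeWayTokenizer {
    private static final Pattern TOKEN = Pattern.compile(
            "\"\\s*[^\"]*\"\\s*,"   // 1) quoted text followed by a comma
            + "|[^,]*,"             // 2) any text followed by a comma
            + "|.+$");              // 3) any remaining text up to the end of the input

    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            String t = m.group();
            if (t.endsWith(",")) {
                t = t.substring(0, t.length() - 1); // drop the separator
            }
            t = t.trim();
            if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
                t = t.substring(1, t.length() - 1); // drop the surrounding quotes
            }
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "2011-03-09,4919 1281 0410 9930,55107,SAZB2314,"
                + "\"John, Doe\" ,1-888-888-4452 ext 1813";
        tokens(line).forEach(System.out::println);
    }
}
```

One limitation worth noting: alternative 1 requires a comma after the closing quote, so a quoted field that is the last field on the line falls through to alternative 2 and is split at its embedded comma. A trailing empty field (a line ending in a comma) is also not reported.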

, , , , . , union, .

-1
