Perl regex: matching nested brackets

I am trying to match nested brackets {} with regular expressions in Perl so that I can extract certain pieces of text from a file. This is what I have:

my @matches = $str =~ /\{(?:\{.*\}|[^\{])*\}|\w+/sg;

foreach (@matches) {
    print "$_\n";
}

Sometimes this works as expected. For example, if $str = "abc {{xyz} abc} {xyz}", I get:

abc
{{xyz} abc}
{xyz}

as expected. But for other input lines it does not work. For example, if $str = "{abc} {{xyz}} abc", the output is:

{abc} {{xyz}}
abc

which is not what I expected. I would like {abc} and {{xyz}} to be on separate lines, since each of them is balanced on its own. Is there a problem with my regex? If so, how can I fix it?


Let's look at which part of your pattern matches each character of "{abc} {{xyz}} abc":

my @matches = $str =~ /\{(?:\{.*\}|[^{])*\}|\w+/sg;
                       ^    ^ ^ ^  ^      ^
                       |    | | |  |      |
{ ---------------------+    | | |  |      |
a --------------------------)-)-)--+      |
b --------------------------)-)-)--+      |
c --------------------------)-)-)--+      |
} --------------------------)-)-)--+      |
  --------------------------)-)-)--+      |
{ --------------------------+ | |         |
{ ----------------------------+ |         |
x ----------------------------+ |         |
y ----------------------------+ |         |
z ----------------------------+ |         |
} ------------------------------+         |
} ----------------------------------------+

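As you can see, [^{] also matches }, so nothing ends the outer group at the close of {abc}, and \{.*\} simply runs to whichever } lets the rest of the pattern succeed. Neither part knows anything about matching pairs. What you actually want to match is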

(?: \s* (?: \{ ... \} | \w+ ) )*

where ... is itself

(?: \s* (?: \{ ... \} | \w+ ) )*

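That definition is recursive, so it calls for a recursive regex, which Perl supports from 5.10 on. This loop matches against $_: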

say $1
   while /
      \G \s*+ ( (?&WORD) | (?&BRACKETED) )

      (?(DEFINE)
         (?<WORD>      \s* \w+ )
         (?<BRACKETED> \s* \{ (?&TEXT)? \s* \} )
         (?<TEXT>      (?: (?&WORD) | (?&BRACKETED) )+ )
      )
   /xg;
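
For reference, here is a minimal self-contained sketch of the loop above, wrapped in a hypothetical tokenize helper so it can be run on the strings from the question. The pattern itself is unchanged; the for ($str) wrapper only exists to point $_ at the input, and it assumes Perl 5.10 or later.

```perl
use strict;
use warnings;
use feature qw(say);

# Hypothetical helper: returns the top-level words and {...} groups of $str,
# using the recursive (?(DEFINE)...) pattern unchanged.
sub tokenize {
    my ($str) = @_;
    my @tokens;
    for ($str) {    # alias $_ to the input, since the pattern matches $_
        push @tokens, $1
            while /
                \G \s*+ ( (?&WORD) | (?&BRACKETED) )

                (?(DEFINE)
                    (?<WORD>      \s* \w+ )
                    (?<BRACKETED> \s* \{ (?&TEXT)? \s* \} )
                    (?<TEXT>      (?: (?&WORD) | (?&BRACKETED) )+ )
                )
            /xg;
    }
    return @tokens;
}

# Prints {abc}, {{xyz}} and abc on separate lines.
say for tokenize("{abc} {{xyz}} abc");
```

Because \s*+ sits outside capture group 1, the leading whitespace is consumed but not included in $1.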

Alternatively, you could use Text::Balanced.


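This is covered in perlfaq6 ("Can I use Perl regular expressions to match balanced text?"), which discusses the recursive (?PARNO) construct and the Regexp::Common module.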

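Those approaches get complicated quickly, though. For a job like yours, Text::Balanced is the more practical tool.

Here is an example: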

use v5.10;
use strict;
use warnings;

use Text::Balanced qw(extract_multiple extract_bracketed);

my @strings = ("abc {{xyz} abc} {xyz}", "{abc} {{xyz}} abc");

for my $string (@strings) {
    say "Extracting from $string";

    # Extract all the fields, rather than one at a time.
    my @fields = extract_multiple(
        $string,
        [
            # Extract {...}
            sub { extract_bracketed($_[0], '{}') },
            # Also extract any other non whitespace
            qr/\S+/
        ],
        # Return all the fields
        undef,
        # Throw out anything which does not match
        1
    );

    say join "\n", @fields;
    print "\n";
}

You can think of extract_multiple as a more flexible split: instead of describing the separators, you describe the fields.
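
extract_multiple tries the extractors in its list in turn. To see what the extract_bracketed extractor produces on its own, here is a minimal sketch calling it in list context, where (per the Text::Balanced documentation) it returns the extracted text, the remaining text and the skipped prefix, and leaves the input string untouched. The sample string is the second one from the question.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "{abc} {{xyz}} abc";

# List context: returns (extracted, remainder, prefix) and leaves $text as is.
my ($extracted, $remainder, $prefix) = extract_bracketed($text, '{}');

print "extracted: [$extracted]\n";   # [{abc}]
print "remainder: [$remainder]\n";   # [ {{xyz}} abc]
print "prefix:    [$prefix]\n";      # whatever was skipped before the group
print "input:     [$text]\n";        # unchanged
```

In scalar context, extract_bracketed instead returns just the extracted text and removes it from the input string, which is what a while (my $m = extract_bracketed(...)) loop relies on.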


You need a recursive regex. This should work:

my @matches;
push @matches, $1 while $str =~ /( [^{}\s]+ | ( \{ (?: [^{}]+ | (?2) )* \} ) )/xg;

Or, more compactly, using (?R) to recurse into the whole pattern:

my @matches = $str =~ /[^{}\s]+ | \{ (?: (?R) | [^{}]+ )+ \} /gx;
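
A quick sketch that runs both recursive patterns on the two sample strings from the question joined into one. The patterns are copied unchanged; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $str = "abc {{xyz} abc} {xyz} {abc} {{xyz}} abc";

# Version 1: (?2) recurses into the inner capture group (the {...} part).
my @matches;
push @matches, $1
    while $str =~ /( [^{}\s]+ | ( \{ (?: [^{}]+ | (?2) )* \} ) )/xg;

# Version 2: (?R) recurses into the whole pattern; with no capture groups,
# list context plus /g returns every full match.
my @matches_r = $str =~ /[^{}\s]+ | \{ (?: (?R) | [^{}]+ )+ \} /gx;

# Both print: abc, {{xyz} abc}, {xyz}, {abc}, {{xyz}}, abc
print "$_\n" for @matches;
print "$_\n" for @matches_r;
```

The first version needs the extra outer group so that $1 holds either kind of token; the second avoids that by collecting whole matches.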

If the braces only ever nest in a single chain, like {1{2{3}}}, you can get away without recursion:

/\{[^}]*[^{]*\}|\w+/g
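
To see where this non-recursive pattern stops being enough, here is a sketch that runs it on a single nested chain and on a string with sibling groups (both strings are the examples used in this answer; the variable names are illustrative).

```perl
use strict;
use warnings;

# The non-recursive pattern from the answer, unchanged.
my $re = qr/\{[^}]*[^{]*\}|\w+/;

my @chain    = "{1{2{3}}}"    =~ /$re/g;   # one chain of nesting
my @siblings = "{1{2}{2}{2}}" =~ /$re/g;   # sibling groups inside one group

print join(" | ", @chain), "\n";       # {1{2{3}}}
print join(" | ", @siblings), "\n";    # {1{2} | {2} | {2}}
```

The chain comes out whole, but with siblings [^}]* stops at the first } and the groups get split at the wrong places.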

If a group can contain several sibling groups, like {1{2}{2}{2}}, you need recursion:

/(?>\{(?:[^{}]*|(?R))*\})|\w+/g

(?R) recursively matches the whole pattern.

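The key part is (?:[^{}]*|(?R))*: each repetition matches either [^{}]* (a run of non-brace characters) or (?R) (a nested, balanced group), and the * repeats that as often as needed.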

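For example, with "{abc {def}}", the pattern matches "{", then [^{}]* matches "abc ", (?R) matches "{def}", and finally "}" matches.

To match "{def}", (?R) applies the whole pattern (?>\{(?:[^{}]*|(?R))*\})|\w+ again: it matches "{", then [^{}]* matches "def", then "}".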
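
A small sketch that runs the (?R) pattern on the examples mentioned here ({1{2}{2}{2}} and {abc {def}}) plus the second string from the question. The helper name recursive_tokens is hypothetical; the pattern is unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper using the (?R) pattern from the answer, unchanged.
sub recursive_tokens {
    my ($str) = @_;
    return $str =~ /(?>\{(?:[^{}]*|(?R))*\})|\w+/g;
}

for my $str ("{1{2}{2}{2}}", "{abc {def}}", "{abc} {{xyz}} abc") {
    print join(" | ", recursive_tokens($str)), "\n";
}
```

Unlike the non-recursive pattern, the sibling groups in {1{2}{2}{2}} now stay inside a single match.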

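(?>...) makes the group atomic: once it has matched, the regex engine will not backtrack into it, which saves needless work when a match fails.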
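
The effect of an atomic group is easiest to see on a tiny made-up example (not from the answer): an ordinary group may give characters back so the rest of the pattern can match, while (?>...) may not.

```perl
use strict;
use warnings;

my $s = "aaa";

# Ordinary group: a* first takes "aaa", then gives one "a" back so the
# final "a" in the pattern can match.
my $plain = $s =~ /^(?:a*)a$/ ? 1 : 0;

# Atomic group: once (?>a*) has taken "aaa" it keeps it, so the trailing
# "a" has nothing left to match and the whole match fails.
my $atomic = $s =~ /^(?>a*)a$/ ? 1 : 0;

print "plain:  $plain\n";    # 1
print "atomic: $atomic\n";   # 0
```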


Try this pattern:

(\{(?:(?1)|[^{}]*+)++\})|[^{}\s]++

(I wrote it for PCRE; Perl 5.10 and later support the same recursion and possessive-quantifier syntax.)

(It works in Perl too!) Here is the code as run on ideone; $& holds the entire match, so that is what the loop prints.

my $str = "abc {{xyz} abc} {xyz} {abc} {{xyz}} abc";

while ($str =~ /(\{(?:(?1)|[^{}]*+)++\})|[^{}\s]++/g) {
    print "$&\n"
}

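Group 1 matches a balanced {...}, recursing into itself with (?1); the other branch matches a run of characters that are neither braces nor whitespace. The possessive quantifiers (*+ and ++) stop the engine from backtracking into what they have already matched. Note that text touching a brace is split off as its own token: abc{xyz}asd yields abc, {xyz} and asd.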
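
A self-contained sketch of the same pattern wrapped in a hypothetical possessive_tokens helper, run on abc{xyz}asd and on the combined string from the demo.

```perl
use strict;
use warnings;

# Hypothetical helper around the answer's pattern. Group 1 is a balanced
# {...} that recurses into itself via (?1); the other branch is a run of
# characters that are neither braces nor whitespace.
sub possessive_tokens {
    my ($str) = @_;
    my @out;
    while ($str =~ /(\{(?:(?1)|[^{}]*+)++\})|[^{}\s]++/g) {
        push @out, $&;    # $& is the whole match, whichever branch matched
    }
    return @out;
}

print join(" | ", possessive_tokens("abc{xyz}asd")), "\n";   # abc | {xyz} | asd
print join(" | ", possessive_tokens("abc {{xyz} abc} {xyz} {abc} {{xyz}} abc")), "\n";
```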


Another option is Text::Balanced.

script.pl:

#!/usr/bin/env perl

use warnings;
use strict;
use Text::Balanced qw<extract_bracketed>;

while ( <DATA> ) { 

    ## Remove '\n' from input string.
    chomp;

    printf qq|%s\n|, $_; 
    print "=" x 20, "\n";


    ## Extract all characters just before first curly bracket.
    my @str_parts = extract_bracketed( $_, '{}', '[^{}]*' );

    if ( $str_parts[2] ) { 
        printf qq|%s\n|, $str_parts[2];
    }   

    my $str_without_prefix = "@str_parts[0,1]";


    ## Extract data of balanced curly brackets, remove leading and trailing
    ## spaces and print.
    while ( my $match = extract_bracketed( $str_without_prefix, '{}' ) ) { 
        $match =~ s/^\s+//;
        $match =~ s/\s+$//;
        printf qq|%s\n|, $match;

    }   

    print "\n";
}

__DATA__
abc {{xyz} abc} {xyz}
{abc} {{xyz}} abc

Run it like this:

perl script.pl

Output:

abc {{xyz} abc} {xyz}
====================
abc 
{{xyz} abc}
{xyz}

{abc} {{xyz}} abc
====================
{abc}
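{{xyz}}

Note that the trailing abc on the second line is not printed: the while loop stops at the first piece of text that is not a balanced {...} group.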


The problem you are facing is that your quantifiers are greedy: the regex engine matches as much as it can while still letting the overall expression succeed.

To avoid a greedy match, add ? after each quantifier. This makes it match as little as possible.
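
A minimal illustration of the difference on a made-up string: the greedy .* runs as far as it can, while the lazy .*? stops as early as it can.

```perl
use strict;
use warnings;

my $s = "{a} {b}";

# Greedy: .* runs to the end, then backs off only as far as the last "}".
my ($greedy) = $s =~ /(\{.*\})/;

# Lazy: .*? stops at the first "}" that lets the match succeed.
my ($lazy) = $s =~ /(\{.*?\})/;

print "greedy: $greedy\n";   # {a} {b}
print "lazy:   $lazy\n";     # {a}
```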

So, I changed your expression:

my @matches = $str =~ /\{(?:\{.*\}|[^\{])*\}|\w+/sg;

To:

my @matches = $str =~ /\{(?:\{.*?\}|[^\{])*?\}|\w+/sg;

... and now both of your example strings are split as you expect.
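
A sketch that runs the non-greedy pattern on both strings from the question. As an extra case that is not in the answer, it also uses {{{a}}}: because .*? stops at the first }, the pattern mis-pairs the braces there and stops one closing brace short. The helper name lazy_tokens is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper around the non-greedy pattern, unchanged.
sub lazy_tokens {
    my ($str) = @_;
    return $str =~ /\{(?:\{.*?\}|[^\{])*?\}|\w+/sg;
}

for my $str ("abc {{xyz} abc} {xyz}", "{abc} {{xyz}} abc", "{{{a}}}") {
    print join(" | ", lazy_tokens($str)), "\n";
}
# abc | {{xyz} abc} | {xyz}
# {abc} | {{xyz}} | abc
# {{{a}}
```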

HTH

Francisco
