Match all uppercase words only if there are lowercase letters in the string with one regular expression

Question

Match all uppercase words only if there are lowercase letters in the string with one regular expression

I stumbled upon this seemingly trivial question and got stuck on it. I have a line in which I want to match all uppercase words with a single regular expression, but only if the line contains at least one lowercase letter somewhere.

Basically, I want the following behavior for each of these lines (assume the regular expression is applied to each line separately; I don't need any multi-line processing):

ab ABC          //matches or captures ABC
ab ABC 12 CD    //matches or captures ABC, CD
ABC DE          //matches or captures nothing (no lowercase)
ABC 23 DE EFG a //matches or captures ABC, DE, EFG
AB aF DE        //matches or captures AB, DE

I use PCRE as my regex flavor (I know that some other flavors allow variable-length lookbehind).

Update after comments

Obviously, there are many simple solutions if I use several regular expressions, or the programming language from which I call the regex (for example, first check the string for a lowercase letter, and then match all the uppercase words, using two different regular expressions).
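For reference, the two-step approach mentioned above could look roughly like the following in Perl. This is only a sketch: the helper name `uppercase_words_if_lowercase` is made up for illustration, and it assumes ASCII letters only.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the all-uppercase
# words of a line, but only if the line has at least one lowercase letter.
sub uppercase_words_if_lowercase {
    my ($line) = @_;

    # Step 1: first regex, is there any lowercase letter at all?
    return () unless $line =~ /[a-z]/;

    # Step 2: second regex, collect every whole word made of uppercase letters.
    my @words = $line =~ /\b([A-Z]+)\b/g;
    return @words;
}

for my $line ('ab ABC 12 CD', 'ABC DE', 'AB aF DE') {
    my @words = uppercase_words_if_lowercase($line);
    print "$line: @words\n";
}
```

This is the baseline that a single-regex solution has to reproduce: `ABC DE` yields nothing, and the `F` in `aF` is skipped because it is not a whole word.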

My goal here is to find a way to do this with a single regex.

I have no technical imperative for this limitation. Take it as a style exercise, or curiosity, or an attempt to improve my regex skills: the task seemed (at first) so simple that I would like to know whether a single regular expression can do it. If it cannot, I would like to understand why.

+3

regex perl pcre

Robin Feb 18 '14 at 1:22

/(?{ @matches = m{\b\p{Lu}+\b}g if m{\p{Ll}} })/;

As a complete script:

use strict;
use warnings;
use feature qw( say );

while (<DATA>) {
   chomp;

   local our @matches;
   /(?{ @matches = m{\b\p{Lu}+\b}g if m{\p{Ll}} })/;

   say "$_: ", join ', ', @matches;
}

__DATA__
ab ABC
ab ABC 12 CD
ABC DE
ABC 23 DE EFG a

Alternatively:

my @matches = /
   \G
   (?: (?! ^ )
   |   (?= .* \p{Ll} )
   )
   .*? ( \b \p{Lu}+ \b )
/sg;

my @matches = /\G(?:(?!^)|(?=.*\p{Ll})).*?(\b\p{Lu}+\b)/sg;
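To see how the `\G` version behaves, here is a small test harness around the one-line pattern above. It is only a sketch; the wrapper name `uppercase_words` is made up for illustration. On the first iteration `\G` sits at position 0, so `(?!^)` fails and the lookahead `(?=.*\p{Ll})` has to succeed, which is where the lowercase check happens. On later iterations `\G` is past the start, `(?!^)` succeeds, and the check is skipped.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not from the thread) around the \G pattern above.
sub uppercase_words {
    my ($line) = @_;

    # \G pins each attempt to the end of the previous match (or to
    # position 0 for the first attempt), so the engine cannot skip
    # ahead and start a fresh match after the lowercase check fails.
    my @matches = $line =~ /\G(?:(?!^)|(?=.*\p{Ll})).*?(\b\p{Lu}+\b)/sg;
    return @matches;
}

for my $line ('ab ABC', 'ab ABC 12 CD', 'ABC DE', 'ABC 23 DE EFG a', 'AB aF DE') {
    my @found = uppercase_words($line);
    print "$line: ", join(', ', @found), "\n";
}
```

For `ABC DE` the first attempt fails at position 0 and, because every other starting position fails the `\G` anchor, the list comes back empty.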

+1

ikegami Feb 18 '14 at 3:17

For regular expressions that are expected to match very quickly, it is even more important to limit their overall expressiveness. But by writing two or more regular expressions with an added control structure, you effectively extend them into something more powerful than the regular expression parser.

+1

jpaugh Feb 19 '14 at 1:57

Perhaps we are overthinking this:

#! /usr/bin/env perl
#
use strict;
use feature qw(say);
use autodie;
use warnings;
use Data::Dumper;

while ( my $string = <DATA> ) {
    chomp $string;
    my @array;
    say qq(String: "$string");
    if ( @array = $string =~ /(\b[A-Z]+\b)/g ) {
        say qq(String groups: ) . join( ", ", @array ) . "\n";
    }
}

__DATA__
ab ABC
ab ABC 12 CD
ABC DE
ABC 23 DE EFG a
AB aF DE
ADSD asd ADSD
asd ADSDSD
SDSD SDD SD
SSDD SDS asds

Output:

String: "ab ABC"
String groups: ABC

String: "ab ABC 12 CD"
String groups: ABC, CD

String: "ABC DE"
String groups: ABC, DE

String: "ABC 23 DE EFG a"
String groups: ABC, DE, EFG

String: "AB aF DE"
String groups: AB, DE

String: "ADSD asd ADSD"
String groups: ADSD, ADSD

String: "asd ADSDSD"
String groups: ADSDSD

String: "SDSD SDD SD"
String groups: SDSD, SDD, SD

String: "SSDD SDS asds"
String groups: SSDD, SDS

Did I miss something?

-1

David W. Feb 18 '14 at 23:23

One regex:

@words = split (/[a-z]+/, $_);

-2

Gwp Feb 19 '14 at 2:05

sln · Accepted Answer · 2014-02-18T02:08:36+0000

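Note that \G starts out in a matched state at position 0,
so it has to be excluded at BOS (beginning of string).
At BOString or BOLine, (?= ^ .* [a-z] ) checks that the line has a lowercase letter;
otherwise \G (end of the previous match) continues, and then the UC word is matched.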

(?|(?=\A.*[a-z]).*?\b([A-Z]+)\b|(?!\A)(?:(?=^.*[a-z])|\G.*?\b([A-Z]+)\b))

Version 2, shortened (per @Robin):

 #  (?:(?=^.*[a-z])|(?!\A)\G).*?\b([A-Z]+)\b

 (?:
      (?= ^ .* [a-z] )        # BOL, check if line has lower case letter
   |                        # or
      (?! \A )                # Not at BOS (beginning of string, where \G is in a matched state)
      \G                      # Start the match at the end of last match (if previous matched state)
 )
 .*? \b 
 ( [A-Z]+ )              # (1), Found UC word
 \b

Perl:

$/ = undef;

$str = <DATA>;

@ary = $str =~ /(?:(?=^.*[a-z])|(?!\A)\G).*?\b([A-Z]+)\b/mg;

print "@ary", "\n-------------\n";

while ($str =~ /(?:(?=^.*[a-z])|(?!\A)\G).*?\b([A-Z]+)\b/mg)
{
   print "$1 ";
}

__DATA__
DA EFR
ab ABC
ab ABC 12 CD
ABC DE  t
ABC 23 DE EFG a

Output:

ABC ABC CD ABC DE ABC DE EFG
-------------
ABC ABC CD ABC DE ABC DE EFG
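The script above slurps all lines at once and relies on `/m`. Since the question applies the regex to each line separately, the same pattern can also be tried per line, including the `AB aF DE` case that is missing from the `__DATA__` section. The following is only a sketch; the wrapper name `sln_uppercase_words` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not from the answer) applying the shortened
# pattern to one line at a time.
sub sln_uppercase_words {
    my ($line) = @_;

    # Alternative 1 can only succeed at the start of a line and performs
    # the lowercase check; alternative 2 continues from the previous match.
    my @words = $line =~ /(?:(?=^.*[a-z])|(?!\A)\G).*?\b([A-Z]+)\b/mg;
    return @words;
}

for my $line ('ab ABC', 'ab ABC 12 CD', 'ABC DE', 'ABC 23 DE EFG a', 'AB aF DE') {
    my @words = sln_uppercase_words($line);
    print "$line => @words\n";
}
```

Here `ABC DE` produces no matches, because neither alternative can succeed at position 0 and `\G` never advances past it.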
