Java regex: how to reuse a consuming character in pattern matching?

Is there a way to reuse an already-consumed character of the input when matching a pattern?

For example, suppose I want to match the regular expression (a+b+|b+a+), that is, one or more a followed by one or more b, or vice versa.

Suppose the input is aaaabbbaaaaab

Then the output using the regular expression will be aaaabbb and aaaaab

How can I get the following output instead?

aaaabbb
bbbaaaaa
aaaaab
2 answers

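Try it this way:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "aaaabbbaaaaab";
// The lookahead captures the match in group 1 without consuming it;
// the second group restricts the positions where a match may start.
Matcher m = Pattern.compile("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)").matcher(data);
while (m.find())
    System.out.println(m.group(1));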

This regex uses lookaround and finds every (a+b+|b+a+) that

  • starts at the beginning of the input (^),
  • starts with a b that is preceded by an a, or
  • starts with an a that is preceded by a b.

Output:

aaaabbb
bbbaaaaa
aaaaab

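Is ^ really needed in this regex?

Yes. Without ^ the regex would not capture the aaaabbb at the start of the input.

If I had not added (^|(?<=a)b|(?<=b)a) after (?=(a+b+|b+a+)), the regex would match: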

aaaabbb
aaabbb
aabbb
abbb
bbbaaaaa
bbaaaaa
baaaaa
aaaaab
aaaab
aaab
aab
ab

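So I needed to limit the results to those that start with an a that has a b before it (without including that b in the match, which is exactly what lookbehind is for), or with a b that is preceded by an a.

An a or b at the very start of the string is not preceded by anything, though. To include that case, I can use ^.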
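The unfiltered candidate list described above can be reproduced with a short sketch. It runs only the capturing lookahead, with no restriction on where a match may start. The class and method names (LookaheadOnlyDemo, allCandidates) are made up for illustration and are not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for this illustration.
class LookaheadOnlyDemo {
    // Returns group(1) for every position at which the lookahead succeeds.
    // The overall match is empty, so find() advances one character at a time.
    static List<String> allCandidates(String input) {
        Matcher m = Pattern.compile("(?=(a+b+|b+a+))").matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        allCandidates("aaaabbbaaaaab").forEach(System.out::println);
    }
}
```

For the input aaaabbbaaaaab this should print the twelve candidates listed above, from aaaabbb down to ab, which is why the extra start-position condition is needed.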


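Maybe the idea is easier to see with this regex:

(?=(a+b+|b+a+))((?<=^|a)b|(?<=^|b)a)

  • (?<=^|a)b matches a b that is at the start of the string or has an a before it
  • (?<=^|b)a matches an a that is at the start of the string or has a b before it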
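As a quick check, the following sketch runs both forms of the regex from this answer against the sample input and collects group 1. The names AnchoredLookaheadDemo and overlapping are assumptions made for the example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for this illustration.
class AnchoredLookaheadDemo {
    // First form: ^ is a separate alternative next to the two lookbehinds.
    static final String WITH_CARET_ALTERNATIVE =
            "(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)";
    // Second form: ^ is folded into each lookbehind.
    static final String WITH_CARET_IN_LOOKBEHIND =
            "(?=(a+b+|b+a+))((?<=^|a)b|(?<=^|b)a)";

    // Collects the text captured by the lookahead (group 1) for each match.
    static List<String> overlapping(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String data = "aaaabbbaaaaab";
        System.out.println(overlapping(WITH_CARET_ALTERNATIVE, data));
        System.out.println(overlapping(WITH_CARET_IN_LOOKBEHIND, data));
    }
}
```

Both lines should print [aaaabbb, bbbaaaaa, aaaaab], so for this input the two forms behave the same.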

Try using a lookbehind:

((?<=a)b+|(?<=b)a+)

Output: bbb aaaaa b
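A minimal sketch of this lookbehind-only regex on the question's input is given below; the class and method names are invented for the example. Note that it returns the individual runs that follow a character change, not the overlapping a+b+ / b+a+ matches the question asks for, and the leading aaaa is not matched because nothing precedes it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for this illustration.
class LookbehindRunsDemo {
    // Collects each run of b preceded by an a, or run of a preceded by a b.
    static List<String> runs(String input) {
        Matcher m = Pattern.compile("((?<=a)b+|(?<=b)a+)").matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", runs("aaaabbbaaaaab")));
    }
}
```

For aaaabbbaaaaab this should print bbb aaaaa b.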