Regular expressions: match a word or maximum number of words

I want to find a phrase and match up to a few words following it, but stop early if I encounter another specific phrase.

For example, I want to match up to three words following "going to the", but stop matching if I encounter "to try". So "going to the luna park" should yield "luna park"; "going to the capital city of Peru" should yield "capital city of"; and "going to the moon to try cheesecake" should yield "moon".

Can this be done with a single, simple regular expression (preferably in Python)? I tried every combination I could think of, but failed :).

+5

Match up to 3 words ({1,3}) after "going to the", and stop before "to try" with a negative lookahead ((?!to try)):

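import re

# Up to three words after "going to the"; the lookahead rejects a word followed by "to try".
pattern = re.compile(r"going to the ((?:\w+\s*(?!to try)){1,3})")

with open("input") as infile:
    for line in infile:
        m = pattern.match(line)
        if m:
            print(m.group(1).rstrip())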

luna park
capital city of
moon
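
For anyone who wants to try this without an input file, here is a minimal self-contained sketch of the same pattern applied to the three sentences from the question. The helper name words_after and the samples list are made up for illustration; re.search is used instead of re.match so the phrase may appear anywhere in the text, which is an assumption about what the asker wants.

```python
import re

# Same pattern as in the answer above, written as a raw string.
PATTERN = re.compile(r"going to the ((?:\w+\s*(?!to try)){1,3})")


def words_after(text):
    """Return up to three words following "going to the", or None if absent.

    words_after is a hypothetical helper name, not part of the original answer.
    """
    m = PATTERN.search(text)
    return m.group(1).rstrip() if m else None


# The three example sentences from the question.
samples = [
    "going to the luna park",
    "going to the capital city of Peru",
    "going to the moon to try cheesecake",
]

for s in samples:
    print(words_after(s))
```

One caveat: the lookahead is only tested after a word has been consumed, so a sentence where "to try" comes immediately after "going to the" would still match those words. If that case matters, moving the lookahead in front of the word, as in (?:(?!to try)\w+\s*){1,3}, is probably the safer variant.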
+5

, . NLTK . . , , , , ( ).

-2
