Removing duplicate words using regex in Python

I need to remove duplicate words in a string so that 'the (the)' becomes 'the'. Why can't I do it as follows?

re.sub('(.+) \(\1\)', '\1', 'the (the)')

Thanks.

2 answers

You need to double-escape the backreference:

re.sub('(.+) \(\\1\)', '\\1', 'the (the)')
--> the

Or use the r prefix:

When an "r" or "R" prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string.

re.sub(r'(.+) \(\1\)', r'\1', 'the (the)')
--> the
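
To see why the original attempt fails, here is a minimal sketch (the variable names are only illustrative). In a normal string literal, '\1' is an octal escape for the single character chr(1), so the regex engine never receives a backreference. The broken pattern below is written with '\\(' so that it is the same string the question's pattern produces, without relying on an unrecognized escape.

```python
import re

text = 'the (the)'

# In a normal string literal, '\1' is an octal escape: one character, chr(1).
assert '\1' == '\x01' and len('\1') == 1

# Double-escaping and the r prefix both give a backslash followed by '1'.
assert '\\1' == r'\1' and len(r'\1') == 2

# Equivalent to the pattern in the question: the regex looks for a literal
# chr(1) between the parentheses, so nothing matches and nothing is replaced.
broken = re.sub('(.+) \\(\1\\)', '\1', text)

# With raw strings, \1 reaches the regex engine as a backreference to group 1.
fixed = re.sub(r'(.+) \(\1\)', r'\1', text)

print(repr(broken))
print(repr(fixed))
```

The first call leaves the string unchanged because there is no match, while the second returns 'the'.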

According to the documentation: "Raw string notation (r"text") keeps regular expressions sane."
