I am trying to remove all special characters from some text. Here is my regex:
import re
pattern = re.compile(r'[\W_]+', re.UNICODE)
words = str(pattern.sub(' ', words))
Super simple, but unfortunately it causes problems with apostrophes (single quotes). For example, if I had the word "doesn't", this code returns "doesn".
Is there a way to adapt this regular expression so that it does not remove apostrophes in such cases?
Edit: this is what I am after:
doesn't this mean it -technically- works?
it should be:
doesn't this mean it technically works
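One direction that might work, as a minimal sketch rather than a definitive answer: instead of matching every non-word character with `\W`, match anything that is neither a word character nor an apostrophe, and handle the underscore separately. The function name `strip_specials` is hypothetical, and the final `strip()` is an assumption about wanting no leading or trailing spaces.

```python
import re

# Match runs of characters that are neither word characters nor apostrophes,
# plus underscores (which \w would otherwise keep).
PATTERN = re.compile(r"(?:[^\w']|_)+", re.UNICODE)

def strip_specials(text):
    """Replace runs of special characters with one space, keeping apostrophes."""
    return PATTERN.sub(' ', text).strip()

print(strip_specials("doesn't this mean it -technically- works?"))
```

Note that this keeps every apostrophe, including ones used as quotation marks around a word, so it may need tightening if only apostrophes inside words should survive.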