Regex ignore certain characters

I am analyzing text for non-alphanumeric characters and would like to exclude certain characters such as apostrophes, dashes/hyphens, and commas.

I would like to create a regex for the following cases:

  • non-alphanumeric characters excluding apostrophes and hyphens
  • non-alphanumeric characters excluding commas, apostrophes, and hyphens

Here is what I tried:

import re

def split_text(text):
    # Split on every non-word character.
    my_text = re.split(r'\W', text)

    # The following don't work:
    # my_text = re.split('([A-Z]\w*)', text)
    # my_text = re.split("^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$", text)

    return my_text
  • Case 1 :
    • Input Example: What? Nice to see you, my friend. Hello to the world!
    • Result: ['What', 'Up', 'It', 'good', 'to', 'see', 'you', 'my-friend', 'Hello', 'to-the', 'world' ]
  • Case 2 :
    • : , .
    • : ['It', '', 'that', 'it', 'not', 'good-to', 'do', 'such', ' ]


Excluding apostrophes and hyphens:

my_text = re.split(r"[^\w'-]+",text)

Excluding commas, apostrophes, and hyphens:

my_text = re.split(r"[^\w',-]+",text)   # hyphen last, so it is a literal and not a range
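As a rough sketch of how the two patterns behave, the snippet below splits a made-up sample sentence with each of them. The names `KEEP_APOS_HYPHEN`, `KEEP_COMMA_APOS_HYPHEN`, and the `pattern` parameter are illustrative assumptions, not part of the question. The hyphen is written last inside the brackets because Python's `re` rejects a class such as `[^\w-',]` with a "bad character range" error.

```python
import re

# Negated character classes: split on runs of characters that are NOT in the set.
# The hyphen is placed last so it is read as a literal, not as a range operator.
KEEP_APOS_HYPHEN = re.compile(r"[^\w'-]+")
KEEP_COMMA_APOS_HYPHEN = re.compile(r"[^\w,'-]+")

def split_text(text, pattern=KEEP_APOS_HYPHEN):
    # Drop empty strings produced when the text starts or ends with a delimiter.
    return [token for token in pattern.split(text) if token]

sample = "Don't worry, it's a well-known trick!"  # hypothetical input
print(split_text(sample))
# ["Don't", 'worry', "it's", 'a', 'well-known', 'trick']
print(split_text(sample, KEEP_COMMA_APOS_HYPHEN))
# ["Don't", 'worry,', "it's", 'a', 'well-known', 'trick']
```

Filtering out empty strings is a design choice here; plain `re.split` returns `''` when the text begins or ends with a delimiter.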

[] is a character class, and [^..] means "not": it matches any character other than the ones listed.

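From the Python documentation:

Characters that are not within a range can be matched by complementing the set. If the first character of the set is '^', all the characters that are not in the set will be matched. For example, [^5] will match any character except '5', and [^^] will match any character except '^'. ^ has no special meaning if it's not the first character in the set.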
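A small sketch of the negation rule, using short strings chosen only for illustration: a leading `^` complements the set, while a `^` anywhere else in the brackets is an ordinary character.

```python
import re

# [^5] matches every character except '5'.
not_five = re.findall(r"[^5]", "4556")
print(not_five)        # ['4', '6']

# [^^] matches every character except '^'.
not_caret = re.findall(r"[^^]", "a^b")
print(not_caret)       # ['a', 'b']

# When '^' is not first, it is just a literal member of the set.
five_or_caret = re.findall(r"[5^]", "5^6")
print(five_or_caret)   # ['5', '^']
```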

Try:

my_text = re.split(r"[^\w'-]+",text)

my_text = re.split(r"[^\w,'-]+",text)   # also excludes commas