I am splitting text on non-alphanumeric characters and would like to exclude certain characters, such as apostrophes, dashes/hyphens, and commas, from the set of delimiters.
I would like to create a regex for the following cases:
- non-alphanumeric characters, excluding apostrophes and hyphens
- non-alphanumeric characters, excluding commas, apostrophes, and hyphens
Here is what I tried:
import re

def split_text(text):
    # \W matches any non-word character, including apostrophes, hyphens, and commas
    my_text = re.split(r'\W', text)
    return my_text
- Case 1:
- Input Example: What? Nice to see you, my friend. Hello to the world!
- Result: ['What', 'Up', 'It', 'good', 'to', 'see', 'you', 'my-friend', 'Hello', 'to-the', 'world' ]
- Case 2:
- Input Example: , .
- Result: ['It', '', 'that', 'it', 'not', 'good-to', 'do', 'such', ' ]
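One possible approach to both cases, offered as a minimal sketch rather than a definitive answer, is to split on a negated character class that lists the characters to keep. The pattern names KEEP_APOS_HYPHEN and KEEP_APOS_HYPHEN_COMMA and the optional pattern parameter are hypothetical additions for illustration. The sketch assumes straight ASCII apostrophes and hyphens, and \w also treats the underscore as a word character.

```python
import re

# Case 1: delimiters are runs of anything that is NOT a word character,
# an apostrophe, or a hyphen (the hyphen is last in the class, so it is literal).
KEEP_APOS_HYPHEN = re.compile(r"[^\w'-]+")

# Case 2: same as above, but commas are also kept inside tokens.
KEEP_APOS_HYPHEN_COMMA = re.compile(r"[^\w',-]+")

def split_text(text, pattern=KEEP_APOS_HYPHEN):
    # Filter out the empty strings that re.split produces when the text
    # starts or ends with a delimiter.
    return [token for token in pattern.split(text) if token]

sample = "It's a well-known fact, isn't it?"
print(split_text(sample))
# ["It's", 'a', 'well-known', 'fact', "isn't", 'it']
print(split_text(sample, KEEP_APOS_HYPHEN_COMMA))
# ["It's", 'a', 'well-known', 'fact,', "isn't", 'it']
```

The trailing + in each pattern collapses consecutive delimiters (for example a comma followed by a space) into a single split point, which avoids most of the empty strings that a bare \W split produces.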