Regex ignore certain characters

Question


I am splitting text on non-alphanumeric characters and would like to exclude certain characters, such as apostrophes, dashes/hyphens and commas, from the split.

I would like to create a regex for the following cases:

- non-alphanumeric characters, excluding apostrophes and hyphens
- non-alphanumeric characters, excluding commas, apostrophes and hyphens

Here is what I tried:

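import re

def split_text(text):
    # \W matches a single non-word character (anything except letters,
    # digits and "_"), so apostrophes, hyphens and commas are all split on
    my_text = re.split(r'\W', text)

    # the following doesn't work.
    #my_text = re.split('([A-Z]\w*)',text)
    #my_text = re.split("^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$",text)

    return my_text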
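To see why the \W version splits too much, here is a minimal sketch on a made-up sentence (the sample string is an assumption, not the asker's input). Because \W matches one character at a time and includes apostrophes and hyphens, contractions and hyphenated words are broken apart, and two adjacent separators such as ", " leave an empty string between them.

```python
import re

# Hypothetical sample sentence containing an apostrophe, a comma and a hyphen
sample = "It's good, my-friend"

# \W matches one non-word character at a time: "It's" and "my-friend" are
# split apart, and ", " (two separators in a row) yields an empty string
parts = re.split(r"\W", sample)
print(parts)  # ['It', 's', 'good', '', 'my', 'friend']
```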

Case 1:
- Input Example: What? Nice to see you, my friend. Hello to the world!
- Result: ['What', 'Up', 'It', 'good', 'to', 'see', 'you', 'my-friend', 'Hello', 'to-the', 'world']
Case 2:
- Input Example: , .
- Result: ['It', '', 'that', 'it', 'not', 'good-to', 'do', 'such', ' ]



user3247054 · Feb 7 '14 at 10:31


Split on runs of any character that is not a word character, an apostrophe or a hyphen:

my_text = re.split(r"[^\w'-]+",text)

my_text = re.split(r"[^\w,'-]+",text)   # also excludes commas
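Applied to a made-up sentence (a hypothetical sample, not from the question), the two patterns behave roughly like this. The sketch also shows that a separator at the very end of the string leaves a trailing empty string, and, as a different technique, that re.findall with the positive class [\w'-]+ collects the same tokens without empty entries.

```python
import re

# Hypothetical sample sentence, not taken from the question
sample = "What's up? It's good to see you, my-friend!"

# Case 1: split on runs of characters that are not word characters, apostrophes or hyphens
case1 = re.split(r"[^\w'-]+", sample)

# Case 2: commas are no longer split characters, so they stay attached to tokens
case2 = re.split(r"[^\w,'-]+", sample)

# The trailing "!" is a separator at the end of the string, so re.split
# returns an empty string as the last element; drop empty strings
case1_clean = [token for token in case1 if token]

# Alternative technique: match the tokens themselves instead of the separators
case1_findall = re.findall(r"[\w'-]+", sample)

print(case1)  # ["What's", 'up', "It's", 'good', 'to', 'see', 'you', 'my-friend', '']
print(case2)  # ["What's", 'up', "It's", 'good', 'to', 'see', 'you,', 'my-friend', '']
print(case1_clean)
print(case1_findall)
```

Note that the comma version returns 'you,' rather than 'you', since the comma is no longer treated as a separator.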


Tim Pietzcker · Feb 7 '14 at 10:36

zmo · Accepted Answer · 2014-02-07T10:36:56+0000

- non-alphanumeric characters, excluding apostrophes and hyphens:

my_text = re.split(r"[^\w'-]+",text)

- non-alphanumeric characters, excluding commas, apostrophes and hyphens:

my_text = re.split(r"[^\w,'-]+",text)   # hyphen last so it is a literal, not a range

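[] defines a character class; [^...] matches any single character that is not in the class, and the + makes each split point a whole run of such characters.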
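The position of the hyphen inside a character class matters. A minimal sketch using Python's re module (the helper compiles() and the sample string are hypothetical, for illustration only): when the hyphen sits between \w and the apostrophe it is read as a range operator, and Python rejects a range whose endpoint is a class such as \w; placed last, the hyphen is an ordinary literal.

```python
import re

def compiles(pattern):
    """Hypothetical helper: True if re accepts the pattern, False on re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Hyphen between \w and ' : parsed as the range "\w-'", which is invalid
broken = r"[^\w-',]+"
# Hyphen as the last character of the class: treated as a literal "-"
fixed = r"[^\w,'-]+"

print(compiles(broken))  # False
print(compiles(fixed))   # True
print(re.split(fixed, "well-known, isn't it"))  # ['well-known,', "isn't", 'it']
```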

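From the documentation:

Characters that are not within a range can be matched by complementing the set. If the first character of the set is '^', all the characters that are not in the set will be matched. For example, [^5] will match any character except '5', and [^^] will match any character except '^'. ^ has no special meaning if it's not the first character in the set.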
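The [^5], [^^] and non-leading ^ cases can be checked directly with re.findall; a short sketch on made-up input strings:

```python
import re

# [^5]: "^" comes first, so the set is complemented -> anything except "5"
not_five = re.findall(r"[^5]", "a5b")

# [^^]: complemented set containing only "^" -> anything except "^"
not_caret = re.findall(r"[^^]", "a^b")

# [5^]: "^" is not first, so it is just a literal character in the set
five_or_caret = re.findall(r"[5^]", "a5^b")

print(not_five)       # ['a', 'b']
print(not_caret)      # ['a', 'b']
print(five_or_caret)  # ['5', '^']
```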
