Python: split a string using regex

I would like to split a string on ':' and on whitespace. However, I would like to ignore a double space '  ' and a double colon '::'. For example,

text = "s:11011 i:11010 ::110011  :110010 d:11000"

should be divided into

[s,11011,i,11010,:,110011, ,110010,d,11000]

After going through the Regular Expression HOWTO on the Python website, I managed to come up with the following:

regx= re.compile('([\s:]|[^\s\s]|[^::])')
regx.split(text)

However, this does not work as intended: it splits on ':' and on spaces, but the result still contains the ':' and ' ' separators.

[s,:,11011, ,i,:,11010, ,:,:,110011, , :,110010, ,d,:,11000]

How can I fix this?

EDIT: In the case of a double space, I only need one space to appear in the result.

4 answers

Note: this assumes that your data has a format like X:101010:

>>> re.findall(r'(.+?):(.+?)\b ?',text)
[('s', '11011'), ('i', '11010'), (':', '110011'), (' ', '110010'), ('d', '11000')]

Then chain them up:

>>> list(itertools.chain(*_))
['s', '11011', 'i', '11010', ':', '110011', ' ', '110010', 'd', '11000']
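
The `_` in the session above only exists in the interactive interpreter, where it holds the previous result. As a rough sketch of the same idea in a standalone script (the variable names `pairs` and `flat` are mine, not from the answer), it could look like this:

```python
import itertools
import re

text = "s:11011 i:11010 ::110011  :110010 d:11000"

# Each match is a (key, digits) tuple; the trailing " ?" swallows
# at most one separator space after the digits.
pairs = re.findall(r'(.+?):(.+?)\b ?', text)

# chain.from_iterable flattens the list of tuples into one flat list.
flat = list(itertools.chain.from_iterable(pairs))
print(flat)
```

`itertools.chain.from_iterable(pairs)` does the same job as `itertools.chain(*pairs)` without unpacking the list into arguments.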
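
Alternatively, split on a single colon or whitespace character while capturing an optional second one, then drop the empty pieces:
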
>>> text = "s:11011 i:11010 ::110011  :110010 d:11000"
>>> [x for x in re.split(r":(:)?|\s(\s)?", text) if x]
['s', '11011', 'i', '11010', ':', '110011', ' ', '110010', 'd', '11000']
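
To see why the `if x` filter is needed, here is a small sketch (the names `raw` and `parts` are mine) that looks at what `re.split` returns before filtering. Because the pattern contains capturing groups, `re.split` also puts the group values into the result, and a group that did not take part in a match shows up as `None`:

```python
import re

text = "s:11011 i:11010 ::110011  :110010 d:11000"

# Capturing groups are included in the output of re.split; a group that
# did not participate in a given match contributes None.
raw = re.split(r":(:)?|\s(\s)?", text)
print(raw[:4])

# Filtering on truthiness drops both None and the empty strings that
# appear between two adjacent separators, but keeps ':' and ' '.
parts = [x for x in raw if x]
print(parts)
```

The captured second colon or second space is what ends up in the result as the ':' and ' ' items.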

You can split with the regex (?<=\d) |:(?=\d) :

>>> text = "s:11011 i:11010 ::110011  :110010 d:11000"
>>> result = re.split(r"(?<=\d) |:(?=\d)", text)
>>> result
['s', '11011', 'i', '11010', ':', '110011', ' ', '110010', 'd', '11000']

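Explanation:

(?<=\d) followed by a space matches a space when the character to its left is a digit. To test this, I use a lookbehind assertion.

:(?=\d) matches a colon when the character to its right is a digit. To test this, I use a lookahead assertion.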
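
As a quick illustration of the two halves of that pattern, here is a sketch that applies each alternative on its own (the variable names are mine, chosen for the example):

```python
import re

text = "s:11011 i:11010 ::110011  :110010 d:11000"

# Lookbehind only: split on a space whose left neighbour is a digit.
# The second space of the double space is left alone.
on_spaces = re.split(r"(?<=\d) ", text)
print(on_spaces)   # ['s:11011', 'i:11010', '::110011', ' :110010', 'd:11000']

# Lookahead only: split on a colon whose right neighbour is a digit.
# The first colon of '::' is left alone.
on_colons = re.split(r":(?=\d)", text)
print(on_colons)   # ['s', '11011 i', '11010 :', '110011  ', '110010 d', '11000']

# Both alternatives together give the requested list.
both = re.split(r"(?<=\d) |:(?=\d)", text)
print(both)
```

Lookarounds check the neighbouring character without consuming it, so the digits themselves are never removed by the split.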


Take a look at this pattern:

([a-z\:\s])\:(\d+)

It will give you the same array that you expect. There is no need to use split; just access the matches returned by the regex engine.
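
One possible way to use this pattern from Python, sketched under the assumption that every item in the data is a single key character followed by a colon and digits (the names `pattern` and `result` are mine):

```python
import re

text = "s:11011 i:11010 ::110011  :110010 d:11000"

# Group 1 is the single key character (a letter, a colon, or whitespace),
# group 2 is the run of digits after the separating colon.
# The backslashes before ':' are redundant but harmless.
pattern = re.compile(r"([a-z\:\s])\:(\d+)")

# Collect both groups of every match into one flat list.
result = [piece for match in pattern.finditer(text) for piece in match.groups()]
print(result)
```

Note that the character class only covers lowercase letters, so keys such as uppercase letters or digits would need the class to be extended.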

Hope this helps!
