Can I make this (Python) regex faster?

Question


I am trying to write a regular expression that matches a line of the form ##-## (where # matches any digit), with the caveat that the second pair of digits cannot be "00". The expression must be usable with re.search and must capture the first occurrence of the pattern.

Here is what I have (what works):

the_regex = re.compile(r"(\d\d-(?:0[123456789]|[123456789]\d))")

I am not fond of the alternation or the long character classes. Can someone suggest a better (clearer, or possibly more efficient) regular expression?

(Yes, this is micro-optimization, and I have heeded the relevant warnings from Knuth.)

+3

python regex

dcrosta Feb 13 '14 at 15:07


3 answers

Use character ranges:

r"(\d\d-(?:0[1-9]|[1-9]\d))"
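Since the question requires re.search and the first occurrence, here is a minimal sketch of how the range-based pattern could be used that way. The helper name first_match and the sample strings are made up for illustration and are not part of the original answer.

```python
import re

# Range-based pattern: second pair is 01-09 or 10-99, never 00.
the_regex = re.compile(r"(\d\d-(?:0[1-9]|[1-9]\d))")

def first_match(text):
    """Return the first ##-## occurrence whose second pair is not 00, or None."""
    m = the_regex.search(text)
    return m.group(1) if m else None

print(first_match("11-00 then 10-29 then 01-99"))  # 10-29
print(first_match("00-00"))                        # None
```

re.search scans left to right, so "11-00" is skipped and the first valid pair is returned.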

+4

Martijn Pieters Feb 13 '14 at 15:09

import re

the_regex = re.compile(r"(\d\d-(?:0[1-9]|[1-9]\d))")

l = re.findall(the_regex, '11-01 11-99 10-29 01-99 00-00 11-00')
print(l)

shows:

['11-01', '11-99', '10-29', '01-99']

If you use re.finditer, it returns an iterator, which might suit you better:

it = re.finditer(the_regex, '11-01 11-99 10-29 01-99 00-00 11-00')
print(type(it))
print(list(i.group(0) for i in it))

prints:

<class 'callable_iterator'>
['11-01', '11-99', '10-29', '01-99']

+3

Aaron Hall Feb 13 '14 at 15:12


Corley Brigman · Accepted Answer · 2014-02-13T15:30:37+0000

Another option is a negative lookahead:

import re

r2 = re.compile(r"(\d\d-(?!00)\d\d)")
l = re.findall(r2, 'On 02-14 I went looking for 12-00 and 14-245')
print(l)
['02-14', '14-24']
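To check that the lookahead version accepts exactly the same ##-## strings as the alternation version, one can compare the two exhaustively over all 10,000 digit combinations. This is only a sketch; the names alternation, lookahead, and agree are illustrative and do not come from the answers.

```python
import re
from itertools import product

alternation = re.compile(r"(\d\d-(?:0[1-9]|[1-9]\d))")
lookahead = re.compile(r"(\d\d-(?!00)\d\d)")

def agree():
    """Return True if both patterns give the same verdict on every dd-dd string."""
    for a, b in product(range(100), repeat=2):
        s = "%02d-%02d" % (a, b)
        if bool(alternation.search(s)) != bool(lookahead.search(s)):
            return False
    return True

print(agree())
```

The (?!00) assertion consumes no characters; it only refuses to continue when the next two characters are "00", after which \d\d matches the pair as usual.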

It is not noticeably faster, though; all three versions time about the same:

# Martijn/Aaron solution
In [20]: %timeit l = re.findall(the_regex2, '11-01 11-99 10-29 01-99 00-00 11-00')
100000 loops, best of 3: 3.55 µs per loop

# Above version
In [21]: %timeit l = re.findall(r2, '11-01 11-99 10-29 01-99 00-00 11-00')
100000 loops, best of 3: 3.49 µs per loop

#Original post version.
In [25]: the_regex = re.compile("(\d\d-(?:0[123456789]|[123456789]\d))")
In [26]: %timeit l = re.findall(the_regex, '11-01 11-99 10-29 01-99 00-00 11-00')    
100000 loops, best of 3: 3.41 µs per loop
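The timings above come from an IPython session. Outside IPython, a comparable measurement could be sketched with the standard timeit module. The names patterns and best_time and the repetition counts are arbitrary choices made here, and absolute numbers will vary by machine and Python version.

```python
import re
import timeit

text = '11-01 11-99 10-29 01-99 00-00 11-00'

# The three candidate patterns discussed in this thread.
patterns = {
    "original": re.compile(r"(\d\d-(?:0[123456789]|[123456789]\d))"),
    "ranges": re.compile(r"(\d\d-(?:0[1-9]|[1-9]\d))"),
    "lookahead": re.compile(r"(\d\d-(?!00)\d\d)"),
}

def best_time(pattern, number=100000, repeat=3):
    """Best per-call time in microseconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(lambda: re.findall(pattern, text),
                         number=number, repeat=repeat)
    return min(runs) / number * 1e6

if __name__ == "__main__":
    for name, pattern in patterns.items():
        print("%-10s %.2f us per loop" % (name, best_time(pattern)))
```

Taking the minimum of several runs mirrors the "best of 3" figure that %timeit reports.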
