Can I make this (Python) regex faster?

I am trying to write a regular expression that matches a line of the form ##-## (where # is any digit), with the caveat that the second pair of digits cannot be "00". The expression will be used with re.search and must capture the first occurrence of the pattern.

Here is what I have (what works):

the_regex = re.compile(r"(\d\d-(?:0[123456789]|[123456789]\d))")

I'm not happy with the alternation or the long character classes. Can someone suggest a better (clearer, though not necessarily more efficient) regular expression?

(Yes, this is micro-optimization, and I have heard the relevant warnings from Knuth.)

+3
3 answers

Another option is a negative lookahead, which rejects "00" as the second pair:

r2 = re.compile(r"(\d\d-(?!00)\d\d)")
l = re.findall(r2, 'On 02-14 I went looking for 12-00 and 14-245')
print(l)
['02-14', '14-24']
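Since the question asks for re.search and only the first match, here is a minimal sketch of the lookahead pattern used that way. The helper name first_match is made up for illustration; the pattern itself is the one shown above.

```python
import re

# Two digits, a hyphen, then two digits that are not "00".
# (?!00) looks ahead without consuming anything and fails if "00" follows.
r2 = re.compile(r"(\d\d-(?!00)\d\d)")

def first_match(text):
    """Hypothetical helper: return the first valid ##-## string, or None."""
    m = r2.search(text)
    return m.group(1) if m else None

print(first_match("On 12-00 nothing, on 02-14 something"))  # 02-14
print(first_match("only 12-00 here"))                        # None
```

Note that the pattern is not anchored, so "14-245" still yields "14-24", exactly as in the findall output above.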

It makes no real difference in speed, though. Timings for the three versions:

# Martijn/Aaron solution
In [20]: %timeit l = re.findall(the_regex2, '11-01 11-99 10-29 01-99 00-00 11-00')
100000 loops, best of 3: 3.55 µs per loop

# Above version
In [21]: %timeit l = re.findall(r2, '11-01 11-99 10-29 01-99 00-00 11-00')
100000 loops, best of 3: 3.49 µs per loop

# Original post version
In [25]: the_regex = re.compile("(\d\d-(?:0[123456789]|[123456789]\d))")
In [26]: %timeit l = re.findall(the_regex, '11-01 11-99 10-29 01-99 00-00 11-00')    
100000 loops, best of 3: 3.41 µs per loop
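
The %timeit lines above need IPython. A rough equivalent with the standard timeit module might look like the sketch below. The dictionary labels and the function name time_patterns are my own, and the absolute numbers will differ from machine to machine.

```python
import re
import timeit

text = '11-01 11-99 10-29 01-99 00-00 11-00'

# The three patterns that appear on this page; the labels are illustrative.
patterns = {
    "original": re.compile(r"(\d\d-(?:0[123456789]|[123456789]\d))"),
    "ranges": re.compile(r"(\d\d-(?:0[1-9]|[1-9]\d))"),
    "lookahead": re.compile(r"(\d\d-(?!00)\d\d)"),
}

def time_patterns(number=10000, repeat=3):
    """Return the best time per findall call, in microseconds, for each pattern."""
    results = {}
    for name, rx in patterns.items():
        # Take the minimum over the repeats, as %timeit's "best of 3" does.
        best = min(timeit.repeat(lambda: rx.findall(text),
                                 number=number, repeat=repeat))
        results[name] = best / number * 1e6
    return results

for name, usec in time_patterns().items():
    print("%-10s %.2f usec per loop" % (name, usec))
```

Differences of a few hundredths of a microsecond, like those shown above, are within normal run-to-run noise.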
+1

You can shorten the character classes by using ranges:

r"(\d\d-(?:0[1-9]|[1-9]\d))"
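
If you want to convince yourself that the shorter form behaves the same as the original, one option is to check every possible ##-## string. This is only a sketch; the names original, ranged and disagreements are made up for the example.

```python
import re
from itertools import product

original = re.compile(r"(\d\d-(?:0[123456789]|[123456789]\d))")
ranged = re.compile(r"(\d\d-(?:0[1-9]|[1-9]\d))")

def disagreements():
    """Return every dd-dd string on which the two patterns disagree."""
    bad = []
    for a, b in product(range(100), repeat=2):
        s = "%02d-%02d" % (a, b)
        if bool(original.search(s)) != bool(ranged.search(s)):
            bad.append(s)
    return bad

print(disagreements())  # []
```

The input space is only 10,000 strings, so the exhaustive check runs almost instantly.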

+4
import re

the_regex = re.compile(r"(\d\d-(?:0[1-9]|[1-9]\d))")
l = re.findall(the_regex, '11-01 11-99 10-29 01-99 00-00 11-00')
print(l)

prints:

['11-01', '11-99', '10-29', '01-99']

If you use re.finditer instead, you get an iterator that yields match objects lazily, which might suit you better:

it = re.finditer(the_regex, '11-01 11-99 10-29 01-99 00-00 11-00')
print(type(it))
print([m.group(0) for m in it])

which prints (on Python 3):

<class 'callable_iterator'>
['11-01', '11-99', '10-29', '01-99']
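
Because the iterator is lazy, you can stop after the first match instead of collecting all of them. Here is a small sketch, assuming the same pattern and test string as above; the variable names first and m are my own.

```python
import re

the_regex = re.compile(r"(\d\d-(?:0[1-9]|[1-9]\d))")
text = '11-01 11-99 10-29 01-99 00-00 11-00'

# Pull only the first match object; the default None covers the no-match case.
first = next(the_regex.finditer(text), None)
if first is not None:
    print(first.group(0), first.span())  # 11-01 (0, 5)

# When only one match is needed, search gives the same result directly.
m = the_regex.search(text)
print(m.group(0))  # 11-01
```

A match object also gives you the position of the match through span(), which findall does not.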
+3