Why does PLY handle regular expressions differently from Python/re?

Some background:

I am writing a parser to extract information from sites that use a markup language. Standard libraries like wikitools, ... do not work for me, since I need to be more specific, and adapting them to my needs puts extra complexity between me and the problem. Python plus "simple" regular expressions made it difficult to define dependencies between different "tokens" of the markup language in a transparent way, so I ended up at PLY at the end of this journey.

Now it seems that PLY identifies tokens through regex differently from Python, but I cannot find anything about it. I don't want to move on without understanding how PLY identifies the tokens in my lexer (otherwise I have no control over the logic I depend on, and it will fail at a later stage).

Here we go:

import re  # needed for re.UNICODE below
import ply.lex as lex

text = r'--- 123456 ---'
token1 = r'-- .* --'
tokens = (
   'TEST',
)
t_TEST = token1

lexer = lex.lex(reflags=re.UNICODE, debug=1)
lexer.input(text)
for tok in lexer:
    print tok.type, tok.value, tok.lineno, tok.lexpos

leads to:

lex: tokens   = ('TEST',)
lex: literals = ''
lex: states   = {'INITIAL': 'inclusive'}
lex: Adding rule t_TEST -> '-- .* --' (state 'INITIAL')
lex: ==== MASTER REGEXS FOLLOW ====
lex: state 'INITIAL' : regex[0] = '(?P<t_TEST>-- .* --)'
TEST --- 123456 --- 1 0

The last line is surprising: I would expect the first and the last "-" to be missing from "--- 123456 ---" if PLY behaved like "search" (and no match at all if it behaved like "match"). This matters, because otherwise "--" cannot be distinguished from "---" (or "==" from "==="), which means headings, enumerations, ... cannot be told apart.

So my question: how does PLY treat the regex differently from Python/regex? (How does this work? I could not find anything on it, not even on stackoverflow.)

For comparison, the same pattern with Python/regex:

import re

text = r'--- 123456 ---'
token1 = r'-- .* --'

p = re.compile(token1)

m = p.search(text)
if m:
    print 'Match found: ', m.group()
else:
    print 'No match'

m = p.match(text)
if m:
    print 'Match found: ', m.group()
else:
    print 'No match'

leads to:

Match found:  -- 123456 --
No match

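(The first result comes from "search", the second from "match".)

Setup: I am running this in spyder, with the following startup banner: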

Python 2.7.5+ (default, Sep 19 2013, 13:49:51) 
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Imported NumPy 1.7.1, SciPy 0.12.0, Matplotlib 1.2.1
Type "scientific" for more details.


Looking at how ply builds the regex behind lexmatch, the relevant line in lex.py is:

c = re.compile("(?P<%s>%s)" % (fname,f.__doc__), re.VERBOSE | self.reflags)

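Note the VERBOSE flag. With it, the re engine ignores whitespace characters in the pattern. So r'-- .* --' actually means r'--.*--', which does match all of '--- foobar ---'. See the documentation for re.VERBOSE.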
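
The effect can be reproduced with plain re, without PLY. The following is a minimal sketch, not PLY's own code: it compiles the pattern from the question once without flags and once with re.VERBOSE, and then tries a variant with escaped spaces. The variable names are only for illustration, and the escaped pattern is one possible workaround that I am assuming carries over to a PLY token definition, since PLY passes the pattern on to re.compile.

```python
import re

text = '--- 123456 ---'
pattern = r'-- .* --'

# Plain compile: the spaces in the pattern are literal characters.
plain = re.compile(pattern)

# Compiled with re.VERBOSE, as in the lex.py line quoted above:
# unescaped whitespace in the pattern is ignored.
verbose = re.compile(pattern, re.VERBOSE)

plain_match = plain.match(text)      # None: the third character is '-', not ' '
verbose_match = verbose.match(text)  # behaves like r'--.*--'

# Escaping the spaces keeps them significant even under re.VERBOSE.
escaped = re.compile(r'--\ .*\ --', re.VERBOSE)
escaped_match = escaped.match(text)    # None: no match at position 0
escaped_search = escaped.search(text)  # finds the inner '-- 123456 --'

print("plain match:    %r" % (plain_match,))
print("verbose match:  %r" % (verbose_match.group(),))
print("escaped match:  %r" % (escaped_match,))
print("escaped search: %r" % (escaped_search.group(),))
```

With the spaces escaped, the VERBOSE-compiled pattern gives the same two results as the plain Python/regex test in the question: no match at the start of the text, and '-- 123456 --' when searching.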
