Some background:
I am writing a parser to extract information from sites that use a markup language. Standard libraries like wikitools, ... do not work for me, since I need to be more specific, and adapting them to my needs adds another layer of complexity between me and the problem. With Python plus "simple" regular expressions it was difficult to define dependencies between different "tokens" of the markup language in a transparent way, so I ended up with PLY at the end of this journey.
Now it seems that PLY matches tokens through regexes differently from plain Python, but I cannot find anything about it. I don't want to move on before I understand how PLY identifies tokens in my lexer (otherwise I cannot control the logic and it will break at a later stage).
Here we go:
    import re
    import ply.lex as lex

    text = r'--- 123456 ---'
    token1 = r'-- .* --'

    tokens = (
        'TEST',
    )

    t_TEST = token1

    lexer = lex.lex(reflags=re.UNICODE, debug=1)
    lexer.input(text)
    for tok in lexer:
        print tok.type, tok.value, tok.lineno, tok.lexpos
leads to:
    lex: tokens = ('TEST',)
    lex: literals = ''
    lex: states = {'INITIAL': 'inclusive'}
    lex: Adding rule t_TEST -> '-- .* --' (state 'INITIAL')
    lex: ==== MASTER REGEXS FOLLOW ====
    lex: state 'INITIAL' : regex[0] = '(?P<t_TEST>-- .* --)'
    TEST --- 123456 --- 1 0
The last line is surprising: I would expect the first and last - to be missing from --- 123456 --- if PLY behaved like "search" (and no token at all if it behaved like "match"). This matters, because otherwise -- cannot be distinguished from --- (or == from ===), that is, headers, enumerations, ... cannot be told apart.
So, does PLY apply regular expressions differently from Python/regex? (I could not find anything on this, including on stackoverflow.)
Python/regex:
    import re

    text = r'--- 123456 ---'
    token1 = r'-- .* --'
    p = re.compile(token1)

    # search: the pattern may match anywhere in the string
    m = p.search(text)
    if m:
        print 'Match found: ', m.group()
    else:
        print 'No match'

    # match: the pattern must match at the start of the string
    m = p.match(text)
    if m:
        print 'Match found: ', m.group()
    else:
        print 'No match'
which gives:

    Match found: -- 123456 --
    No match
(the first result comes from "search", the second from "match")
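One possible explanation, offered as an assumption and not verified against the PLY source for this version: as far as I understand the PLY documentation, token patterns are compiled with the re.VERBOSE flag, and in verbose mode unescaped whitespace in a pattern is ignored. The sketch below uses only the re module (no PLY) to show that this flag alone turns '-- .* --' into a pattern that matches the whole string from position 0, which is the result seen above. The names plain, verbose, escaped and matched are only illustrative.

```python
import re

text = r'--- 123456 ---'
token1 = r'-- .* --'

# Default flags: the spaces in the pattern are literal characters.
plain = re.compile(token1)

# re.VERBOSE: unescaped whitespace in the pattern is ignored,
# so this behaves like the pattern '--.*--'.
verbose = re.compile(token1, re.VERBOSE)

# Escaping the spaces keeps them significant even in verbose mode.
escaped = re.compile(r'--\ .*\ --', re.VERBOSE)


def matched(m):
    """Return the matched text, or None if there was no match."""
    return m.group() if m else None


plain_match = matched(plain.match(text))         # anchored at position 0
verbose_match = matched(verbose.match(text))     # anchored at position 0
escaped_match = matched(escaped.match(text))     # anchored at position 0
escaped_search = matched(escaped.search(text))   # may start anywhere

print(repr(plain_match))
print(repr(verbose_match))
print(repr(escaped_match))
print(repr(escaped_search))
```

If this assumption holds, writing the spaces as `\ ` (or as `\s` or `[ ]`) in the token definition should make the lexer treat them as significant again, and the rule would then no longer match at the start of --- 123456 ---.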
Environment: Spyder, with this console banner:

    Python 2.7.5+ (default, Sep 19 2013, 13:49:51)
    [GCC 4.8.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    Imported NumPy 1.7.1, SciPy 0.12.0, Matplotlib 1.2.1
    Type "scientific" for more details.