The lxml parser target does not fire the "start" callback immediately when the opening tag is fed

I was trying to use the lxml parser target interface to incrementally parse XML into a custom tree, and I ran into the following problem: if you instantiate the parser and immediately feed it the opening tag of the root element, the target's "start" callback is not triggered until some other event occurs (for example, incoming data, a closing tag, or another opening tag). Nested elements do not behave this way.

Demonstration:

from lxml import etree

class EchoTarget(object):
    def start(self, tag, attrib):
        print("start %s %s" % (tag, attrib))
    def end(self, tag):
        print("end %s" % tag)
    def data(self, data):
        print("data %r" % data)
    def comment(self, text):
        print("comment %s" % text)
    def close(self):
        print("close")
        return "closed!"

>>> p = etree.XMLParser(target=EchoTarget())
>>> p.feed('<a>') # nothing happens
>>> p.feed(' ') # suddenly..
start a {}
>>> p.feed('<b>') # works as expected
data u' '
start b {}

A workaround is to feed some whitespace before the opening tag:

>>> p = etree.XMLParser(target=EchoTarget())
>>> p.feed(' ')
>>> p.feed('<a>')
start a {}

Is this a bug? A "feature"? If it is intended, how can I get "start" to fire immediately?

Oddly, splitting the opening tag across two feeds also works:

>>> p = etree.XMLParser(target=EchoTarget())
>>> p.feed('<a')
>>> p.feed('>')
start a {}
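
To check the timing on a particular lxml version, it may help to record which feed() call each callback arrived in rather than printing. The following is only a sketch: RecordingTarget and trace_feeds are hypothetical names, not part of lxml, and the fallback to the standard library parser is an assumption made so the snippet runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib parser accepts the same target interface.
    import xml.etree.ElementTree as etree


class RecordingTarget(object):
    """Hypothetical helper: stores (chunk, event, value) tuples."""

    def __init__(self):
        self.chunk = None  # index of the feed() call in progress
        self.events = []

    def start(self, tag, attrib):
        self.events.append((self.chunk, "start", tag))

    def end(self, tag):
        self.events.append((self.chunk, "end", tag))

    def data(self, data):
        self.events.append((self.chunk, "data", data))

    def close(self):
        # Whatever close() returns here is what parser.close() returns.
        return self.events


def trace_feeds(chunks):
    """Feed chunks one by one and return the recorded events."""
    target = RecordingTarget()
    parser = etree.XMLParser(target=target)
    for index, chunk in enumerate(chunks):
        target.chunk = index
        parser.feed(chunk)
    target.chunk = "close"  # events delivered only when closing
    return parser.close()


for event in trace_feeds(["<a>", " ", "<b/>", "</a>"]):
    print(event)
```

If the first element of a "start" tuple is larger than the index of the chunk that contained the tag, the callback was delayed. The order of events should be the same everywhere; only the chunk indices are expected to vary.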



I think this is related to what the documentation says about the feed parser interface (http://lxml.de/parsing.html#the-feed-parser-interface):

If you do not call close(), the parser stays locked, so make sure you also close it in the exception case.

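So the pending "start" event is delivered once the parser is closed. In your example, however, the XML is incomplete (the root element is never closed), so an error is raised on close: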

>>> p.feed('<a>')
>>> p.close()
start a {}
close
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "parser.pxi", line 1171, in lxml.etree._FeedParser.close (src/lxml/lxml.etree.c:79791)
  File "parsertarget.pxi", line 128, in lxml.etree._TargetParserContext._handleParseResult (src/lxml/lxml.etree.c:88895)
  File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:74696)
XMLSyntaxError: Extra content at the end of the document, line 1, column 4

If, on the other hand, the document is complete (well-formed XML), the events are delivered without an error:

>>> p = etree.XMLParser(target=EchoTarget())
>>> p.feed('<a>')
>>> p.feed('</a>')
start a {}
end a
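
The two cases above can be wrapped in a small helper that always calls close() and reports the parse error instead of letting it propagate. This is a rough sketch under some assumptions: ListTarget and feed_all are hypothetical names, the error is caught as etree.ParseError (which lxml's XMLSyntaxError derives from), and the standard library parser is used as a fallback only so the snippet runs without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib parser accepts the same target interface.
    import xml.etree.ElementTree as etree


class ListTarget(object):
    """Hypothetical target that collects events in a list."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        self.events.append(("start", tag))

    def end(self, tag):
        self.events.append(("end", tag))

    def close(self):
        return "closed!"


def feed_all(chunks):
    """Feed every chunk, always close the parser, return (events, error)."""
    target = ListTarget()
    parser = etree.XMLParser(target=target)
    error = None
    try:
        for chunk in chunks:
            parser.feed(chunk)
    finally:
        # close() is called even if feed() failed; it raises when the
        # document is incomplete, so the error is captured here.
        try:
            parser.close()
        except etree.ParseError as exc:
            error = exc
    return target.events, error


print(feed_all(["<a>", "</a>"]))  # complete document
print(feed_all(["<a>"]))          # root element never closed
```

For the complete document the error slot is None; for the unclosed root it holds the exception raised by close().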

