Play Framework 2.0 BodyParser - push parsing XML streams

I feel a bit out of my depth asking this question, because despite reading the white papers and the resources linked from these questions:

How to understand `Iteratee` in play2?

Unable to understand Iteratee, Enumerator, Enumeratee in Play 2.0

... I'm still pretty hazy about iteratees, enumerators, and the Play 2.0 reactive model in general. In any case, I would like to set up a web service that lets clients upload large XML files (> 100 MB), selects specific NodeSeqs from them, processes them, and sends the results back to the client.

I believe the first thing I need to do is write a BodyParser that takes chunks of bytes, feeds them to an XML parser, and lazily emits a stream of the nodes I need, say <doc>...</doc>.

Can anyone suggest some pointers and/or examples illustrating how this can be accomplished?

Update: more background

My XML is actually a Solr document add, so it looks like this:

<add>
    <doc>
        <field name="name">Some Entity</field>
        <field name="details">Blah blah...</field>
        ...
    </doc>
    ...
</add>

I want to process each <doc> in a streaming manner, so my parser would obviously have to wait until it hits a <doc> start event, buffer everything until the matching </doc> end event, emit a NodeSeq built from the completed element, and then flush its buffer.
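The wait/buffer/emit/flush cycle described above can be sketched with the JDK's StAX pull parser (javax.xml.stream). This is only a sketch under stated assumptions, not a Play BodyParser: it reads from a blocking InputStream, it assumes every <field> contains plain text, and instead of building a NodeSeq it collects each <doc> into a hypothetical SolrDoc case class so that it runs on the JDK alone. The names SolrDocStream, SolrDoc and docs are made up for illustration.

```scala
import java.io.InputStream
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants, XMLStreamReader}

object SolrDocStream {

  /** One <doc>, reduced to its (name, text) pairs. Hypothetical type for this sketch. */
  final case class SolrDoc(fields: Vector[(String, String)])

  /** Lazily yields one SolrDoc per <doc> element; only the current <doc> is held in memory. */
  def docs(in: InputStream): Iterator[SolrDoc] = {
    val factory = XMLInputFactory.newInstance()
    // Uploaded XML is untrusted, so DTD processing is switched off.
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, java.lang.Boolean.FALSE)
    val reader = factory.createXMLStreamReader(in)

    new Iterator[SolrDoc] {
      private var nextDoc: Option[SolrDoc] = None

      // Skip events until a <doc> start tag is seen, then read that one element.
      private def advance(): Unit =
        while (nextDoc.isEmpty && reader.hasNext()) {
          if (reader.next() == XMLStreamConstants.START_ELEMENT && reader.getLocalName() == "doc")
            nextDoc = Some(readDoc(reader))
        }

      def hasNext: Boolean = { advance(); nextDoc.isDefined }

      def next(): SolrDoc = {
        advance()
        val doc = nextDoc.getOrElse(throw new NoSuchElementException("no more <doc> elements"))
        nextDoc = None // "flush the buffer" before looking for the next <doc>
        doc
      }
    }
  }

  // Called with the reader positioned on <doc>; returns once </doc> has been consumed.
  private def readDoc(reader: XMLStreamReader): SolrDoc = {
    val fields = Vector.newBuilder[(String, String)]
    var done = false
    while (!done && reader.hasNext()) {
      reader.next() match {
        case XMLStreamConstants.START_ELEMENT if reader.getLocalName() == "field" =>
          val name = reader.getAttributeValue(null, "name")
          // getElementText reads up to </field>; it fails if the field has child elements.
          fields += (name -> reader.getElementText())
        case XMLStreamConstants.END_ELEMENT if reader.getLocalName() == "doc" =>
          done = true
        case _ => // whitespace, comments, other elements: ignored in this sketch
      }
    }
    SolrDoc(fields.result())
  }
}
```

Because the result is an Iterator, a caller such as `SolrDocStream.docs(stream).foreach(process)` (with `process` being whatever per-document handler is needed) pulls one <doc> at a time. Turning this into a push-style BodyParser would still require getting Play's byte chunks into the parser, which this sketch does not attempt.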

[...] Play BodyParser [...]

[...] XML [...] <doc /> [...]

+5
3

[...] org.w3c.Document Java scala.xml scala: xml-

[...] 100 xml [...] 700 [...]

[...] xml Iteratee. Scales Xml ([...] pull Enumerator) [...]

InputStream ([...] Reader) [...] Scales. [...] Play [...]
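Where the body only arrives as byte chunks but the parser wants an InputStream, one stop-gap is to bridge the two with a JDK pipe: a background thread writes the chunks into a PipedOutputStream while the parser reads from the connected PipedInputStream. The sketch below assumes the chunks are available as an Iterator[Array[Byte]] (a stand-in for however the framework delivers them); the names ChunkBridge, asInputStream and countElements are hypothetical. It costs one blocked thread per request, so it is a workaround rather than true push parsing.

```scala
import java.io.{InputStream, PipedInputStream, PipedOutputStream}
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

object ChunkBridge {

  /** Exposes byte chunks as a blocking InputStream; a daemon thread does the writing. */
  def asInputStream(chunks: Iterator[Array[Byte]]): InputStream = {
    val sink = new PipedOutputStream()
    val source = new PipedInputStream(sink, 64 * 1024) // 64 KiB pipe buffer
    val writer = new Thread(new Runnable {
      def run(): Unit =
        try chunks.foreach(chunk => sink.write(chunk)) // blocks while the pipe is full
        finally sink.close()                           // signals end-of-stream to the reader
    })
    writer.setDaemon(true)
    writer.start()
    source
  }

  /** Counts start tags with the given local name, reading the stream incrementally. */
  def countElements(in: InputStream, localName: String): Int = {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(in)
    var count = 0
    while (reader.hasNext()) {
      if (reader.next() == XMLStreamConstants.START_ELEMENT && reader.getLocalName() == localName)
        count += 1
    }
    count
  }
}
```

For example, `ChunkBridge.countElements(ChunkBridge.asInputStream(chunks), "doc")` parses while the chunks are still being written. A non-blocking parser that accepts bytes as they arrive would avoid the extra thread altogether.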

NB: [...] (0.5) aalto-xml [...]

+3

Nux XOM XML [...]

+1
