Iteratively write XML nodes in python

There are many ways to read XML, both with simultaneous (DOM) and one bit per second (SAX). I used SAX or lxml to iteratively read large XML files (e.g. wikipedia dump, which is 6.5GB compressed).

However, after doing some iterative processing (in python using ElementTree) of this XML file, I want to write the (new) XML data to another file.

Are there libraries for iteratively writing XML data? I could create an XML tree and then write it, but this is not possible without using templates. Is there anyway to write an XML tree to an iterative file? One bit at a time?

I know that I could generate the XML itself using print "<%s>" % tag_name, etc., but that seems a bit ... hacky.

+3
source share
4 answers

Fredrik Lundh elementtree.SimpleXMLWriter lets you write out XML step by step. Here's the demo code built into the module:

from elementtree.SimpleXMLWriter import XMLWriter
import sys

w = XMLWriter(sys.stdout)

html = w.start("html")

w.start("head")
w.element("title", "my document")
w.element("meta", name="generator", value="my application 1.0")
w.end()

w.start("body")
w.element("h1", "this is a heading")
w.element("p", "this is a paragraph")

w.start("p")
w.data("this is ")
w.element("b", "bold")
w.data(" and ")
w.element("i", "italic")
w.data(".")
w.end("p")

w.close(html)
+4
source

If you are reading in the XML1 dialect and are supposed to write the XML dialect2, would it not be a good idea to record the conversion process using xslt ? You may not even need any source code.

+1
source

, , , ElementTree "iteractiveElementTree", "file". "start_tag_comitted" "commit". "commit" render - , e "start_tag_comitted" . ​​ node. , .

"Commited" node . node node, ElementTree .

(email me if there are no better answers that you are stuck there, I could implement this)

+1
source

In lxml, you can use etree.Elementto create new nodes and etree.tostringto write an XML representation. See, for example, Listing 6. Serializing child elements of an element from Lisa Daily's article, "High-Performance XML Parsing in Python with lxml."

0
source

All Articles