I parse fairly large XML files (> 100 MB) and validate each one against one of several DTDs, chosen based on its docinfo:
    from lxml import etree

    parser = etree.XMLParser(recover=True)
    xmlfile = etree.parse(file, parser)

    # Pick the DTD named in the document's DOCTYPE declaration.
    doctype = xmlfile.docinfo.doctype.lower()
    if "aaa.dtd" in doctype:
        dtdfile = "dtds/aaa.dtd"
    elif "bbb.dtd" in doctype:
        dtdfile = "dtds/bbb.dtd"
    elif "ccc.dtd" in doctype:
        dtdfile = "dtds/ccc.dtd"

    dtd = etree.DTD(dtdfile)
    if dtd.validate(xmlfile):
        ...  # do something with the valid document
My problem is memory consumption, so I thought I should use iterparse instead, but I cannot find a way to do the same DOCTYPE check and validation with it.
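For the DOCTYPE part of the check, here is a minimal sketch of what might work with iterparse. It assumes that docinfo is already populated when the first "start" event (the root element) arrives, since the DOCTYPE precedes the root. The names `DTD_PATHS`, `read_doctype`, and `pick_dtd` are hypothetical helpers, and the small in-memory document only stands in for the real file.

```python
import io

from lxml import etree

# Hypothetical mapping from a DOCTYPE substring to a local DTD path,
# mirroring the if/elif chain in the question.
DTD_PATHS = {
    "aaa.dtd": "dtds/aaa.dtd",
    "bbb.dtd": "dtds/bbb.dtd",
    "ccc.dtd": "dtds/ccc.dtd",
}


def read_doctype(source):
    """Return the DOCTYPE declaration, stopping at the first parse event."""
    for _event, elem in etree.iterparse(source, events=("start",)):
        # The first "start" event belongs to the root element. The DOCTYPE
        # comes before it, so docinfo should already be filled in here.
        return elem.getroottree().docinfo.doctype
    return ""


def pick_dtd(doctype):
    """Return the local DTD path for a DOCTYPE string, or None if none matches."""
    lowered = doctype.lower()
    for name, path in DTD_PATHS.items():
        if name in lowered:
            return path
    return None


# Small in-memory document standing in for the real > 100 MB file.
sample = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE root SYSTEM "AAA.dtd">\n'
    b"<root><item/></root>"
)
doctype = read_doctype(io.BytesIO(sample))
dtd_path = pick_dtd(doctype)
```

This only covers choosing the DTD. As far as I can tell, `etree.DTD.validate` still needs an element or tree in memory, so the validation step itself is not solved by this sketch. `iterparse` also accepts a `dtd_validation=True` keyword, but that validates against the DTD the document itself declares, which may not resolve to the local copies under `dtds/`.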
Thanks in advance.