XML Schema: is it possible to calculate how valid an invalid document is (for example, as a percentage)?

I use lxml in Python to validate a number of XML documents against an XML Schema definition. A good number of these documents are not valid, and they are not expected to be at the moment, but for reporting purposes it would be useful to calculate how valid they are as a percentage. I can also use xmllint or other command-line tools if they can provide useful statistics.

lxml parsers provide a way to get a list of the errors that occurred while parsing a document. Combine this with the parser's recover keyword argument and you get something like this:

from lxml import etree

# Warning: untested sketch. recover=True tells the parser to keep going after errors.
parser = etree.XMLParser(recover=True)
it_would_be_a_tree = etree.parse(your_xml_data, parser)
total_errors = len(parser.error_log)
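
The parser's error_log records parse (well-formedness) errors. When a document is well-formed but breaks the schema, lxml reports those violations through its XMLSchema validator instead. A minimal sketch, assuming the schema and document are available as bytes; the tiny XSD and XML below are made-up placeholders, not from the question:

```python
from lxml import etree

# Placeholder schema: a <list> holding one or more <item> elements with integer content.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="list">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" type="xs:integer" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Placeholder document: two of the three <item> values are not integers.
XML = b"<list><item>1</item><item>two</item><item>three</item></list>"

schema = etree.XMLSchema(etree.fromstring(XSD))
doc = etree.fromstring(XML)

# validate() returns False for an invalid document; the individual
# schema violations are collected in schema.error_log.
is_valid = schema.validate(doc)
schema_errors = len(schema.error_log)

for err in schema.error_log:
    print(err.line, err.message)
```

The number of reported violations depends on how far libxml2 gets before giving up, so treat schema_errors as a rough count rather than an exact measure.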

You can then express total_errors relative to the size of the file. A naive measure, such as errors per line or errors per character, is easy to compute. More sophisticated measures are possible if it_would_be_a_tree actually is a usable tree (total_errors / total_elements, for example).
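
The ratios above can be sketched as a small helper. This is an illustrative heuristic, not a standard validity metric: the function name validity_report and the rule of clamping the score at 0% are assumptions. Element counting uses the standard library's xml.etree.ElementTree, so it only works when the document is well-formed enough to parse.

```python
import xml.etree.ElementTree as ET


def validity_report(xml_text: str, total_errors: int) -> dict:
    """Hypothetical helper: naive 'how invalid is it' ratios for one document."""
    lines = max(1, xml_text.count("\n") + 1)
    chars = max(1, len(xml_text))
    # Count every element in the tree, including the root.
    root = ET.fromstring(xml_text)
    elements = sum(1 for _ in root.iter())
    errors_per_element = total_errors / elements
    return {
        "errors_per_line": total_errors / lines,
        "errors_per_char": total_errors / chars,
        "errors_per_element": errors_per_element,
        # Assumption: each error spoils one element; clamp the score at 0%.
        "percent_valid": max(0.0, 100.0 * (1.0 - errors_per_element)),
    }


report = validity_report(
    "<list><item>1</item><item>two</item><item>three</item></list>", 2
)
print(report["percent_valid"])
```

With 2 errors across 4 elements this scores the document at 50%. Errors per element is usually a fairer basis than errors per line, because XML files are often written on a single line.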
