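Namespaces are a frequent source of confusion when searching a parsed document.
When the root <html> element declares a default XML namespace,
every element below it belongs to that namespace, even though no prefix is written.
Consider the following XHTML document: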
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/1999/xhtml http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd"
xml:lang="en">
<head>
<title>Document Title</title>
</head>
<body>
</body>
</html>
If you parse this document and search for the title without a namespace, nothing is found:
>>> from lxml import etree
>>> doc = etree.parse('foo.html')
>>> doc.xpath('//title')
[]
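The result is empty because the <title> element lives in the XHTML namespace,
so an unqualified name does not match it.
(The prefix itself carries no meaning: foo:title and bar:title
name the same element when foo: and bar: are bound to the same XML namespace.)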
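As with ElementTree, the fix is to qualify the name with its namespace,
here by passing a prefix-to-URI mapping to xpath():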
>>> doc.xpath('//html:title',
... namespaces={'html': 'http://www.w3.org/1999/xhtml'})
[<Element {http://www.w3.org/1999/xhtml}title at 0x1087910>]
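The behaviour shown so far can be reproduced without lxml, since the standard library's xml.etree.ElementTree uses the same {uri}name convention. The sketch below is only an illustration under that assumption: SAMPLE is a hypothetical stand-in for foo.html, and XHTML_NS and the prefixes html and x are names chosen for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical stand-in for foo.html: a minimal namespaced document.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">'
    "<head><title>Document Title</title></head>"
    "<body/>"
    "</html>"
)

root = ET.fromstring(SAMPLE)

# An unqualified name matches nothing, because every element
# inherits the default namespace declared on <html>.
unqualified = root.findall(".//title")

# Option 1: map a prefix of our choosing to the namespace URI.
prefixed = root.findall(".//html:title", namespaces={"html": XHTML_NS})

# Option 2: spell the qualified name out in {uri}name form.
clark = root.findall(".//{%s}title" % XHTML_NS)

# The prefix is arbitrary: only the URI it is bound to matters.
other_prefix = root.findall(".//x:title", namespaces={"x": XHTML_NS})

print(unqualified)
print([el.tag for el in prefixed])
print(prefixed[0].text)
```

All three qualified lookups return the same single element, which is why the choice of prefix in the query does not have to match anything written in the document.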
The same namespace-qualified name, written in {uri}name form,
is needed for the tag argument of iterparse:
>>> titleIter = etree.iterparse(StringIO(str),
... tag='{http://www.w3.org/1999/xhtml}title')
>>> list(titleIter)
[(u'end', <Element {http://www.w3.org/1999/xhtml}title at 0x7fddb7c4b8c0>)]
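For comparison, here is a minimal sketch of the same filtering done by hand with the standard library. It assumes xml.etree.ElementTree rather than lxml, whose iterparse accepts the tag argument directly; SAMPLE, TITLE_TAG and iter_titles are hypothetical names introduced for the illustration.

```python
import io
import xml.etree.ElementTree as ET

TITLE_TAG = "{http://www.w3.org/1999/xhtml}title"

# Hypothetical stand-in for the document text parsed above.
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<html xmlns="http://www.w3.org/1999/xhtml">'
    b"<head><title>Document Title</title></head>"
    b"<body/>"
    b"</html>"
)


def iter_titles(stream):
    """Yield the text of each XHTML <title> once its end tag is parsed."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        # The standard library's iterparse has no tag argument,
        # so compare against the {uri}name form by hand.
        if elem.tag == TITLE_TAG:
            yield elem.text


titles = list(iter_titles(io.BytesIO(SAMPLE)))
print(titles)
```

Comparing against the bare string "title" here would match nothing, for the same reason the unqualified XPath query returned an empty list.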