This is the support website for the book:
Das Python Praxisbuch
Der große Profi-Leitfaden für Programmierer
Farid Hajji
Addison Wesley / Pearson Education
ISBN 978-3-8273-2543-3 (Sep 2008), 1298 pages.
12. XML and XSLT¶
An XML file¶
<?xml version="1.0" encoding="utf-8"?>
<languages>
  The interpreted languages are Python, Ruby, Perl and PHP.
  <language name="python">
    <!-- Our favorite language -->
    <name>Python</name>
    <inventor>Guido van Rossum</inventor>
    <url>www.python.org</url>
  </language>
  <language name="ruby">
    <name>Ruby</name>
    <inventor>Yukihiro Matsumoto</inventor>
    <url>www.ruby-lang.org</url>
  </language>
  <language name="perl">
    <name>Perl</name>
    <inventor>Larry Wall</inventor>
    <url>www.perl.org</url>
  </language>
  <language name="php">
    <name>PHP</name>
    <inventor>Rasmus Lerdorf</inventor>
    <url>www.php.net</url>
  </language>
  Compiled languages are C and C++
  <language name="c">
    <name>C</name>
    <inventor>Dennis Ritchie</inventor>
    <inventor>Brian Kernighan</inventor>
  </language>
  <language name="c++">
    <name>C++</name>
    <inventor>Bjarne Stroustrup</inventor>
  </language>
  Lisp is normally interpreted, but can be compiled as well
  <language name="lisp">
    <name>Lisp</name>
    <inventor>John McCarthy</inventor>
  </language>
</languages>
xml.etree.ElementTree¶
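As a minimal sketch of this section's topic (not taken from the book), the standard library module xml.etree.ElementTree can parse a document like the one above and walk its elements. The code assumes Python 3 and uses a shortened inline copy of the data; the names XML and inventors are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of languages.xml (assumption: two entries suffice here).
XML = """<languages>
  <language name="python">
    <name>Python</name>
    <inventor>Guido van Rossum</inventor>
  </language>
  <language name="c">
    <name>C</name>
    <inventor>Dennis Ritchie</inventor>
    <inventor>Brian Kernighan</inventor>
  </language>
</languages>"""

root = ET.fromstring(XML)

# Map each language name to the list of its inventors.
inventors = {}
for lang in root.findall("language"):
    name = lang.findtext("name")
    inventors[name] = [inv.text for inv in lang.findall("inventor")]

for name, people in inventors.items():
    print(name + ": " + ", ".join(people))
```

findall("language") only looks at direct children of the root, which is enough for this flat document.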
4Suite-XML¶
Installing 4Suite-XML¶
URLs:
- The 4Suite-XML website
- The package used in the book (a newer version can be installed with easy_install 4Suite-XML)
The 4Suite-XML scripts¶
<a><b></b><b></b></a>
<a><b></b><b></a></a>
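The two one-line documents above differ in well-formedness: the second one closes a where the closing tag of b is expected. A minimal sketch of how such a check could look, assuming Python 3 and the standard library's ElementTree in place of 4Suite-XML (the helper name is_well_formed is hypothetical):

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<a><b></b><b></b></a>"
bad = "<a><b></b><b></a></a>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

ElementTree only checks well-formedness here. It does not validate a document against a DTD such as the ones in the next listings.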
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE a [
<!ELEMENT a (b, b)>
<!ELEMENT b EMPTY>
]>
<a><b/><b/></a>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE a [
<!ELEMENT a (b, b)>
<!ELEMENT b EMPTY>
]>
<a><b/><b/><b/></a>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE a [
<!ELEMENT a (b, b)>
<!ELEMENT b EMPTY>
]>
<a><b/><b>Non empty b</b></a>
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"
              omit-xml-declaration="no"
              doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
              doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head><title>Programming Languages</title></head>
      <body>
        <h1>Programming Languages</h1>
        <ul>
          <xsl:apply-templates select="/languages/language"/>
        </ul>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="language">
    <li xmlns="http://www.w3.org/1999/xhtml">
      <b><xsl:copy-of select="name/text()"/></b>
      <xsl:text>: </xsl:text>
      <xsl:apply-templates select="inventor"/>
    </li>
  </xsl:template>
  <xsl:template match="inventor">
    <xsl:copy-of select="text()"/>
    <xsl:text> </xsl:text>
  </xsl:template>
</xsl:stylesheet>
After running 4xslt languages.xml languages.xsl > languages.html, the file languages.html looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Programming Languages</title>
</head>
<body>
<h1>Programming Languages</h1>
<ul>
<li>
<b>Python</b>: Guido van Rossum </li>
<li>
<b>Ruby</b>: Yukihiro Matsumoto </li>
<li>
<b>Perl</b>: Larry Wall </li>
<li>
<b>PHP</b>: Rasmus Lerdorf </li>
<li>
<b>C</b>: Dennis Ritchie Brian Kernighan </li>
<li>
<b>C++</b>: Bjarne Stroustrup </li>
<li>
<b>Lisp</b>: John McCarthy </li>
</ul>
</body>
</html>
Ft.Xml.InputSource input sources¶
How to use Ft.Xml.InputSource:
from Ft.Xml import InputSource
factory = InputSource.DefaultFactory
isrc1 = factory.fromString("<a><b/><b/></a>",
                           "https://pythonbook.hajji.org/examples/xml")
isrc2 = factory.fromStream(open("/var/tmp/languages.xml", "rb"),
                           "https://pythonbook.hajji.org/examples/xml")
isrc3 = factory.fromUri(
    "https://pythonbook.hajji.org/examples/xml/languages.xml")
DOM¶
isrc4 looks like this:
>>> isrc4 = factory.fromString('''<?xml version="1.0" encoding="utf-8"?>
... <!DOCTYPE a [
... <!ELEMENT a (b, b)>
... <!ELEMENT b EMPTY>
... ]>
... <a><b/><b/></a>''', "https://pythonbook.hajji.org/examples/xml")
And isrc5 looks like this:
from Ft.Xml import ReaderException
isrc5 = factory.fromString('''<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE a [
<!ELEMENT a (b, b)>
<!ELEMENT b EMPTY>
]>
<a><b/><b>Not empty</b></a>''',
    "https://pythonbook.hajji.org/examples/xml")
# vreader is assumed to be a validating reader created earlier in the session.
try:
    doc5 = vreader.parse(isrc5)
    print "doc5 successfully parsed"
except ReaderException, e:
    # The second <b> is not empty, so it violates the DTD declaration of b.
    print e
Understanding DOM¶
To obtain the output as a (byte) string, you can use StringIO:
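from cStringIO import StringIO
from Ft.Xml.Domlette import PrettyPrint
# doc1 is assumed to be a document parsed earlier in the session.
sio = StringIO()
PrettyPrint(doc1, stream=sio, encoding="utf-8")
buf = sio.getvalue()
sio.close()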
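The same buffer technique carries over to the Python 3 standard library. The following is a minimal sketch that swaps in xml.etree.ElementTree and io.BytesIO for 4Suite's PrettyPrint and cStringIO; the tiny inline document and the name bio are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b/><b/></a>")

# Write the tree into an in-memory byte buffer instead of a file.
bio = io.BytesIO()
ET.ElementTree(root).write(bio, encoding="utf-8", xml_declaration=True)
buf = bio.getvalue()
bio.close()

print(buf)
```

Because an explicit encoding is requested, buf is a bytes object rather than a text string.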
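Extracting elements with XPath¶
You can extract elements from an XML file directly by navigating the DOM tree, but this is suboptimal, because the index depends on the exact node layout (text nodes count as children too):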
from Ft.Xml import InputSource
from Ft.Xml.Domlette import NonvalidatingReaderBase
factory = InputSource.DefaultFactory
reader = NonvalidatingReaderBase()
isrc2 = factory.fromStream(open("/var/tmp/languages.xml", "rb"),
                           "https://pythonbook.hajji.org/examples/xml")
doc2 = reader.parse(isrc2)
root = doc2.documentElement
python = root.childNodes[1]
XPath makes this easier:
>>> root.xpath(u'//language[@name="python"]')
[<Element at 0x286a18ac: name u'language', 1 attributes, 9 children>]
>>> python = root.xpath(u'//language[@name="python"]')[0]
>>> python
<Element at 0x286a18ac: name u'language', 1 attributes, 9 children>
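For comparison, the Python 3 standard library offers a limited XPath subset through ElementTree. The sketch below is not the 4Suite API used above; it assumes a shortened inline document and shows the same attribute predicate.

```python
import xml.etree.ElementTree as ET

# Shortened inline document (assumption: two entries suffice here).
XML = """<languages>
  Interpreted languages:
  <language name="python"><name>Python</name></language>
  <language name="ruby"><name>Ruby</name></language>
</languages>"""

root = ET.fromstring(XML)

# ElementTree supports only a subset of XPath; attribute predicates are included.
matches = root.findall(".//language[@name='python']")
python = matches[0]

print(len(matches), python.get("name"), python.findtext("name"))
```

The leading text inside the root element does not affect the lookup, which is the advantage over addressing children by position.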
SAX¶
The input source:
from Ft.Xml import InputSource
factory = InputSource.DefaultFactory
isrc = factory.fromUri("file:///var/tmp/languages.xml")
The SAX parser (Saxlette):
from Ft.Xml import Sax
parser = Sax.CreateParser()
The content handler:
class TagCounter(object):
    def startDocument(self):
        self.tagCount = {}
    def startElementNS(self, name, qname, attribs):
        if name in self.tagCount:
            self.tagCount[name] += 1
        else:
            self.tagCount[name] = 1
And now we parse:
from tagcounter import TagCounter
handler = TagCounter()
parser.setContentHandler(handler)
parser.parse(isrc)
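The same tag-counting idea can be sketched with the xml.sax module from the Python 3 standard library. This is a swapped-in technique, not Saxlette: the handler derives from xml.sax.ContentHandler and uses startElement instead of startElementNS, and the inline document is an assumption for illustration.

```python
import xml.sax

class TagCounter(xml.sax.ContentHandler):
    """Count how often each element name occurs."""

    def startDocument(self):
        self.tagCount = {}

    def startElement(self, name, attrs):
        # Called once per opening tag; name is the element name as a string.
        self.tagCount[name] = self.tagCount.get(name, 0) + 1

XML = b"<a><b/><b/><c><b/></c></a>"

handler = TagCounter()
xml.sax.parseString(XML, handler)

print(handler.tagCount)
```

The handler never builds a tree, so memory use stays independent of document size, which is the usual reason to choose SAX over DOM.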
Transformations with XSLT¶
To transform languages.xml with languages.xsl:
from Ft.Xml import InputSource
factory = InputSource.DefaultFactory
ixml = factory.fromUri('file:///var/tmp/languages.xml')
ixsl = factory.fromUri('file:///var/tmp/languages.xsl')
from Ft.Xml.Xslt import Processor
processor = Processor.Processor()
processor.appendStylesheet(ixsl)
result = processor.run(ixml)
result contains:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Programming Languages</title>
</head>
<body>
<h1>Programming Languages</h1>
<ul>
<li>
<b>Python</b>: Guido van Rossum </li>
<li>
<b>Ruby</b>: Yukihiro Matsumoto </li>
<li>
<b>Perl</b>: Larry Wall </li>
<li>
<b>PHP</b>: Rasmus Lerdorf </li>
<li>
<b>C</b>: Dennis Ritchie Brian Kernighan </li>
<li>
<b>C++</b>: Bjarne Stroustrup </li>
<li>
<b>Lisp</b>: John McCarthy </li>
</ul>
</body>
</html>