Dumping objects with xml.etree.ElementTree

We can use the xml.etree.ElementTree module to build Element structures that can be emitted as XML. It's challenging to use xml.dom and xml.minidom for this. The DOM API requires a top-level document that then builds individual elements. The presence of this necessary context object creates clutter when trying to serialize a simple class with several attributes. We have to create the document first and then serialize all the elements of the document, providing the document context as an argument.

Generally, we'd like each class in our design to build a top-level element and return that. Most top-level elements will have a sequence of subelements. We can assign text as well as attributes to each element that we build. We can also assign a tail, which is the extraneous text that follows a closed tag. In some content models, this is just whitespace. Because of the long name, it might be helpful to import ElementTree in the following manner:

import xml.etree.ElementTree as XML 

Here are two extensions to our microblog class structure that add the XML output capability as the Element instances. We can use the following extension to the Blog_X class:

import xml.etree.ElementTree as XML
from typing import cast

class Blog_E(Blog_X):

def xmlelt(self) -> XML.Element:
blog = XML.Element("blog")
title = XML.SubElement(blog, "title")
title.text = self.title
title.tail = " "
entities = XML.SubElement(blog, "entries")
entities.extend(cast('Post_E', c).xmlelt() for c in self.entries)
blog.tail = " "
return blog

We can use the following extension to the Post_X class:

class Post_E(Post_X):

def xmlelt(self) -> XML.Element:
post = XML.Element("entry")
title = XML.SubElement(post, "title")
title.text = self.title
date = XML.SubElement(post, "date")
date.text = str(self.date)
tags = XML.SubElement(post, "tags")
for t in self.tags:
tag = XML.SubElement(tags, "tag")
tag.text = t
text = XML.SubElement(post, "rst_text")
text.text = self.rst_text
post.tail = " "
return post

We've written highly class-specific XML output methods. These will build the Element objects that have the proper text values.

There's no fluent shortcut for building the subelements. We have to insert each text item individually.

In the Blog.xmlelt() method, we were able to perform Element.extend() to put all of the individual post entries inside the <entry> element. This allows us to build the XML structure flexibly and simply. This approach can deal gracefully with the XML namespaces. We can use the QName class to build qualified names for XML namespaces. The ElementTree module correctly applies the namespace qualifiers to the XML tags. This approach also properly escapes the <, &, >, and " characters into the &lt;, &gt;, &amp;, and &quot; XML entities. The XML output from these methods will mostly match the previous section. The whitespace will be different.

To build the final output, we use two additional features of the element tree module. It will look like this snippet:

tree = XML.ElementTree(travel5.xmlelt())
text = XML.tostring(tree.getroot())

The travel5 object is an instance of Blog_E. The result of evaluating travel5.xmlelt() is an XML.Element; this is wrapped into a complete XML.ElementTree object. The tree's root object can be transformed into a valid XML string and printed or saved to a file.

Let's see how to load XML documents.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset