Dumping objects using string templates

One way to serialize a Python object into XML is by including a method to emit XML text. In the case of a complex object, the container must get the XML for each item inside the container. Here are two simple extensions to our microblog class structure that add the XML output capability as text:

from dataclasses import dataclass, field, asdict

@dataclass
class Blog_X:
title: str
entries: List[Post_X] = field(default_factory=list)
underline: str = field(init=False)

def __post_init__(self) -> None:
self.underline = "="*len(self.title)

def append(self, post: 'Post_X') -> None:
self.entries.append(post)

def by_tag(self) -> DefaultDict[str, List[Dict[str, Any]]]:
tag_index: DefaultDict[str, List[Dict[str, Any]]] = defaultdict(list)
for post in self.entries:
for tag in post.tags:
tag_index[tag].append(asdict(post))
return tag_index

def as_dict(self) -> Dict[str, Any]:
return asdict(self)

def xml(self) -> str:
children = " ".join(c.xml() for c in self.entries)
return f"""
<blog><title>{self.title}</title>
<entries>
{children}
<entries>
</blog>
"""

We've included several things in this dataclass-based definition. First, we defined the core attributes of a Blog_X object, a title and a list of entries. In order to make the entries optional, a field definition was provided to use the list() function as a factory for default values. In order to be compatible with the Blog class shown earlier, we've also provided an underline attribute that is built by the __post_init__() special method.

The append() method is provided at the Blog_X class level to be compatible with the Blog class from the xxx section. It delegates the work to the entries attribute. The by_tag() method can be used to build an index by hashtags. 

The as_dict() method was defined for Blog objects, and emitted a dictionary built from the object. When working with dataclasses, the dataclasses.asdict() function does this for us. To be compatible with the Blog class, we wrapped the asdict() function into a method of the Blog_X dataclass. 

The xml() method emits XML-based text for this object. It uses relatively unsophisticated f-string processing to inject values into strings. To assemble a complete entry, the entries collection is transformed into a series of lines, assigned to the children variable, and formatted into the resulting XML text.

The Post_X class definition is similar. It's shown here:

@dataclass
class Post_X:
date: datetime.datetime
title: str
rst_text: str
tags: List[str]
underline: str = field(init=False)
tag_text: str = field(init=False)

def __post_init__(self) -> None:
self.underline = "-"*len(self.title)
self.tag_text = ' '.join(self.tags)

def as_dict(self) -> Dict[str, Any]:
return asdict(self)

def xml(self) -> str:
tags = "".join(f"<tag>{t}</tag>" for t in self.tags)
return f"""
<entry>
<title>{self.title}</title>
<date>{self.date}</date>
<tags>{tags}</tags>
<text>{self.rst_text}</text>
</entry>"""

This, too, has two fields that are created by the __post_init__() special method. It includes an as_dict() method to remain compatible with the Post class shown earlier. This method uses the asdict() function to do the real work of creating a dictionary from the dataclass object.

Both classes include highly class-specific XML output methods. These will emit the relevant attributes wrapped in XML syntax. This approach doesn't generalize well. The Blog_X.xml() method emits a <blog> tag with a title and entries. The Post_X.xml() method emits a <post> tag with the various attributes. In both of these methods, subsidiary objects were created using "".join() or " ".join() to build a longer string from shorter string elements. When we convert a Blog object into XML, the results look like this:

<blog><title>Travel</title> 
<entries> 
<entry> 
    <title>Hard Aground</title> 
    <date>2013-11-14 17:25:00</date> 
    <tags><tag>#RedRanger</tag><tag>#Whitby42</tag><tag>#ICW</tag></tags> 
    <text>Some embarrassing revelation. Including ☹ and ⚓</text> 
</entry> 
<entry> 
    <title>Anchor Follies</title> 
    <date>2013-11-18 15:30:00</date> 
    <tags><tag>#RedRanger</tag><tag>#Whitby42</tag><tag>#Mistakes</tag></tags> 
    <text>Some witty epigram.</text> 
</entry> 
<entries></blog> 

This approach has two disadvantages:

  • We've ignored the XML namespaces. That's a small change to the literal text for emitting the tags.
  • Each class would also need to properly escape the <, &, >, and " characters into the &lt;, &gt;, &amp;, and &quot; XML entities. The html module includes the html.escape() function which does this.

This emits proper XML; it can be relied upon to work; but it isn't very elegant and doesn't generalize well.

In the next section, we'll see how to dump objects with xml.etree.ElementTree.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset