Dumping and loading joined rows into a CSV file

Joining the objects together means creating a collection where each row has a composite set of columns. The columns will be a union of the child class attributes and the parent class attributes. The file will have a row for each child. The parent attributes of each row will repeat the parent attribute values for the parent of that child. This involves a fair amount of redundancy, since the parent values are repeated with each individual child. When there are multiple levels of containers, this can lead to large amounts of repeated data.
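To make the repetition concrete, here is a minimal sketch with hypothetical data: one parent with two children, written as joined rows. The tuple-based hierarchy and the titles are invented for illustration only.

```python
import csv
import io

# Hypothetical miniature hierarchy: one parent blog with two child posts.
blogs = [
    ("Travel", [("Hard Aground", "2013-11-14"), ("Anchor Follies", "2013-11-18")]),
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Blog.title", "Post.title", "Post.date"])
for blog_title, posts in blogs:
    for post_title, post_date in posts:
        # The parent's title value repeats on every child row.
        writer.writerow([blog_title, post_title, post_date])

print(buffer.getvalue())
```

Both data rows carry the same `"Travel"` value in the `Blog.title` column; with deeper container nesting, each extra level multiplies this repetition.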

The advantage of this repetition is that each row stands alone; it doesn't depend on a context defined by the rows above it, so no class discriminator column is needed.

This works well for data that forms a simple hierarchy; each child has some parent attributes added to it. When the data involves more complex relationships, the simplistic parent-child pattern breaks down. In these examples, we've lumped the Post tags into a single column of text. If we tried to break the tags into separate columns, they would become children of each Post, meaning that the text of Post might be repeated for each tag. Clearly, this isn't a good idea!
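The single-column approach for tags works because a Python tuple can round-trip through one CSV cell as its literal text. Here is a small sketch, using invented tag values, of serializing with repr() and recovering with ast.literal_eval():

```python
import ast

# Hypothetical tags for one Post, kept in a single CSV column.
tags = ("#RedRanger", "#Whitby42", "#ICW")

# Serialize the whole tuple as one text cell...
cell = repr(tags)

# ...and recover the original tuple when loading.
restored = ast.literal_eval(cell)
assert restored == tags
```

Because ast.literal_eval() only accepts Python literals, this is safer than eval() for parsing the stored text.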

The CSV column titles must be a union of all the available column titles. Because of the possibility of name clashes between the various row types, we'll qualify each column name with the class name. This will lead to column titles such as 'Blog.title' and 'Post.title'. This allows for the use of DictReader and DictWriter rather than the positional assignment of the columns. However, these qualified names don't trivially match the attribute names of the class definitions; this leads to somewhat more text processing to parse the column titles. Here's how we can write a joined row that contains parent as well as child attributes:

import csv
from pathlib import Path

with (Path.cwd() / "data" / "ch10_blog5.csv").open("w", newline="") as target:
    wtr = csv.writer(target)
    wtr.writerow(
        ["Blog.title", "Post.date", "Post.title", "Post.tags", "Post.rst_text"]
    )
    for b in blogs:
        for p in b.entries:
            wtr.writerow([b.title, p.date, p.title, p.tags, p.rst_text])

Note the qualified column titles in the header row. In this format, each data row contains a union of the Blog attributes and the Post attributes: the b.title and p.title expressions place the blog's title alongside each posting's title.

This data file layout is somewhat easier to prepare, as there's no need to fill unused columns with None. Since each column name is unique, we can easily switch to a DictWriter instead of a simple csv.writer().
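The switch to DictWriter might look like the following sketch. The row values here are invented placeholders; real code would pull them from the Blog and Post objects as in the loop above.

```python
import csv
import io

fieldnames = ["Blog.title", "Post.date", "Post.title", "Post.tags", "Post.rst_text"]

buffer = io.StringIO()
wtr = csv.DictWriter(buffer, fieldnames=fieldnames)
wtr.writeheader()
# Hypothetical row values for one joined Blog/Post row.
wtr.writerow({
    "Blog.title": "Travel",
    "Post.date": "2013-11-14 17:25:00",
    "Post.title": "Hard Aground",
    "Post.tags": "('#RedRanger', '#Whitby42', '#ICW')",
    "Post.rst_text": "Some embarrassing revelation.",
})
print(buffer.getvalue())
```

The dictionary keys must match the fieldnames exactly, which makes the qualified column titles self-documenting in the writing code as well as in the file.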

Rebuilding the blog entries becomes a two-step operation. The columns that represent the parent Blog object are checked for a change in value; the columns that represent the child Post object are used to build Post instances in the context of the most recently seen parent. Here's a way to reconstruct the original container from the CSV rows:

import csv
from typing import Iterator, TextIO

def blog_iter2(source: TextIO) -> Iterator[Blog]:
    rdr = csv.DictReader(source)
    assert (
        set(rdr.fieldnames)
        == {"Blog.title", "Post.date", "Post.title", "Post.tags", "Post.rst_text"}
    )
    # Fetch the first row; build the first Blog and Post.
    row = next(rdr)
    blog = blog_builder5(row)
    post = post_builder5(row)
    blog.append(post)

    # Fetch all subsequent rows.
    for row in rdr:
        if row["Blog.title"] != blog.title:
            yield blog
            blog = blog_builder5(row)
        post = post_builder5(row)
        blog.append(post)
    yield blog

The first row of data is used to build a Blog instance and the first Post in that Blog. The invariant condition for the loop that follows assumes that there's a proper Blog object. Having a valid Blog instance makes the processing logic much simpler. The Post instances are built with the following function:

import ast
import datetime
from typing import Dict

def post_builder5(row: Dict[str, str]) -> Post:
    return Post(
        date=datetime.datetime.strptime(
            row["Post.date"],
            "%Y-%m-%d %H:%M:%S"),
        title=row["Post.title"],
        rst_text=row["Post.rst_text"],
        tags=ast.literal_eval(row["Post.tags"]),
    )

Each column of the row is converted and mapped to a keyword parameter of the class constructor. This handles all of the type conversions from CSV text into Python objects.

The blog_builder5() function is similar to the post_builder5() function. Since there are fewer attributes and no data conversion involved, it's not shown, and is left as an exercise for the reader.
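As a hint for that exercise, a sketch might look like the following. The minimal Blog class here is a stand-in, assumed to take only a title and to collect posts via append(); the real class definition appears elsewhere in the chapter.

```python
from typing import Dict, List

class Blog:
    """Minimal stand-in for the chapter's Blog class (assumption)."""
    def __init__(self, title: str) -> None:
        self.title = title
        self.entries: List[object] = []
    def append(self, post: object) -> None:
        self.entries.append(post)

def blog_builder5(row: Dict[str, str]) -> Blog:
    # Only the qualified Blog column is used; Post columns are ignored.
    return Blog(title=row["Blog.title"])

blog = blog_builder5({"Blog.title": "Travel", "Post.title": "ignored"})
```

Because there is no data conversion, the function reduces to selecting the qualified parent columns and passing them as keyword arguments.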

Let's see how to dump and load using XML.
