Handling containers and complex classes

When we look back at our microblog example, we have a Blog object that contains many Post instances. We designed Blog as a wrapper around list, so that the Blog would contain a collection. When working with a CSV representation, we have to design a mapping from a complex structure to a tabular representation. We have three common solutions:

We can create two files, a blog file and a posting file. The blog file has only the Blog instances. Each Blog has a title in our example. Each Post row can then have a reference to the Blog row that the posting belongs to. We need to add a key for each Blog. Each Post would then have a foreign key reference to the Blog key.
We can create two kinds of rows in a single file. We will have the Blog rows and Post rows. Our writers entangle the various types of data; our readers must disentangle the types of data.
We can perform a relational database join between the various kinds of rows, repeating the Blog parent information on each Post child.

There's no best solution among these choices. We have to design a solution to cope with an impedance mismatch between flat CSV rows and more structured Python objects. The use cases for the data will define some of the advantages and disadvantages.

Creating two files requires that we create some kind of unique identifier for each Blog so that a Post can properly refer to the Blog. We can't easily use the Python internal ID, as these are not guaranteed to be consistent each time Python runs.

A common assumption is that the Blog title is a unique key; as this is an attribute of Blog, it is called a natural primary key. This rarely works out well; we cannot change a Blog title without also updating all of the Posts that refer to the Blog. A better plan is to invent a unique identifier and update the class design to include that identifier. This is called a surrogate key. The Python uuid module can provide unique identifiers for this purpose.

The code to use multiple files is nearly identical to the previous examples. The only change is to add a proper primary key to the Blog class. Once we have the keys defined, we can create writers and readers as shown previously to process the Blog and Post instances into their separate files.

In the next section, we'll dump and load multiple row types into a CSV file.

Table of Contents for Handling containers and complex classes

Create new playlist

Sign In

Sign Up

Table of Contents for
Handling containers and complex classes