Dumping and loading multiple row types into a CSV file

Creating multiple kinds of rows in a single file makes the format a bit more complex. The column titles must become a union of all the available column titles. Because of the possibility of name clashes between the various row types, we can either access rows by position (preventing us from simply using csv.DictReader) or we must invent a more sophisticated column title that combines class and attribute names.
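For context, the examples in this section assume Blog and Post dataclasses roughly like the following. This is a minimal sketch inferred from the attributes used below, not the exact definitions given earlier in the chapter:

import datetime
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Post:
    date: datetime.datetime
    title: str
    rst_text: str
    tags: Tuple[str, ...]

@dataclass
class Blog:
    title: str
    entries: List[Post] = field(default_factory=list)

    def append(self, post: Post) -> None:
        self.entries.append(post)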

The process is simpler if we provide each row with an extra column that acts as a class discriminator. This extra column shows us what type of object the row represents. The object's class name would work well for this. Here's how we might write blogs and posts to a single CSV file using two different row formats:

with (Path.cwd() / "data" / "ch10_blog3.csv").open("w", newline="") as target:
    wtr = csv.writer(target)
    wtr.writerow(["__class__", "title", "date", "title", "rst_text", "tags"])
    for b in blogs:
        wtr.writerow(["Blog", b.title, None, None, None, None])
        for p in b.entries:
            wtr.writerow(["Post", None, p.date, p.title, p.rst_text, p.tags])

We created two varieties of rows in the file. Some rows have 'Blog' in the first column and contain just the attributes of a Blog object. Other rows have 'Post' in the first column and contain just the attributes of a Post object.

We did not make the column titles unique, so we can't use dictionary writers or readers. When allocating columns by position like this, each row type must reserve unused columns for the other row types it coexists with. These additional columns are filled with None. As the number of distinct row types grows, keeping track of the various positional column assignments can become challenging.
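To make the positional layout concrete, the file written above might look roughly like this (the row values are hypothetical; note how csv.writer renders the None placeholders as empty fields and quotes the tags tuple because it contains commas):

__class__,title,date,title,rst_text,tags
Blog,Travel,,,,
Post,,2013-11-14 17:25:00,Hard Aground,Some embarrassing details.,"('#RedRanger', '#Whitby42', '#ICW')"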

Also, the individual data type conversions are not handled at all. In particular, we've ignored the data types of the timestamp and tags. We can try to reassemble our Blogs and Posts by examining the row discriminator:

with (Path.cwd() / "data" / "ch10_blog3.csv").open() as source:
    rdr = csv.reader(source)
    header = next(rdr)
    assert header == ["__class__", "title", "date", "title", "rst_text", "tags"]
    blogs = []
    for r in rdr:
        if r[0] == "Blog":
            blog = Blog(*r[1:2])  # type: ignore
            blogs.append(blog)
        elif r[0] == "Post":
            post = Post(*r[2:])  # type: ignore
            blogs[-1].append(post)

This snippet will construct a list of Blog objects. Each 'Blog' row uses the columns in slice(1, 2) to define a Blog object. Each 'Post' row uses the columns in slice(2, 6) to define a Post object. This requires that each Blog row be followed immediately by the rows for its Post instances; no foreign key ties the two objects together.

We've made two assumptions about the columns in the CSV file: that they appear in the same order as the parameters of the class constructors, and that their types match. For Blog objects, we used blog = Blog(*r[1:2]) because the one and only column is text, which matches the class constructor. When working with externally supplied data, either assumption might prove to be invalid.

The # type: ignore comments are required because the values produced by the reader are strings, which don't match the dataclass type definitions provided above. Subverting mypy checks to construct objects isn't ideal.

To build the Post instances and perform the appropriate type conversions, a separate function is required. This function maps the text columns to the proper types and invokes the class constructor. Here's a mapping function to build Post instances:

import ast
import datetime
from typing import List

def post_builder(row: List[str]) -> Post:
    return Post(
        date=datetime.datetime.strptime(row[2], "%Y-%m-%d %H:%M:%S"),
        title=row[3],
        rst_text=row[4],
        tags=ast.literal_eval(row[5]),
    )

This will properly build a Post instance from a row of text. It converts the text for datetime and the text for the tags into their proper Python types. This has the advantage of making the mapping explicit.

In this example, we're using ast.literal_eval() to decode more complex Python literal values. This allows the CSV data to include the literal representation of a tuple of string values: "('#RedRanger', '#Whitby42', '#ICW')". Without ast.literal_eval(), we'd have to write our own parser, or a rather intricate regular expression, for this data type. Instead of writing our own parser, we elected to serialize a tuple-of-strings object in a form that could be deserialized securely.
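For symmetry, a similar builder for the Blog rows, together with a reload loop that uses both builders, might look like the following sketch. Here, blog_builder is a hypothetical helper introduced for illustration; it is not part of the original example:

def blog_builder(row: List[str]) -> Blog:
    # The only Blog attribute stored in the row is the title text.
    return Blog(title=row[1])

with (Path.cwd() / "data" / "ch10_blog3.csv").open() as source:
    rdr = csv.reader(source)
    header = next(rdr)
    blogs = []
    for r in rdr:
        if r[0] == "Blog":
            blogs.append(blog_builder(r))
        elif r[0] == "Post":
            blogs[-1].append(post_builder(r))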

Let's see how to filter CSV rows with an iterator.
