Dumping and loading with CSV

The csv module encodes and decodes simple list or dict instances into a CSV notation. As with the json module, discussed previously, this is not a very complete persistence solution. The wide adoption of CSV files, however, means that it often becomes necessary to convert between Python objects and CSV.

Working with CSV files involves a manual mapping between potentially complex Python objects and very simplistic CSV structures. We need to design the mapping carefully, remaining cognizant of the limitations of the CSV notation. This can be difficult because of the mismatch between the expressive powers of objects and the tabular structure of a CSV file.

The content of each column of a CSV file is, by definition, pure text. When loading data from a CSV file, we'll need to convert these values into more useful types inside our applications. This conversion can be complicated by the way spreadsheets perform unexpected type coercion. We might, for example, have a spreadsheet where US zip codes have been changed into floating-point numbers by the spreadsheet application. When the spreadsheet saves to CSV, the zip codes could become a confusing numeric value. Bangor, Maine, for example, has a zip code of 04401. This becomes 4401 when converted into a number by a spreadsheet program.

Consequently, we might need to use a conversion such as row['zip'].zfill(5) or ('00000'+row['zip'])[-5:] to restore the leading zeroes. Also, don't forget that a file might have a mixture of five-digit ZIP and ZIP+4 postal codes, making this data cleansing even more challenging.
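The cleanup described above can be sketched as a small helper applied to each row as it is read. This is only an illustration: the clean_zip() name, the city and zip column names, and the sample data are assumptions made for this example, and the handling of a trailing ".0" assumes the spreadsheet wrote the code as a floating-point value.

```python
import csv
import io


def clean_zip(raw: str) -> str:
    """Restore leading zeroes on a ZIP or ZIP+4 code (hypothetical helper)."""
    text = raw.strip()
    if "-" in text:
        # ZIP+4: pad each part separately.
        zip5, plus4 = text.split("-", 1)
        return f"{zip5.zfill(5)}-{plus4.zfill(4)}"
    if "." in text:
        # Assumption: a value like 4401.0 is a ZIP code saved as a float.
        text = text.split(".", 1)[0]
    return text.zfill(5)


# Sample data standing in for a file saved by a spreadsheet.
source = io.StringIO("city,zip\nBangor,4401\nBangor,4401.0\nBangor,4401-1234\n")

# Every value from DictReader is text; we rebuild each row with a cleaned zip.
rows = [dict(row, zip=clean_zip(row["zip"])) for row in csv.DictReader(source)]

for row in rows:
    print(row["city"], row["zip"])
```

Keeping the conversion in one function means that any new irregularity found in real files only has to be handled in one place.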

To further complicate working with CSV files, we have to be aware that they're often edited manually and can become subtly incompatible because of human tweaks. It's important for software to be flexible in the face of the real-world irregularities that arise.

When we have relatively simple class definitions, we can often transform each instance into a simple, flat row of data values. Often, NamedTuple is a good match between a CSV source file and Python objects. Going the other way, we might need to design our Python classes around NamedTuple if our application will save data in the CSV notation.
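As a minimal sketch of this kind of mapping, the following example writes NamedTuple instances as rows and rebuilds them from the text columns. The Location class, its field names, and the dump() and load() functions are hypothetical names chosen for illustration; an in-memory buffer stands in for a real file.

```python
import csv
import io
from typing import NamedTuple


class Location(NamedTuple):
    """A flat object: one attribute per CSV column (illustrative only)."""
    city: str
    state: str
    zip_code: str

    @classmethod
    def from_row(cls, row: dict) -> "Location":
        # Each column arrives as text; conversions and cleanup belong here.
        return cls(
            city=row["city"],
            state=row["state"],
            zip_code=row["zip_code"].zfill(5),
        )


def dump(locations, target) -> None:
    writer = csv.writer(target)
    writer.writerow(Location._fields)  # header row taken from the field names
    writer.writerows(locations)        # each NamedTuple is already a sequence


def load(source) -> list:
    return [Location.from_row(row) for row in csv.DictReader(source)]


original = [Location("Bangor", "ME", "04401")]
buffer = io.StringIO(newline="")
dump(original, buffer)
buffer.seek(0)
restored = load(buffer)
```

Because a NamedTuple is both a sequence and a class with named fields, the header row and the data rows come directly from the class definition, and the from_row() method is the single place where text is converted back into application types.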

When we have classes that are containers, we often have a difficult time determining how to represent structured containers in flat CSV rows. This is an impedance mismatch between object models and the flat normalized tabular structure used for CSV files or relational databases. There's no good solution for the impedance mismatch; it requires careful design. We'll start with simple, flat objects to show you some CSV mappings.
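One possible way to work around the mismatch, shown here only as a sketch, is to repeat the container's attributes on every child row and regroup the rows when loading. The Order class, its columns, and the function names are hypothetical. The approach assumes that rows for one container are contiguous in the file, and it silently loses any container that has no children.

```python
import csv
import io
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class Order:
    """A container: one order holds many (sku, quantity) items."""
    order_id: str
    customer: str
    items: list = field(default_factory=list)


def dump_orders(orders, target) -> None:
    writer = csv.writer(target)
    writer.writerow(["order_id", "customer", "sku", "quantity"])
    for order in orders:
        for sku, quantity in order.items:
            # The parent columns are repeated on every child row.
            writer.writerow([order.order_id, order.customer, sku, quantity])


def load_orders(source) -> list:
    reader = csv.DictReader(source)
    orders = []

    def parent_key(row):
        return (row["order_id"], row["customer"])

    # groupby() only merges adjacent rows, so the file must keep each
    # order's rows together.
    for (order_id, customer), rows in groupby(reader, key=parent_key):
        items = [(row["sku"], int(row["quantity"])) for row in rows]
        orders.append(Order(order_id, customer, items))
    return orders


original = [
    Order("A1", "Pat", [("nut", 4), ("bolt", 2)]),
    Order("A2", "Lee", [("gear", 1)]),
]
buffer = io.StringIO(newline="")
dump_orders(original, buffer)
buffer.seek(0)
restored = load_orders(buffer)
```

The repeated parent columns make the file larger and allow the copies to disagree after manual edits, which is one reason this kind of mapping needs careful design.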

Let's see how to dump simple sequences into CSV.
