Design considerations and tradeoffs

There are many ways to serialize and persist Python objects. We haven't seen all of them yet. The formats in this section are focused on two essential use cases:

  • Data interchange with other applications: We might be publishing data for other applications or accepting data from other applications. In this case, we're often constrained by the other applications' interfaces. Often, JSON and XML are used by other applications and frameworks as their preferred form of data interchange. In some cases, we'll use CSV to exchange data.
  • Persistent data for our own applications: In this case, we're usually going to choose pickle because it's complete and is already part of the Python Standard Library. However, one of the important advantages of YAML is its readability; we can view and even edit the file.
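As a minimal sketch of the second use case, the following round-trips a small structure through pickle. The record and its field names are hypothetical; they were chosen only because sets and dates have no direct JSON equivalent.

```python
import datetime
import pickle

# A hypothetical record that uses types JSON cannot represent directly.
record = {
    "title": "Hello",
    "tags": {"python", "pickle"},
    "created": datetime.date(2024, 1, 15),
}

# Serialize to a compact byte string, then rebuild an equal object from it.
payload = pickle.dumps(record)
restored = pickle.loads(payload)
```

The restored object keeps its Python types (the set is still a set, the date is still a date), which is what makes pickle convenient for our own applications. Pickled data should only be loaded when it comes from a source we trust.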

When working with each of these formats, we have a number of design considerations. First and foremost, these formats are biased towards serializing a single Python object. It might be a list of other objects, but it is essentially a single object. JSON and XML, for example, have ending delimiters that are written after the serialized object. For persisting individual objects from a larger domain, we can look at shelve and sqlite3 in Chapter 11, Storing and Retrieving Objects via Shelve, and Chapter 12, Storing and Retrieving Objects via SQLite.
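A small sketch of what the single-object bias means in practice, using the standard json module. The items in the list are made up for illustration.

```python
import json

# A whole collection is serialized as one JSON document (here, one array).
document = json.dumps([{"id": 1}, {"id": 2}])

# The document ends with the closing "]" delimiter, so we can't simply
# append another object to the text. We load, extend, and rewrite instead.
items = json.loads(document)
items.append({"id": 3})
updated = json.dumps(items)
```

Adding one item means rewriting the entire document, which is why a different tool is a better fit for persisting many individual objects.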

JSON is a widely-used standard, but it's inconvenient for representing complex Python classes. When using JSON, we need to be cognizant of how our objects can be reduced to a JSON-compatible representation. JSON documents are human-readable. Because JSON can represent only a handful of simple data types, it is potentially secure for the transmission of objects through the internet.
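One way to reduce an object to a JSON-compatible representation is sketched below, using the default and object_hook parameters of the json module. The Post class, the "__class__" marker key, and the two helper functions are assumptions made for this illustration, not a fixed convention.

```python
import json


class Post:
    """A hypothetical application class with a title and a list of tags."""

    def __init__(self, title, tags):
        self.title = title
        self.tags = tags


def encode_post(obj):
    """Reduce a Post to a plain dict; json calls this for unknown types."""
    if isinstance(obj, Post):
        return {"__class__": "Post", "title": obj.title, "tags": obj.tags}
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def decode_post(mapping):
    """Rebuild a Post from a dict that carries our marker key."""
    if mapping.get("__class__") == "Post":
        return Post(mapping["title"], mapping["tags"])
    return mapping


text = json.dumps(Post("Hello", ["a", "b"]), default=encode_post)
restored = json.loads(text, object_hook=decode_post)
```

The encoder and decoder have to be kept in step with the class definition by hand, which is the inconvenience described above.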

YAML is not as widely used as JSON, but it solves numerous problems in serialization and persistence. YAML documents are human-readable; for editable configuration files, YAML is ideal. We can make YAML secure using the safe-load options.

Pickle is ideal for the simple, fast, local persistence of Python objects. It is a compact notation for transmitting objects from one Python program to another.

CSV is a widely-used standard. Working out representations for Python objects in CSV notation is challenging. When sharing data in CSV notation, we often end up using NamedTuple objects in our applications. We have to design a mapping from Python to CSV and CSV to Python.
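The two-way mapping between Python and CSV might look something like the following sketch. The Reading class and its sensor and value fields are hypothetical; the point is that every value arrives from CSV as a string, so the type conversions have to be written explicitly.

```python
import csv
import io
from typing import NamedTuple


class Reading(NamedTuple):
    """A hypothetical row type; the field names become the CSV header."""

    sensor: str
    value: float


def write_readings(readings, target):
    """Python to CSV: a header row, then one row per object."""
    writer = csv.writer(target)
    writer.writerow(Reading._fields)
    writer.writerows(readings)


def read_readings(source):
    """CSV to Python: convert each string column back to its real type."""
    for row in csv.DictReader(source):
        yield Reading(sensor=row["sensor"], value=float(row["value"]))


buffer = io.StringIO(newline="")
write_readings([Reading("a", 1.5), Reading("b", 2.0)], buffer)
buffer.seek(0)
restored = list(read_readings(buffer))
```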

XML is another widely-used notation for serializing data. XML is extremely flexible, leading to a wide variety of ways to encode Python objects in XML notation. Given the typical XML use cases, we often have external specifications in the form of an XSD or DTD. The process for parsing XML to create Python objects is generally rather complex.
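A small sketch of parsing XML into Python objects with xml.etree.ElementTree follows. The document layout (a title attribute and nested tag elements) is only one of many possible encodings and is assumed here for illustration; a real layout would normally be dictated by an external specification.

```python
import xml.etree.ElementTree as ET

# A hypothetical encoding: the title is an attribute, the tags are children.
DOCUMENT = """<posts>
  <post title="Hello"><tag>a</tag><tag>b</tag></post>
  <post title="World"><tag>c</tag></post>
</posts>"""


def parse_posts(text):
    """Walk the element tree and build one dict per post element."""
    root = ET.fromstring(text)
    return [
        {
            "title": post.get("title"),
            "tags": [tag.text for tag in post.findall("tag")],
        }
        for post in root.findall("post")
    ]


posts = parse_posts(DOCUMENT)
```

Even in this tiny case, the parser has to know which values live in attributes and which live in child elements.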

Because each CSV row is largely independent of the others, CSV allows us to encode or decode extremely large collections of objects. For this reason, CSV is often handy for encoding and decoding gargantuan collections that can't fit into memory.
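The row-at-a-time style can be sketched with a generator, so that only one row is held in memory at any moment. The column names and the in-memory sample are assumptions for illustration; in practice, the source would be an open file.

```python
import csv
import io


def iter_values(source):
    """Yield one converted value per row; no list of rows is ever built."""
    for row in csv.DictReader(source):
        yield float(row["value"])


def total_value(source):
    """Reduce the stream of rows to a single number as it is read."""
    return sum(iter_values(source))


# A tiny stand-in for a file that would be too large to load at once.
sample = io.StringIO("sensor,value\na,1.5\nb,2.0\nc,0.5\n")
total = total_value(sample)
```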

In some cases, we have a hybrid design problem. When reading most modern spreadsheet files, we have the CSV row-and-column problem wrapped in the XML parsing problem. Consider, for example, OpenOffice ODS files, which are zipped archives. One of the files in the archive is the content.xml file. Using an XPath search for body/spreadsheet/table elements will locate the individual tabs of the spreadsheet document. Within each table, we'll find the table-row elements that (usually) map to Python objects. Within each row, we'll find the table-cell elements that contain the individual values that build up the attributes of an object.
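The ODS walk described above can be sketched with zipfile and xml.etree.ElementTree. To keep the example self-contained, it builds a tiny archive in memory that holds only a content.xml file; a real ODS file contains other members and features (such as repeated-cell attributes) that this sketch ignores. The helper names and the sample data are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespace URIs, needed to qualify the element names.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# A minimal, hand-written stand-in for a spreadsheet's content.xml.
SAMPLE_CONTENT = f"""<office:document-content
    xmlns:office="{NS['office']}"
    xmlns:table="{NS['table']}"
    xmlns:text="{NS['text']}">
  <office:body><office:spreadsheet>
    <table:table table:name="Sheet1">
      <table:table-row>
        <table:table-cell><text:p>sensor</text:p></table:table-cell>
        <table:table-cell><text:p>value</text:p></table:table-cell>
      </table:table-row>
      <table:table-row>
        <table:table-cell><text:p>a</text:p></table:table-cell>
        <table:table-cell><text:p>1.5</text:p></table:table-cell>
      </table:table-row>
    </table:table>
  </office:spreadsheet></office:body>
</office:document-content>"""


def make_sample_archive():
    """Build an in-memory zip archive containing only content.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", SAMPLE_CONTENT)
    buffer.seek(0)
    return buffer


def read_rows(ods_file):
    """Yield (sheet name, list of cell strings) for each table-row."""
    with zipfile.ZipFile(ods_file) as archive:
        with archive.open("content.xml") as content:
            root = ET.parse(content).getroot()
    name_attribute = "{%s}name" % NS["table"]
    for table in root.findall("office:body/office:spreadsheet/table:table", NS):
        for row in table.findall("table:table-row", NS):
            cells = [
                "".join(cell.itertext())
                for cell in row.findall("table:table-cell", NS)
            ]
            yield table.get(name_attribute), cells


rows = list(read_rows(make_sample_archive()))
```

Each yielded row is a list of strings, so from this point on we're back to the CSV problem of mapping columns to the attributes of an object.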
