The SQL data model – rows and tables

Conceptually, the SQL data model is based on named tables with named columns. A table definition is a flat list of columns with no other structure to the data. Each row is essentially a mutable @dataclass. The idea is to imagine the contents within a table as a list of individual @dataclass objects. The relational model can be described by Python type hints as if it had definitions similar to the following:

from dataclasses import dataclass
from typing import Union, Text
import datetime
SQLType = Union[Text, int, float, datetime.datetime, datetime.date, bytes]
@dataclass
class Row:
column_x: SQLType
...
Table = Union[List[Row], Dict[SQLType, Row]]

From the type hints, we can see that a database Table can be viewed as a list of Row instances, or a Dict mapping a column to a Row instance. The definition of a Row is a collection of SQLType column definitions. There is much more to the relational model, but this tiny overview suggests ways we'll use a database in Python. An SQL database has relatively few atomic data types. Generally, the only data structure available is the Table, which is a collection of rows. Each Row is a simple list of individual column definitions. The data within a Table can be used as a simple list of Row objects. When we select a column to be a key and the remaining columns to be values, we can consider the table to behave like a dictionary or mapping from key to row value. We didn't clutter up the conceptual type definitions with additional details like multi-column keys, and the nullability of column values.

More complex data structures are built by having rows in one table contain references to a row in another table. These references are built around simply having common values shared by separate tables. When a row's key in one table is referenced by rows in another table, we've effectively defined a hierarchical, one-to-many relationship; this implements a Python nested list or nested dictionary. We can define many-to-many relationships using an intermediate association table, which provides a list of key pairs from each of the associated tables. A set can be built with a table that has a unique, primary key to assure only unique values are collected into the table.

When we define an SQL database, we define a collection of tables and their columns. When we use an SQL database, we manipulate the rows of data collected into the tables.

In the case of SQLite, we have a narrow domain of data types that SQL will process. SQLite handles NULL, INTEGER, REAL, TEXT, and BLOB data.

The Python types of None, int, float, str, and bytes are mapped to these SQL types. Similarly, when data of these types is fetched from an SQLite database, the items are converted into Python objects.

The BLOB type is a binary large object, a collection of bytes of a type defined outside of SQL. This can be used to introduce Python-specific data types into an SQL database. SQLite allows us to add conversion functions for encoding and decoding Python objects into bytes. The sqlite3 module has already added the datetime.date and datetime.datetime extensions this way for us. We can add more conversion functions for more specialized processing.

The SQL language can be partitioned into three sublanguages: the data definition language (DDL), the data manipulation language (DML), and the data control language (DCL). The DDL is used to define tables, their columns, and indices. For an example of DDL, we might have some tables defined the following way:

CREATE TABLE blog( 
    id INTEGER PRIMARY KEY AUTOINCREMENT, 
    title TEXT
); CREATE TABLE post( id INTEGER PRIMARY KEY AUTOINCREMENT, date TIMESTAMP, title TEXT, rst_text TEXT, blog_id INTEGER REFERENCES blog(id)
); CREATE TABLE tag( id INTEGER PRIMARY KEY AUTOINCREMENT, phrase TEXT UNIQUE ON CONFLICT FAIL
); CREATE TABLE assoc_post_tag( post_id INTEGER REFERENCES post(id), tag_id INTEGER REFERENCES tag(id)
);

We've created four tables to represent the Blog and Post objects for a microblogging application. For more information on the SQL language processed by SQLite, see http://www.sqlite.org/lang.html. For a broader background in SQL, books such as Creating your MySQL Database: Practical Design Tips and Techniques will introduce the SQL language in the context of the MySQL database. The SQL language is case insensitive.

For no good reason, we prefer to see SQL keywords in all uppercase to distinguish it from the surrounding Python code.

The blog table defines a primary key with the AUTOINCREMENT option; this will allow SQLite to assign the key values, saving us from having to generate the keys in our code. The title column is the title for a blog. We've defined it to be TEXT. In some database products, we must provide a maximum size for a character string. This can help the database engine optimize storage. Since this size is not required in SQLite, so we'll avoid the clutter.

The post table defines a primary key as well as date, title, and RST text for the body of the post. Note that we did not reference the tags for a post in this table definition. We'll return to the design patterns required for the following SQL tables. The post table does, however, include a formal REFERENCES clause to show us that this is a foreign key reference to the owning blog.

The tag table defines the individual tag text items, and nothing more. The SQL defines the text column as being unique. An attempt to insert a duplicate will cause the transaction to fail.

Finally, we have an association table between post and tag. This table has only two foreign keys. It associates tags and posts, allowing an unlimited number of tags per post as well as an unlimited number of posts to share a common tag. This association table is a common SQL design pattern to handle this kind of many-to-many relationship. We'll look at some other SQL design patterns in the following section. We can execute the preceding definitions to create our database:

import sqlite3 
database = sqlite3.connect('p2_c11_blog.db') 
database.executescript(sql_ddl) 

All database access requires a connection, created with the module function, sqlite3.connect(). We provided the name of the file to assign to our database.

The DB-API standard for Python database modules is defined by PEP 249, available at https://www.python.org/dev/peps/pep-0249/. This standard presumes that there is a separate database server process to which our application process will connect. In the case of SQLite, there isn't really a separate process. A connect() function is used, however, to comply with the standard. The sql_ddl variable in the previous example is simply a long string variable with the four CREATE TABLE statements. If there are no error messages, then it means that the table structures have been defined.

The executescript() method of a Connection object is described in the Python standard library as a nonstandard shortcut. Technically, database operations must involve creating a cursor objectThe following is a standardized approach:

from contextlib import closing
with closing(database.cursor()) as cursor: for stmt in sql_ddl.split(";"): cursor.execute(stmt)

While conceptually helpful, this isn't practical. It doesn't work well when there are ";" characters in comments or text literals. It's better to use the handy executescript() shortcut. If we were concerned about the portability to other databases, we'd shift focus to a more strict compliance with the Python DB-API specification. We'll return to the nature of a cursor object in the following section, when looking at queries.

In the next section, we'll discuss CRUD processing via SQL DML statements.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset