© Adam L. Davis 2020
A. L. DavisModern Programming Made Easyhttps://doi.org/10.1007/978-1-4842-5569-8_20

20. Databases

Adam L. Davis1 
(1)
Oviedo, FL, USA
 

Databases are an extremely important component of most software projects. In short, a database is a software system that stores data in a standardized format, and depending on the database, it might enable one or more of the following: quick storage and retrieval, the ability to perform complex queries, validation of data input, and calculations on the data.

The classic style of database going back many decades is known as a relational database. In addition to raw data, it stores relationships between tables in the database. A database typically consists of several highly structured data tables with defined constraints. For example, each column of a table has a type, whether or not it can be null, if it must be unique, and other constraints.

There is a highly standardized language for performing operations and calculations on a database called SQL (Structured Query Language). SQL has been around a long time and could easily warrant its own book, so this book will only cover the basics.

Since the advent of so-called big-data products and web applications (such as a particular “face”-themed social network), a second category of databases has emerged: NoSQL or non-relational databases. Typically these are more like key-value or document stores than relational databases. They include databases like Redis, MongoDB, Cassandra, DynamoDB, and many others.

Note

The SQL/NoSQL categorization is an oversimplification, but it provides an easier narrative than the actual complex reality. In other words, “Here be dragons!”

SQL (Relational) Databases

Part of classic relational databases is the concept of ACID1 (atomicity, consistency, isolation, durability). To summarize ACID, it means that the database is always in a consistent state (with enforced constraints), even if the system crashes in the middle of an update. For example, if a column in a table is marked as “not null,” it will never be null. It also enables transactions, which are atomic units of work—either all of it happens or none of it. This may seem at first glance like a simple thing to achieve, but it is actually a very complex problem.

In relational databases the main store of data is called a table. A table has columns which are part of the table schema (definition) and define what kind of data specifically is stored. A table has rows, which is data stored in the table, and each row has values defined for each column.

../images/435475_2_En_20_Chapter/435475_2_En_20_Figa_HTML.jpg Some good open source databases include PostgreSQL, MySQL, and H2. For this section, you can follow along by installing PostgreSQL. On Debian or Ubuntu Linux type “sudo apt install postgresql”. For other systems, see the web site2 for installation instructions. A good graphical tool for connecting to and manipulating databases is DBeaver.3

SQL

The basic language of relational databases is SQL. It includes the ability to define tables and perform complex queries on those tables.

For example, creating a table looks something like the following:
1   CREATE TABLE dragon(
2       dragon_id INTEGER,
3       dragon_name VARCHAR(100),
4       birth_date DATE NOT NULL,
5       PRIMARY KEY (dragon_id)
6      );

A table always needs to have a primary key—it acts as the identifier for each row of the table so it must be unique per row in the table. In this case, the primary key is dragon_id.

Next, you can add data to the table using insert statements. For example, in PostgreSQL you can insert two rows in the following way:
insert into dragon values (1, 'Smaug', current_date);
insert into dragon values (2, 'Drago', current_date);
Database types cover the basics, such as INTEGER , but other unfamiliar types include the following:
  • VARCHAR(length) is similar to the String object. It has a given maximum length.

  • TIMESTAMP is used to store dates and times.

  • NUMERIC(precision, scale) or DECIMAL(precision, scale) is used to store numbers such as currency values (e.g., the number 123.45 has a precision of 5 and a scale of 2).

  • BLOB is typically used to store binary data.

The select statement allows you to specify which columns you want to extract from one or more tables. You can also use aggregation functions like MIN, MAX, or COUNT within a select statement to perform more complex queries. For example, to find the birthday of your oldest dragon, you might perform the following query:
1   SELECT MIN(birth_date) FROM dragon;
A where clause can be used to restrict your query to certain rows in a table. To select all dragons whose names start with S (in alphabetic order), run the following:
1   SELECT dragon_id, dragon_name FROM dragon
2       WHERE dragon_name LIKE 'S%'
3       ORDER BY dragon_name;

The order by clause is used to order the results returned from the query. The like keyword is used to match a varchar column against a matching expression where % matches any values.

Foreign Keys

A foreign key is simply a column in a table that references the primary key of another table.

For example, let’s say you have a wizard table, and each wizard has multiple dragons they keep as pets.
1   CREATE TABLE wizard(
2       wizard_id INTEGER,
3       wizard_name VARCHAR(100),
4       PRIMARY KEY (wizard_id)
5      );
If the wizard table’s primary key is wizard_id, the dragon table could have the following new definition with the owner column and foreign key constraint:
1   CREATE TABLE dragon(
2       dragon_id INTEGER,
3       dragon_name VARCHAR(100),
4       birth_date DATE NOT NULL,
5       PRIMARY KEY (dragon_id)
6       owner INTEGER,
7       FOREIGN KEY owner REFERENCES wizard (wizard_id)
8      );

../images/435475_2_En_20_Chapter/435475_2_En_20_Figb_HTML.jpg Although SQL keywords are shown in uppercase, this is not required by PostgreSQL but is only for illustration purposes.

Connections

A database system typically runs as a separate process, and your code connects to it in some way.

There are many different ways to do this. In Java, the most basic standard for connecting to databases is called JDBC.

It allows you to run SQL statements on the database. You will need a specific Driver—a library that implements the JDBC standard—for your database.

In a real-world application, you should also use a JDBC connection pool such as HikariCP.4 A connection pool allows connections to be reused by your application many times, which increases throughput and performance of your application since connections take time to start up.

There are also object-relational mapping (ORM) frameworks, such as Hibernate.5 These frameworks have you map Java objects to data tables. They are built on top of JDBC. For example, Hibernate has its own query language, called HQL, which is translated into SQL by Hibernate. GORM, which we discussed earlier, uses Hibernate by default.

Alternatively, there are code-generating frameworks that allow you to use a DSL for queries. One such framework for Java is jOOQ.6 It allows you to write type-safe queries in the native language. For example:
1   create.selectFrom(DRAGON)
2     .where(DRAGON.NAME.like("S%"))
3     .orderBy(DRAGON.NAME)

NoSQL (Non-relational) Databases

Big web projects (such as Wikipedia) had problems using relational databases to scale up to millions of users. They had to partition their database onto multiple machines (called sharding ), which broke foreign key references. There is a theorem about this, CAP theorem which says you can have two of consistency, availability, and partitioning in a database, but not all three. So, over time, big-data projects moved to NoSQL or non-relational databases, which make different trade-offs so they can be scaled up more easily. Many times the trade-off is eventual consistency rather than full consistency. In other words, one user might read an old value for a short period of time after another user input a newer value.

NoSQL databases are used by Netflix, Reddit, Twitter, GitHub, Pinterest, eBay, eHarmony, craigslist, and many others.

Note

I will cover some NoSQL databases here, but there are many others.

Redis

Redis 7 is a key-value store. Everything is stored as a string in Redis, including binary data. It’s written in C and has a long list of commands.8

There are multiple clients for using Redis from many different languages, including Java, Node.js, Scala, Ruby, Python, and Go.

MongoDB

MongoDB 9 is a document database. It stores JSON-style (JavaScript) documents and has a rich query syntax. It’s written in C++, but JavaScript can be used in queries and aggregation functions.

MongoDB supports indexing of any field in a document. It scales horizontally using sharding and provides high availability and increased throughput using replication. More recently it added support for ACID transactions.

MongoDB can also be used as a file system.

Cassandra

Cassandra 10 was originally developed at Facebook and was released as an open source project in July 2008. It’s written in Java and is now a mature, top-level Apache project.

Cassandra is scalable, decentralized, fault tolerant, and has tunable consistency. It also uses replication for fault tolerance and performance.

Cassandra has an SQL-like alternative called CQL (Cassandra Query Language). Language drivers are available for Java (JDBC), Python (DBAPI2), and Node.JS (Helenus).

VoltDB

VoltDB11 provides a counter-example to the SQL/NoSQL divide. It’s distributed, in-memory, and lightning-fast, but it’s also a relational database and supports SQL.

Summary

  • There are two major types of databases: SQL and NoSQL or relational and non-relational.

  • Relational (SQL) databases are highly structured, consistent, and durable, but difficult to scale up.

  • Big-data projects tend to use non-relational databases, which are like key-value stores that can be scaled up more easily.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset