Appendix B. Reviewing relational databases

Relational databases have been an integral part of enterprise development for a few decades now. The fact that these business data storage mainstays are backed by their own body of mathematical theory (relational algebra) speaks to the elegance and robustness of this mature technology. E. F. Codd first introduced the theory of relational databases in 1970 while working at IBM. This groundbreaking research eventually led to the creation of today’s database products, including IBM’s own highly successful DB2 database. Oracle is the most popular database in existence today, in vibrant competition with products like Microsoft SQL Server, Sybase, MySQL, and many others, in addition to IBM’s DB2. Fundamentally, relational databases store and organize related data into a hierarchy of schemas, tables, columns, and rows.

Other types of databases exist, including flat-file, hierarchical, network, and object-oriented databases. Each of these is worthy of study on its own merits. However, the EJB 3 specification only supports relational databases, and that will be where we draw the line with regard to the database discussions in this book. The focus of this appendix is to briefly discuss each of the relational concepts.

B.1. Database tables, columns, rows, and schema

Tables are the most basic logical unit in a relational database. A table stores conceptually related data into rows and columns. Essentially, tables are the object-oriented (OO) counterparts of objects. Hence, we might imagine that the ActionBazaar database contains tables like CATEGORIES, ITEMS, ORDERS, and so forth. A column is a particular domain of data, and a table is a set of related columns. If tables are the equivalent of objects, columns are the equivalent of object attributes. Consequently, the CATEGORIES table probably has columns such as CATEGORY_ID, CATEGORY_NAME, MODIFICATION_DATE, and CREATION_DATE, as seen in figure B.1.

Figure B.1. Rows and columns in the CATEGORIES table. While columns store a domain of data, rows contain a record composed of a set of related columns in a table.

As with object attributes in Java, each relational table column has a data type. Table B.1 lists some column data types commonly used across various databases and their Java equivalents.

Table B.1. Common column data types and their Java equivalents

Relational Database Type

Java Type

CHAR, VARCHAR2, VARCHAR, LONG

java.lang.String

Char

char, Char

INTEGER, NUMBER

int, Integer, BigInteger

NUMBER

double, float, BigDecimal, Double, Float

Raw, BLOB

java.sql.Blob, byte[]

CLOB

java.sql.Clob, char[],java.lang.String

A row is a record saved in the database composed of related data in each column of a table. A row, in effect, is equivalent to an instance of a particular object, in contrast to the class definition. For most OO developers it’s not a big leap to imagine an instance of the Category object being saved into a row of the CATEGORIES table.

A schema can be compared to a Java package. In other words, a schema is a collection of related tables, similar to how a Java package contains a set of related classes. Usually, all of the tables used in a particular application are organized under a single schema. All the tables used in our example application might be stored under a schema called ACTIONBAZAAR.

Typically, a schema stores much more than just tables. It might also have views, triggers, and stored procedures. A detailed discussion of these database features is beyond the scope of this appendix. For coverage of these and other database topics, feel free to investigate a good reference book such as An Introduction to Database Systems, 7th edition, by C. J. Date (Addison Wesley Longman, 1999).

We will, however, cover a few more database concepts essential in understanding EJB 3 Persistence next, namely database constraints such as primary and foreign keys.

B.2. Database constraints

Constraints are the concept that is closest to business rules in a basic relational database schema. In effect, constraints maintain data integrity by enforcing rules on how data may be modified. Since most database vendors try to differentiate their products by offering unique constraint features, coming up with a list of constraints to discuss is not easy. We have chosen to cover the bare minimum set necessary to understand EJB 3 persistence features, namely primary/foreign keys, uniqueness constraints, NULL constraints, and sequence columns.

B.2.1. Primary keys and unique columns

Just as a set of fields or properties uniquely identifies an entity, a set of columns uniquely identifies a given database record. The column or set of columns identifying a distinct record is called a primary key. For example, the CATEGORY_ID column is the primary key for the CATEGORIES table. When you identify a column or set of columns as the primary key, you essentially ask the database to enforce uniqueness. If the primary key consists of more than one column, it is called a compound or composite key. For example, instead of CATEGORY_ID, the combination of CATEGORY_NAME and CREATION_DATE could be the primary key for the CATEGORIES table.

Primary keys that consist of business data are called natural keys. A classic example is using some business data such as a Social Security number (represented by an SSN column) as the primary key for an EMPLOYEES table. CATEGORY_ID or EMPLOYEE_ID, on the other hand, are examples of surrogate keys. Essentially, surrogate keys are columns created explicitly to function as primary keys. Surrogate keys are popular and we highly recommend using them, especially as opposed to compound keys. Other than naming, primary key and uniqueness constraints do exactly the same thing, and the constraint is usually applied to columns that can function as alternate natural keys.

B.2.2. Foreign key

The interaction of primary and foreign keys is what makes relational databases shine. Foreign keys are essentially primary key values of one table stored in another table. Foreign keys are the database equivalents of object references, and signify relationships between tables. As shown in figure B.2 (from our ActionBazaar example), a column named CATEGORY_ID in the ITEMS table pointing to the CATEGORY_ID column in the CATEGORIES table signifies the fact that an item belongs in a category.

Figure B.2. The CATEGORY_ID foreign key in the ITEMS table points to the primary key of the CATEGORIES table.

A database foreign key constraint means that the database will ensure every value that is put into the foreign key column exists in the primary key column it points to.

B.2.3. Not Null

The NOT NULL constraint is essentially a data integrity mechanism that ensures some table columns always have valid, nonempty values. For example, if the business rules dictate that a Category record must always have a name, we can specify a NOT NULL constraint on the CATEGORY_NAME column, and the database will only allow rows to be inserted where a CATEGORY_NAME is specified. If no CATEGORY_NAME is provided, the database will not allow the row to be inserted.

B.2.4. Sequences

An easy way to ensure uniqueness for surrogate primary keys is to set the key for a new record to a number greater than the last created record. Although you could manage this kind of column yourself, databases provide various mechanisms for managing key sequences. The easiest and most transparent of these mechanisms is an identity column constraint (such as the identity column constraints supported by DB2, Sybase, and SQL Server). When you designate a column as an identity, the database automatically generates a value for you when you create a new record. For example, if the ITEM_ID primary key for the ITEMS table is an identity, when we create a new record we do not specify a primary key value ourselves. Instead, during record creation the database looks at the last row inserted, generates a new value by incrementing the last key, and sets the ITEM_ID value on our behalf.

Some other databases like Oracle don’t support incrementing keys as an internal function of the column, but help you generate keys using an external mechanism called sequences (DB2 supports sequences in addition to identities). Each time you insert a new record, you can ask the sequence to generate a key that you can use in the INSERT statement. A few databases don’t support sequence generation at all, in which case you must implement similar functionality yourself. Fortunately, EJB 3 transparently handles all these situations on your behalf, using the table generator.

B.3. Structured Query Language (SQL)

If relational theory is the bedrock of the relational database, SQL is the crown jewels. Java developers with strong OO roots may find SQL’s verbose syntax and unmistakably relational feel less than ideal. The truth is that even O/R solutions such as the EJB 3 Persistence API generate SQL under the hood. The fact that you use O/R is no excuse not to have a solid understanding of SQL, particularly during debugging and fine-tuning.

SQL (which stands for Structured Query Language) arose as a result of the initial relational research conducted at IBM. The American National Standards Institute (ANSI) has since standardized SQL. Almost all major databases comply with the SQL-92 standard for the most part. Even then, writing portable SQL is a tricky business at best. Luckily, O/R relieves us from this meticulous work to some degree by automatically generating SQL suited to a particular database.

SQL statements include the familiar CREATE, INSERT, DELETE, UPDATE, and, of course, everyone’s favorite, SELECT. As a testament to the power of the SELECT statement, some elements of it have been ported over into the O/R world through EJB-QL (which we cover in chapter 10).


Database normalization

In the relational world, it is extremely important that the same conceptual data not be replicated throughout the database. The importance of avoiding redundancy stems from two facts. First, most databases hold a huge amount of data. For example, it is easy to think that storing department name and location in a table with employee information is no big deal. The problem is that if a thousand employees work in the same department, the department information would be duplicated across a thousand employee table rows! If a department location changes, you would have to accurately update each of the records for the thousand employees who work for the department. Second, this redundancy can easily lead to inconsistency. Both of these problems can be solved by storing a foreign key to the department table (say department ID) in the employee table instead.

Relational theory has formalized the process of checking the database design for redundancy. This process is called database normalization. IBM researchers initially proposed three different levels of normalization: first, second, and third normal form, each consisting of a well-defined, incrementally strict set of rules to check for database fitness. Later, more levels were introduced: BCNF (Boyce-Codd Normal Form), fourth, and fifth normal form. Relational theory recognizes the fact that normalization can lead to trading off speed for space efficiency. Most DBAs go through the process of selective denormalization when faced with tricky performance issues.


Coverage of SQL syntax is well beyond the scope of this appendix. However, at least a basic grasp of SQL is essential to understand chapters 8, 9, and 10. If you don’t already have a working knowledge of SQL, we highly recommend that you investigate it on your own.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset