Chapter 3. PostgreSQL Basic Building Blocks

In this chapter, we will build a PostgreSQL database, and explore the basic building blocks of PostgreSQL. The conceptual model of a car web portal, which was presented in Chapter 1, Relational Databases, will be translated to a physical model. Also, some data modeling techniques, such as surrogate keys, will be discussed briefly and some coding best practices will be presented.

We will also take a look at the hierarchy of the database objects in PostgreSQL. This will help you to understand how to configure the database cluster and tune its settings. More detailed information will be presented to show the usage of template databases, user databases, roles, table spaces, schemas, configuration settings, and tables.

Database coding

The software engineering principles should be applied on database coding. Some of these principles are:

Database naming conventions

A naming convention describes how names are to be formulated. Naming conventions allow some information to be derived based on patterns, which helps the developer to easily search for and predict the database object names. Database naming conventions should be standardized across the organization. There is a lot of debate on how to name database objects. For example, some developers prefer to have prefixes or suffixes to distinguish the database object type from the names. For example, one could suffix a table or a view with tbl and vw respectively.

With regard to database object names, one should try to use descriptive names, and avoid acronyms and abbreviations if possible. Also, singular names are preferred, because a table is often mapped to an entity in a high programming language; thus, singular names lead to unified naming across the database tier and the business logic tier. Furthermore, specifying the cardinality and participation between tables is straightforward when the table names are singular.

In the database world, compound object names often use underscore but not camel case due to the ANSI SQL standard specifications regarding identifiers quotation and case sensitivity. In the ANSI SQL standard, non-quoted identifiers are case-insensitive.

In general, it is up to the developer to come up with a naming convention that suits his needs; in existing projects, do not invent any new naming conventions, unless the new naming conventions are communicated to the team members. In this book, we use the following conventions:

  • The names of tables and views are not suffixed
  • The database object names are unique across the database
  • The identifiers are singulars including table, view, and column names
  • Underscore is used for compound names
  • The primary key is composed of the table name and the suffix "id"
  • A foreign key has the same name of the referenced primary key in the linked table
  • Use the internal naming conventions of PostgreSQL to rename the primary keys, foreign keys, and sequences

Do not use keywords to rename your database objects. The list of SQL keywords can be found at http://www.postgresql.org/docs/current/static/sql-keywords-appendix.html.

PostgreSQL identifiers

The length of PostgreSQL object names is 63 characters; PostgreSQL also follows ANSI SQL regarding case sensitivity. If you wanted to use the camel case for renaming database objects, you could achieve that by putting the identifier name in double quotes. PostgreSQL identifier names have the following constraints:

  • The identifier name should start with an underscore or a letter. Letters can be Latin or non-Latin letters.
  • The identifier name can be composed of letters, digits, underscore, and the dollar sign. For compatibility reasons, the use of the dollar sign is not recommended.
  • The minimum length of the identifier is typically one, and the maximum length is 63.

In addition to the preceding points, it is not recommended to use keywords as table names.

Documentation

Documentation is essential for developers as well as business owners to understand the full picture. Documentation for database schema, objects, and code should be maintained. ER and class diagrams are very useful in understanding the full picture. There are tons of programs that support UML and ER diagrams. One can generate ER and UML diagrams by using graph editing tools such as yEd, which can be downloaded from http://www.yworks.com/en/products/yfiles/yed/. Another useful tool is SchemaSpy (http://schemaspy.sourceforge.net/), which generates a visual representation for table structures and the relation between tables. Also, there are many commercial UML modeling tools that support code reverse engineering.

Code documentation provide an insight into complex SQL statements. PostgreSQL uses -- and /**/ for single-line and multi-line comments respectively. The single line comment -- works on the rest of the line after the comment marker. Therefore, it can be used on the same line as the actual statement.

Finally, PostgreSQL allows the developer to store the database object description via the COMMENT ON command.

Version control system

It is a good idea to maintain your code using a revision control system such as GIT or SVN. When writing an SQL code, it is better to create an installation script and execute it in one transaction. This approach makes it easy to clean up if an error occurs.

Database objects have different properties: some are a part of the physical schema, and some control database access. The following is a proposal for organizing the database code in order to increase the separation of concern (SoC):

  • For each database in a PostgreSQL cluster, one should maintain the DDL script for objects that are part of the physical schema, and the DML script, which populates the tables with static data together. The state of an object in the physical schema is defined by the object structure and the data that is contained by this object; thus, the object cannot be recreated without being dropped first. Also, the structure of the physical schema object does not change often. In addition to that, the refactoring of some of the physical schema objects, such as tables, might require data migration. In other words, changing the definition of a physical schema object requires some planning.
  • Store the DDL scripts for objects that are not part of the physical schema, such as views and functions, separately. Keeping the definitions of views and functions together allows the developer to refactor them easily. Also, the developer will be able to extract the dependency trees between these objects.
  • Maintain the DCL script separately. This allows the developer to separate the security aspect from the functional requirements of the database. It allows the database developers and administrators to work closely without interfering with each other's work.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset