Chapter 2. PostgreSQL in Action

PostgreSQL (pronounced Post-Gres-Q-L) or postgres is an open source object relational database management system. It emphasizes extensibility, creativity, as well as compatibility. It competes with the major relational database vendors such as Oracle, MySQL, SQL servers, and others. It is used by different sectors including government agencies and the public and private sectors. It is a cross-platform DBMS, and runs on most modern operating systems including Windows, MAC, and Linux flavors. It conforms to SQL, and is ACID compliant.

An overview of PostgreSQL

PostgreSQL (http://www.postgresql.org/) has many rich features. It provides enterprise level services including performance, unique features, and scalability. It has a very supportive community and very good documentation.

PostgreSQL history

The name PostgreSQL is derived from post-ingres database. PostgreSQL history could be summarized as follows:

  • Academia: University of California at Berkeley (UCB)
    • 1977-1985, the Ingres project: Michael Stonebraker created an RDBMS based on the formal relational model.
    • 1986-1994, postgres: Michael Stonebraker created postgres in order to support complex data types and the object relational model.
    • 1995, Postgres95: Andrew Yu and Jolly Chen changed the postgres POSTQUEL query language with an extended subset of SQL
  • Industry
    • 1996, PostgreSQL: Several developers dedicated a lot of labor and time to stabilize postgres95. The first open source version was released on January 29, 1997. With the introduction of new features, enhancements, and due to it being an open source project, the postgres95 name was changed to PostgreSQL.

PostgreSQL begun at version 6, having a very strong starting point due to the advantage of several years of research and development. Being an open source project with a very good reputation, PostgreSQL attracted hundreds of developers. Currently, PostgreSQL has an uncountable number of extensions and a very active community.

The advantages of PostgreSQL

PostgreSQL provides many features that attract developers, administrators, architects, and companies.

Business advantages of PostgreSQL

PostgreSQL is a free open source software (OSS). It is released under the PostgreSQL license, which is similar to the BSD and MIT licenses. The PostgreSQL license is highly permissive, and PostgreSQL is not subject to monopoly and acquisition. This gives the company the following advantages:

  • No associated licensing cost to PostgreSQL.
  • Unlimited number of deployments of PostgreSQL.
  • More profitable business model.
  • PostgreSQL is SQL standards compliant; thus, finding professional developers is not very difficult. PostgreSQL is easy to learn, and porting code from one database vendor to PostgreSQL is cost efficient. Also, the PostgreSQL administrative tasks are easy to automate, thus reducing the staffing cost significantly.
  • PostgreSQL is cross-platform, and it has drivers for all modern programming languages; so, there is no need to change the company policy regarding the software stack in order to use PostgreSQL.
  • PostgreSQL is scalable, and it gives high performance.
  • PostgreSQL is very reliable; it rarely crashes. Also, PostgreSQL is ACID compliant, which means that it can tolerate some hardware failure. In addition to that, it can be configured and installed as a cluster to ensure high availability (HA).

PostgreSQL user advantages

PostgreSQL is very attractive for developers, administrators, and architects. It has rich features that enable developers to perform tasks in an agile way. The following are some of the features that are attractive to developer:

  • A new release almost every year; there have been 23 major releases until now, starting from Postgres95.
  • Very good documentation and an active community enables developers to find and solve problems quickly. The PostgreSQL manual is over 2,500 pages.
  • A rich extension repository enables developers to focus on business logic. Also, it enables developers to meet requirement changes easily.
  • The source code is available free of charge. It can be customized and extended without huge effort.
  • Rich clients and administrative tools enable developers to perform routine tasks such as describing database objects, exporting and importing data, and dumping and restoring databases very quickly.
  • Database administration tasks do not require a lot of time, and can be automated.
  • PostgreSQL can be integrated easily with other database management systems giving the software architecture a good flexibility for implementing software designs.

PostgreSQL applications

PostgreSQL can be used with a variety of applications. The main PostgreSQL application domains can be classified into two categories:

  • Online transactional processing (OLTP): OLTP is characterized by a large amount of CRUD operations, very fast processing of operations, and the maintaining of data integrity in a multi-access environment. Performance is measured in the number of transactions per second.
  • Online analytical processing (OLAP): OLAP is characterized by a small amount of requests, complex queries which involve data aggregation, huge amounts of data from different sources and with different formats, data mining, and historical data analysis.

OLTP is used to model business operations such as customer relationship management (CRM). For example, the car web portal example in Chapter 1, Relational Databases, is an example of an OLTP application. OLAP applications are used for business intelligence, decision support, reporting, and planning. An OLTP database size is relatively small as compared to an OLAP database. OLTP normally follows relational model concepts, such as normalization ,when designing the database, while OLAP has less relation; the schema often has the shape of a star or a snowflake. Finally the data is deformalized.

In the car web portal example, we could have another database to store and maintain all the sellers' and users' historical data to analyze user preferences and seller activities. This database is an example of an OLAP application.

Unlike OLTP, OLAP's main operation is data retrieval. OLAP data is often generated by a process called ETL (extract, transform, and load). ETL is used to load data in to the OLAP database from different data sources and different formats.

PostgreSQL can be used out of the box for OLTP applications. For OLAP, there are many extensions and tools to support it, such as the PostgreSQL COPY command and Foreign Data Wrappers (FDW).

Success stories

PostgreSQL is used in many application domains including communication, medical, geographical, and e-commerce applications. Many companies provide consultation as well as commercial services, such as migrating proprietary RDBMS to PostgreSQL in order to cut off the licensing costs. These companies often influence and enhance PostgreSQL by developing and submitting new features.

The following are a few companies that have used PostgreSQL:

  • Skype uses PostgreSQL to store user chats and activities. Skype has also affected PostgreSQL by developing many tools called Skytools.
  • Instagram is a social networking service that enables its user to share pictures and photos. Instagram has more than 100 million active users.
  • The American chemical society (ACS) uses PostgreSQL to store more than one terabyte of data for the journal archive.

In addition to the companies mentioned in the preceding list, PostgreSQL is used by HP, WMware, and Heroku. PostgreSQL is used by many scientific communities and organizations, such as NASA, due to its extensibility and rich data types.

Forks

There are more than 20 PostgreSQL forks; PostgreSQL extensible APIs make postgres a great candidate for forking. Over the years, many groups forked PostgreSQL and contributed their findings to PostgreSQL. The following is a list of the popular PostgreSQL forks:

Tip

A fork is an independent development of a software project based on another project.

  • HadoopDB is a hybrid between the PostgreSQL RDBMS and MapReduce technologies to target analytical workload.
  • Greenplum is a proprietary DBMS that was built on the foundation of PostgreSQL. It utilizes the shared-nothing and massively parallel processing (MPP) architectures. It is used as a data warehouse and for analytical workloads.
  • The EnterpriseDB advanced server is a proprietary DBMS that provides Oracle with the capability to cap the oracle fees.
  • Postgres-XC (extensible cluster) is a multi-master PostgreSQL cluster based on the shared-nothing architecture. It emphasis write-scalability, and provides the same APIs to applications as PostgreSQL.
  • Vertica is a column-oriented database system that was started by Michael Stonebraker in 2005, and was acquisitioned by HP in 2011. Vertica reused the SQL parser, semantic analyzer, and standard SQL rewrites from the PostgreSQL implementation.
  • Netzza, a popular data warehouse appliances solution, was started as a PostgreSQL fork.
  • Amazon red shift is a popular data warehouse management system based on PostgreSQL 8.0.2. It is mainly designed for OLAP applications.

PostgreSQL architecture

PostgreSQL uses the client/server model, where the client and server programs can be on different hosts. The communication between the client and server is normally done via TCP/IP protocols or via Linux sockets. PostgreSQL can handle multiple connections from a client. A common PostgreSQL program consists of the following operating system processes:

  • Client process or program (frontend): The database frontend application performs a database action. The frontend can be a web server that wants to display a web page or a command-line tool to do maintenance tasks. PostgreSQL provides frontend tools such as psql, createdb, dropdb, and createuser.
  • Server process (backend): The server process manages database files, accepts connections from client applications, and performs actions on behalf of the client. The server process name is postgres.

PostgreSQL forks a new process for each new connection; thus, client and server processes communicate with each other without the intervention of the server main process (postgres), and they have a certain lifetime that is determined by accepting and terminating a client connection.

PostgreSQL abstract architecture

The aforementioned abstract conceptual PostgreSQL architecture gives an overview of the PostgreSQL capabilities and its interaction with the client, and the operating system. The PostgreSQL server could be divided roughly into four subsystems, as follows:

  • Process manager: The process manager manages client connections such as forking and the terminating process.
  • Query processor: When a client sends a query to PostgreSQL, the query is parsed by the parser, and then the traffic cop subsystem determines the query type. A utility query is passed to the utilities subsystem. Select, insert, update, and delete queries are rewritten by the rewriter following which an execution plan is generated by the planner. Finally, the query is executed and the result is returned to the client.
  • Utilities: The utilities subsystem provides a means for maintaining the database such as claiming storage, updating statistics, and exporting and importing data with a certain format and logging.
  • Storage manager: The storage manager handles the memory cache, disk buffers, and storage allocation.

Almost all PostgreSQL components can be configured, including a logger, planner, statistical analyzer, and storage manager. PostgreSQL configuration is governed by the nature of the application, such as OLAP and OLTP.

PostgreSQL abstract architecture

Figure 1: PostgreSQL abstract conceptual architecture

The PostgreSQL community

PostgreSQL has a very cooperative, active, and organized community. In the last 8 years, the PostgreSQL community has published eight major releases. Announcements are brought to the developers via the PostgreSQL weekly newsletter. There are dozens of mailing lists organized into categories such as user, developer, and associations. Examples of user mailing lists are pgsql-general, psql-doc, and psql-bugs. pgsql-general is a very important mailing list for beginners. All non-bugs-related questions regarding PostgreSQL installation, tuning, basic administration, PostgreSQL features, and general discussions are submitted to this list.

The PostgreSQL community runs a blog aggregation service called Planet PostgreSQL (planet.postgresql.org). Several PostgreSQL developers and companies use this service to share their experience and knowledge.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset