Selecting from multiple tables

It is possible to select records from several sources at a time. Consider the following examples. There are two tables each having three rows:

car_portal=> SELECT * FROM car_portal_app.a;
a_int | a_text
-------+--------
1 | one
2 | two
3 | three
(3 rows)

car_portal=> SELECT * FROM car_portal_app.b;
b_int | b_text
-------+--------
2 | two
3 | three
4 | four
(3 rows)

When records are selected from both of them, we get the combinations of all their rows:

car_portal=> SELECT * FROM car_portal_app.a, car_portal_app.b;
a_int | a_text | b_int | b_text
-------+--------+-------+--------
1 | one | 2 | two
1 | one | 3 | three
1 | one | 4 | four
2 | two | 2 | two
2 | two | 3 | three
2 | two | 4 | four
3 | three | 2 | two
3 | three | 3 | three
3 | three | 4 | four
(9 rows)

All possible combinations of records from several tables are called Cartesian product and usually it does not make much sense. In most cases, the user is interested in certain combinations of rows, when rows from one table match rows from another table based on some criteria. For example, it may be necessary to select only the combinations when the integer fields of both the tables have equal values. To get this, the query should be changed:

car_portal=> SELECT * FROM car_portal_app.a, car_portal_app.b WHERE a_int=b_int;
a_int | a_text | b_int | b_text
-------+--------+-------+--------
2 | two | 2 | two
3 | three | 3 | three
(2 rows)

This condition a_int=b_int joins the tables. The joining conditions could be specified in the WHERE clause but in most cases it is better to put them into the FROM clause to make it explicit that they are here for joining and not for filtering the result of the join, though there is no formal difference.

The JOIN keyword is used to add join conditions to the FROM clause. The following query has the same logic and the same results as the previous one:

SELECT * FROM car_portal_app.a JOIN car_portal_app.b ON a_int=b_int;

The JOIN condition may be specified using any of these three ways--using the keywords ON, USING, or NATURAL:

<first table> JOIN <second table> ON <condition>

The condition can be any SQL expression returning boolean result. It's not even necessary to include fields of the joined tables:

<first table> JOIN <second table> USING (<field list>)

The join is based on the equality of all the fields specified in the comma separated <field list>. The fields should exist in both tables with the same name. So this syntax can be not flexible enough:

<first table> NATURAL JOIN <second table>

Here, the join is based on the equality of all the fields that have the same name in both tables.

Usage of USING or NATURAL JOIN syntax has a drawback that is similar to the usage of * in the select-list. It is possible to change the structure of the tables, for example, by adding another column or renaming them, in a way that does not make the query invalid but changes the logic of the query. This will cause errors that are very difficult to find.

What if not all rows from the first table can be matched to a row in the second table?

In our example, only rows with integer values 2 and 3 exist in both tables. When we join on the condition a_int=b_int, only those two rows are selected from the tables. The rest of the rows are not selected. This kind of join is called inner join.

It can be shown as a filled area on the diagram, like this:

Inner join

When all the records from one table are selected, regardless of the existence of matching records in the other table, it is called an outer join. There are three types of outer joins. Look at the following diagrams:

Left outer join

If all records are selected from the first table, along with only those records that match the joining condition from the second, it is a left outer join:

Right outer join

When all records from the second table are selected, along with only the matching records from the first table, it is a right outer join:

Full outer join

When all the records from both tables are selected, it is a full outer join.

In SQL syntax, the words inner and outer are optional. Consider the following code examples:

car_portal=> SELECT * FROM car_portal_app.a JOIN car_portal_app.b ON a_int=b_int;
a_int | a_text | b_int | b_text
-------+--------+-------+--------
2 | two | 2 | two
3 | three | 3 | three
(2 rows)

car_portal=> SELECT * FROM car_portal_app.a LEFT JOIN car_portal_app.b ON a_int=b_int;
a_int | a_text | b_int | b_text
-------+--------+-------+--------
1 | one | |
2 | two | 2 | two
3 | three | 3 | three
(3 rows)

car_portal=> SELECT * FROM car_portal_app.a RIGHT JOIN car_portal_app.b ON a_int=b_int;
a_int | a_text | b_int | b_text
-------+--------+-------+--------
2 | two | 2 | two
3 | three | 3 | three
| | 4 | four
(3 rows)

car_portal=> SELECT * FROM car_portal_app.a FULL JOIN car_portal_app.b ON a_int=b_int;
a_int | a_text | b_int | b_text
-------+--------+-------+--------
1 | one | |
2 | two | 2 | two
3 | three | 3 | three
| | 4 | four
(4 rows)

Note that the Cartesian product is not the same as the result of the full outer join. Cartesian product means all possible combinations of all records from the tables without any specific matching rules. Full outer join returns pairs of records when they match the join conditions. The records that do not have a pair in the other table are returned separately. Outer joins return empty values NULL in columns that correspond to the table from where no matching record is found.

As it is possible to query not only tables but also views, functions, and subqueries, it is also possible to join them using the same syntax as is used to join tables:

car_portal=> SELECT *
FROM car_portal_app.a
INNER JOIN (SELECT * FROM car_portal_app.b WHERE b_text = 'two') subq ON a.a_int=subq.b_int;
a_int | a_text | b_int | b_text
-------+--------+-------+--------
2 | two | 2 | two

In the example, the subquery got the alias subq and it was used in the join condition.

It is also possible to join more than two tables. In fact, every join clause joins all the tables before the JOIN keyword with one table right after the keyword.

For example, this is correct:

SELECT *
FROM table_a
JOIN table_b ON table_a.field1=table_b.field1
JOIN table_c ON table_a.field2=table_c.field2 AND table_b.field3=table_c.field3;

At the moment of joining the table table_c, the table table_a has been mentioned already in the FROM clause, therefore it is possible to refer to that table.

However, this is not correct:

SELECT *
FROM table_a
JOIN table_b ON table_b.field3=table_c.field3
JOIN table_c ON table_a.field2=table_c.field2

The code will cause an error because at JOIN table_b, the table table_c has not been there yet.

The Cartesian product can also be implemented using the JOIN syntax. The keywords CROSS JOIN are used for that. See the following code:

SELECT * FROM car_portal_app.a CROSS JOIN car_portal_app.b;

The preceding code is equivalent to the following:

SELECT * FROM car_portal_app.a, car_portal_app.b;

The join condition in INNER JOIN in the logic of the query has the same meaning as a condition to filter the rows in the WHERE clause. So the two following queries are in fact the same:

SELECT * FROM car_portal_app.a INNER JOIN car_portal_app.b ON a.a_int=b.b_int;
SELECT * FROM car_portal_app.a, car_portal_app.b WHERE a.a_int=b.b_int;

However, this is not the case for outer joins. There is no way to implement an outer join with the WHERE clause in PostgreSQL, though it may be possible in other databases.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset