Joining two or more tables of data is a powerful feature in the relational model. The SQL procedure enables you to join tables of information quickly and easily. Linking one piece of information with another piece of information is made possible when at least one column is common to each table. A maximum of 32 tables can be combined using conventional (inner) join techniques, as opposed to a maximum of two tables at a time using outer join techniques.
This chapter discusses a number of join topics including why joins are important, the differences between the various join techniques, the importance of the WHERE clause in creating joins, creating and using table aliases, joining three or more tables of data, outer (left, right, and full) joins, subqueries, and set operations. It is important to recognize that many of these techniques can be accomplished using DATA step programming techniques, but the simplicity and flexibility found in the SQL procedure makes it especially useful, if not indispensable, as a tool for the practitioner.
As relational database systems continue to grow in popularity, the need to access normalized data stored in separate tables becomes increasingly important. By relating matching values in key columns in one table with key columns in the other table(s), you can retrieve information as if the data were stored in one huge file. The results can provide new and exciting insights into possible data relationships.
Being able to define relationships between multiple tables and retrieve information based on these relationships is a powerful feature of the relational model. A join of two or more tables provides a means of gathering and manipulating data in a single SELECT statement. You join two or more tables by specifying the table names in a SELECT statement. Joins are specified on a minimum of two tables at a time, where a column from each table is used for the purpose of connecting the two tables. Connecting columns should have “like” values and the same column attributes because the join’s success is dependent on these values.
In a typical join, you name the relevant columns in the SELECT statement, you specify the tables to be joined in the FROM clause, and in the WHERE clause you specify the relationship you want revealed. That is, you describe the data subset that you want to produce. To be of use (and of a manageable size) your join needs a WHERE clause to constrain the results and ensure their utility and relevance.
Note: | When you create a join without a WHERE clause, you are creating an internal, virtual table called a Cartesian product. This table can be extremely large because it represents all possible combinations of rows and columns in the joined tables. |
The SQL procedure supports a great number of complex queries (sometimes referred to as join types). From inner joins to left, right, and full outer joins, this chapter provides a comprehensive look at the various forms of SELECT statements that can be used to perform multiple table management. Additional topics and examples include subqueries and set operations such as UNION, INTERSECT, and EXCEPT operations. The next table presents the various types of complex queries available in the SQL procedure.