Many of the queries presented in this cookbook go beyond what is possible using tables as they are typically available in a database, especially in relation to aggregate functions and window functions. Therefore, for some queries, you need to make a derived table – either a subquery or a Common Table Expression (CTE).
Arguably the simplest way to create a virtual table that allows you to run queries on window functions or aggregate functions is a subquery. All that’s required here is to write out the query that you need to query within parentheses, and then to write a second query that uses it. The table below illustrates the use of subqueries with a simple double aggregate - you want to find not just the counts of employees in each job, but then identify the highest number, but you can’t nest aggregate functions directly in a standard query.
One pitfall is that some vendors require you to give the subquery table and alias, but others do not. The example below was written in MySQL which does require an alias. The alias here is head_count_tab after the closing parenthesis. Others that require an alias are PostgreSQL and SQL Server, while Oracle does not.
select
max
(
HeadCount
)
as
HighestJobHeadCount
from
(
select
job
,
count
(
empno
)
as
HeadCount
from
emp
group
by
job
)
head_count_tab
Common Table Expressions were intended to overcome some of the limits of subqueries, and may be most well known for allowing recursive queries to be used within SQL. In fact, enablng recursion within SQL was the main inspiration for Common Table Expressions. The example below achieves the same result as the subquery we saw earlier - it finds a double aggregate.
with
head_count_tab
(
job
,
HeadCount
)
as
(
select
job
,
count
(
empno
)
from
emp
group
by
job
)
select
max
(
HeadCount
)
as
HighestJobHeadCount
from
head_count_tab
Although this query solves a simple problem it illustrates the essential features of a CTE. We introduce the derived table using the WITH clause, specifying the column headings in the parentheses, and use parentheses around the derived table’s query itself. If we want to add more derived tables, we can add more as long as we separate each one a comma, and provide its name before its query (the reverse of how aliasing usually works in SQL).
Because the inner queries are presented before the outer query, in many circumstances they may also be considered more readable – they make it easier to study each logical element of the query separately in order to understand the logical flow. Of course, as with all things in coding, this will very according to circumstances, and sometimes the subquery will be more readable.
Considering that recursion is the key reason for Common Table Expressions to exist, the best way to demonstrate their capability is through a recursive query.
The query that follows calculates the first 20 numbers in the Fibonacci sequence using a recursive CTE. Note that in the first part of the anchor query we can initialise the values in the first row of the virtual table.
with
recursive
workingTable
(
fibNum
,
NextNumber
,
index1
)
as
(
select
0
,
1
,
1
union
all
select
fibNum
+
nextNumber
,
fibNUm
,
index1
+
1
from
anchor
where
index1
<
20
)
select
fibNum
from
workingTable
as
fib
The Fibonacci sequence finds the next number by adding the current and the previous number; you could also use LAG to achieve this result. However, in this case we’ve made a pseudo-LAG by using two columns to account for the current number and the previous. Note the keyword RECURSIVE, which is mandatory in MySQL, Oracle and PostgreSQL but not in SQL Server or DB2. In this query, the index1 column is largely redundant in the sense of not being used for the Fibonacci calculation. Instead, we have included it to make it simpler to set the number of rows returned via the WHERE clause. In a recursive CTE, the WHERE clause becomes crucial, as without it the query would not terminate (although in this specific case, if you try deleting it, you are likely to find that your database throws an overflow error when the numbers become too large for the data type).
At the simple end of the spectrum, there’s not a lot of difference between a subquery and CTE in terms of useability. Both allow for nesting or writing more complicated queries that refer to other derived tables. However, once you start nesting many subqueries, it readability becomes tricky as you need to pull back the layers of query like an onion within nested brackets, whereas the CTE writes the queries down the page, making understanding which new derived table refers to which earlier derived much simpler.
The use of derived tables dramatically extends the range of SQL. Both subqueries and CTES are used many times throughout the book, so it is important to understand how they work, especially as they each have a particular syntax that you need to master to ensure success. The recursive CTE, which is now available in the vendor offerings in this book is one of the biggest extensions to have occurred within SQL, allowing for many extra possibilities.