"Checking for membership" actually means that you want to return a set of data that matches another set of values that are specified in the WHERE clause. To work with a subquery that returns more than one value, you can modify the comparison operator as discussed in the previous section, or you can use the IN or the EXISTS clause of the SELECT statement.
So far, all the subqueries that you have seen have been written for you. In this section, you will write a subquery from scratch. Along the way, I will give you a couple of tips to help make the process easier.
Note
The result of a subquery that is used with an IN clause is a list of zero or more values. After the subquery returns the resulting information, the outer query will use them as if you typed them in yourself.
To see how to use the IN clause, let's suppose you need to find all the products that are supplied by companies in a selected country. You want to create the final query in steps. This will enable you to test each piece of the query to ensure that it works as desired. The first step is to create the inner query. Remember that the inner query will not return an error message if it fails. This is the reason for testing it before including it as a subquery. This query would return all the supplier ID codes based on the supplied country name (in this case, 'Germany') in the WHERE condition as shown:
Use Northwind select supplierid from suppliers where country = 'Germany'
Before continuing, you should execute this query to see whether there are any suppliers in Germany. For the Northwind database, you should get the following results:
supplierid ----------- 11 12 13 (3 row(s) affected)
You can see that there are three unique suppliers in Germany. The next step is to create the outer or main query. This would be a SELECT statement that specifies the columns you want to return from the products table as shown:
select productid, productname from products
Caution
You might want to execute this query to test it, but if your database is very large, this could return thousands or millions of rows from the products table.
The final step is to combine these two SQL statements together by using the IN condition with the WHERE clause. The following code is the final result of this combination:
select productid, productname from products where supplierid in (select supplierid from suppliers where country = 'Germany')
With the final result:
productid productname ----------- ---------------------------------------- 25 NuNuCa Nuß-Nougat-Creme 26 Gumbär Gummibärchen 27 Schoggi Schokolade 28 Rössle Sauerkraut 29 Thüringer Rostbratwurst 30 Nord-Ost Matjeshering 64 Wimmers gute Semmelknödel 75 Rhönbräu Klosterbier 77 Original Frankfurter grüne Soße (9 row(s) affected)
To review, the inner query is evaluated first, producing the ID numbers of the three suppliers who meet the subqueries'WHERE condition. The outer query is then evaluated using these values in the IN clause. If I knew the ID numbers of the suppliers in Germany, I could have written the query as follows:
select productid, productname from products where supplierid in (11,12,13)
There is one variation of the IN condition in which you would return rows in the outer query where the comparison column does not match any of the values returned from the subquery. When using NOT IN, the subquery still returns a list of zero or more values. However, the final result will be the opposite of what you would expect. Using the previous example, the NOT IN would return all the products that are not supplied from Germany.
Most queries are evaluated by executing the subquery once and then substituting the resulting value or values into the WHERE clause of the outer query. In queries that include a correlated subquery or repeating subquery, the subquery depends on the outer query for its values. This means that the subquery is executed repeatedly, once for each row that might be selected by the outer query. The following example finds the names of all the authors in the PUBS database who earn 100% of the shared royalty on a book:
USE pubs SELECT au_lname, au_fname FROM authors WHERE 100 IN (SELECT royaltyper FROM titleauthor WHERE titleauthor.au_ID = authors.au_id)
The results of this query are
au_lname au_fname ---------------------------------------- -------------------- White Johnson Green Marjorie Carson Cheryl Straight Dean Locksley Charlene Blotchet-Halls Reginald del Castillo Innes Panteley Sylvia Ringer Albert (9 row(s) affected)
Unlike most of the subqueries shown earlier today, the subquery in this example cannot be resolved independently of the outer query. It requires a value for authors.au_id, but this value is a variable. It changes as each row of the authors table is examined.
That is exactly how this query is evaluated: SQL Server considers each row of the authors table for inclusion in the results by substituting the value in each row into the subquery's WHERE condition. Then, if the value in the royaltyper column matches 100, it is returned in the final result set. This evaluation occurs for each row in the authors table.
Using the IN condition allows you to search the data in the table for more than one value. However, when you are using the IN condition, there are a few things to watch out for. These are discussed in the following sections.
Do you really want duplicate values in the IN list? The answer depends on the size of the tables being accessed in the outer query and subqueries. There is one good reason for not wanting duplicates in the list. Suppose that the outer query is going against a very large table and the subquery is working on a small table, you should use the DISTINCT keyword to remove any duplicates in the subquery result set. This helps the server improve response time because each row in the outer query must be tested against the values in the subquery list. If you have fewer values returned, the server has fewer items in the list to compare against.
However, for every other instance, there is no real need to remove duplicates from the list. In fact, using the DISTINCT keyword when you don't really need it will slow down the response from the server. To summarize: There are only a few situations in which using DISTINCT in the subquery is warranted, but in general, it doesn't pay to use it.
You want to make sure that the data type of the list items being returned matches the data type of the column that they are being compared with. If you don't, the server must perform a conversion on the entire list of items for every row in the tables referenced in the outer query. In addition, if the data types are incompatible (for example, money and character), you will receive an error message from the server.
When you use a subquery that returns no rows, you will not receive an error message. In the case of an IN list, you are comparing values to an empty list, and no rows in the outer query will be in that list.
I want to discuss one other type of subquery today. The WHERE EXISTS test enables you to verify the existence of a row in a table and to take action only if a matching row exists. The EXISTS condition is really a type of correlated subquery. Suppose that you want to know which business titles have sold any books. To verify this, you need to look in the titles table for all titles that have a type of 'Business'. Then, you look in the sales table to find the rows for the titles that you previously found. You don't really want to select anything from the sales table; you just want to find those books that have at least one row in the sales table. The following example shows how to use the EXISTS condition to perform this task:
USE pubs Select title_id from titles where exists ( select * from sales where sales.title_id = titles.title_id)
The subquery in an EXISTS test works the same way as the correlated subquery: by referencing a table in the outer query. For every row in titles that matches the WHERE clause, the subquery is evaluated. If the subquery is true, the row is included in the result set.
Up to this point in this lesson, you have learned how to select information from one or more related tables by making use of subqueries. You can also use a subquery to resolve problems when information from another table is needed to complete a SELECT statement's WHERE condition. However, you must use a join to display information from more than one table.
Many T-SQL statements that include subqueries can be alternatively created using joins, with the reverse being true as well. Other questions can be posed only with subqueries. In T-SQL, there is usually no performance difference between a statement that uses subqueries and a statement that uses joins instead. However, in some cases where existence must be checked, a join yields better performance. The following two code examples return the same result set, but you can see that one uses joins whereas the other uses a subquery. This first example uses the INNER JOIN syntax:
USE pubs Select pub_name from publishers as p inner join titles as t on t.pub_id = p.pub_id where t.type = 'business'
This next example retrieves the same data by using a subquery:
Select pub_name from publishers as p Where p.pub_id in ( select t.pub_id from titles as t where t.type = 'business')
When you execute these two SQL statements, you will see a different number of rows returned. This will probably confuse you just a little, until you take a closer look. You will see that the subquery version returns only unique values. To reproduce this exactly in the join version, you would need to include the DISTINCT keyword.
Note
Going back to the topic of performance, both SQL statements using joins and those using subqueries can return the same data, but when you need to use the DISTINCT keyword to eliminate duplicates, you are actually slowing down the processing. So, if you don't really care about duplicates, the join SQL statements are faster.