Checking for Membership

"Checking for membership" actually means that you want to return a set of data that matches another set of values that are specified in the WHERE clause. To work with a subquery that returns more than one value, you can modify the comparison operator as discussed in the previous section, or you can use the IN or the EXISTS clause of the SELECT statement.

So far, all the subqueries that you have seen have been written for you. In this section, you will write a subquery from scratch. Along the way, I will give you a couple of tips to help make the process easier.

Note

The result of a subquery that is used with an IN clause is a list of zero or more values. After the subquery returns the resulting information, the outer query will use them as if you typed them in yourself.


Writing Your First Subquery with IN

To see how to use the IN clause, let's suppose you need to find all the products that are supplied by companies in a selected country. You want to create the final query in steps. This will enable you to test each piece of the query to ensure that it works as desired. The first step is to create the inner query. Remember that the inner query will not return an error message if it fails. This is the reason for testing it before including it as a subquery. This query would return all the supplier ID codes based on the supplied country name (in this case, 'Germany') in the WHERE condition as shown:

Use Northwind
select supplierid
from suppliers
where country = 'Germany'

Before continuing, you should execute this query to see whether there are any suppliers in Germany. For the Northwind database, you should get the following results:

supplierid
-----------
11
12
13

(3 row(s) affected)

You can see that there are three unique suppliers in Germany. The next step is to create the outer or main query. This would be a SELECT statement that specifies the columns you want to return from the products table as shown:

select productid, productname
from products

Caution

You might want to execute this query to test it, but if your database is very large, this could return thousands or millions of rows from the products table.


The final step is to combine these two SQL statements together by using the IN condition with the WHERE clause. The following code is the final result of this combination:

select productid, productname
from products
where supplierid in
      (select supplierid
       from suppliers
       where country = 'Germany')

With the final result:

productid   productname
----------- ----------------------------------------
25          NuNuCa Nuß-Nougat-Creme
26          Gumbär Gummibärchen
27          Schoggi Schokolade
28          Rössle Sauerkraut
29          Thüringer Rostbratwurst
30          Nord-Ost Matjeshering
64          Wimmers gute Semmelknödel
75          Rhönbräu Klosterbier
77          Original Frankfurter grüne Soße

(9 row(s) affected)

To review, the inner query is evaluated first, producing the ID numbers of the three suppliers who meet the subqueries'WHERE condition. The outer query is then evaluated using these values in the IN clause. If I knew the ID numbers of the suppliers in Germany, I could have written the query as follows:

select productid, productname
from products
where supplierid in (11,12,13)

Using the NOT IN Condition

There is one variation of the IN condition in which you would return rows in the outer query where the comparison column does not match any of the values returned from the subquery. When using NOT IN, the subquery still returns a list of zero or more values. However, the final result will be the opposite of what you would expect. Using the previous example, the NOT IN would return all the products that are not supplied from Germany.

Creating Correlated Subqueries

Most queries are evaluated by executing the subquery once and then substituting the resulting value or values into the WHERE clause of the outer query. In queries that include a correlated subquery or repeating subquery, the subquery depends on the outer query for its values. This means that the subquery is executed repeatedly, once for each row that might be selected by the outer query. The following example finds the names of all the authors in the PUBS database who earn 100% of the shared royalty on a book:

USE pubs
SELECT au_lname, au_fname
FROM authors
WHERE 100 IN
   (SELECT royaltyper
   FROM titleauthor
   WHERE titleauthor.au_ID = authors.au_id)

The results of this query are

au_lname                                 au_fname
---------------------------------------- --------------------
White                                    Johnson
Green                                    Marjorie
Carson                                   Cheryl
Straight                                 Dean
Locksley                                 Charlene
Blotchet-Halls                           Reginald
del Castillo                             Innes
Panteley                                 Sylvia
Ringer                                   Albert

(9 row(s) affected)

Unlike most of the subqueries shown earlier today, the subquery in this example cannot be resolved independently of the outer query. It requires a value for authors.au_id, but this value is a variable. It changes as each row of the authors table is examined.

That is exactly how this query is evaluated: SQL Server considers each row of the authors table for inclusion in the results by substituting the value in each row into the subquery's WHERE condition. Then, if the value in the royaltyper column matches 100, it is returned in the final result set. This evaluation occurs for each row in the authors table.

Things to Remember When Using IN

Using the IN condition allows you to search the data in the table for more than one value. However, when you are using the IN condition, there are a few things to watch out for. These are discussed in the following sections.

Removing Duplicate Values from the IN List

Do you really want duplicate values in the IN list? The answer depends on the size of the tables being accessed in the outer query and subqueries. There is one good reason for not wanting duplicates in the list. Suppose that the outer query is going against a very large table and the subquery is working on a small table, you should use the DISTINCT keyword to remove any duplicates in the subquery result set. This helps the server improve response time because each row in the outer query must be tested against the values in the subquery list. If you have fewer values returned, the server has fewer items in the list to compare against.

However, for every other instance, there is no real need to remove duplicates from the list. In fact, using the DISTINCT keyword when you don't really need it will slow down the response from the server. To summarize: There are only a few situations in which using DISTINCT in the subquery is warranted, but in general, it doesn't pay to use it.

Matching Data Types

You want to make sure that the data type of the list items being returned matches the data type of the column that they are being compared with. If you don't, the server must perform a conversion on the entire list of items for every row in the tables referenced in the outer query. In addition, if the data types are incompatible (for example, money and character), you will receive an error message from the server.

Returning an Empty Result Set

When you use a subquery that returns no rows, you will not receive an error message. In the case of an IN list, you are comparing values to an empty list, and no rows in the outer query will be in that list.

Writing Subqueries with EXIST

I want to discuss one other type of subquery today. The WHERE EXISTS test enables you to verify the existence of a row in a table and to take action only if a matching row exists. The EXISTS condition is really a type of correlated subquery. Suppose that you want to know which business titles have sold any books. To verify this, you need to look in the titles table for all titles that have a type of 'Business'. Then, you look in the sales table to find the rows for the titles that you previously found. You don't really want to select anything from the sales table; you just want to find those books that have at least one row in the sales table. The following example shows how to use the EXISTS condition to perform this task:

USE pubs
Select title_id
from titles
where exists (
      select *
      from sales
      where sales.title_id = titles.title_id)

The subquery in an EXISTS test works the same way as the correlated subquery: by referencing a table in the outer query. For every row in titles that matches the WHERE clause, the subquery is evaluated. If the subquery is true, the row is included in the result set.

Comparing Joins and Subqueries

Up to this point in this lesson, you have learned how to select information from one or more related tables by making use of subqueries. You can also use a subquery to resolve problems when information from another table is needed to complete a SELECT statement's WHERE condition. However, you must use a join to display information from more than one table.

Many T-SQL statements that include subqueries can be alternatively created using joins, with the reverse being true as well. Other questions can be posed only with subqueries. In T-SQL, there is usually no performance difference between a statement that uses subqueries and a statement that uses joins instead. However, in some cases where existence must be checked, a join yields better performance. The following two code examples return the same result set, but you can see that one uses joins whereas the other uses a subquery. This first example uses the INNER JOIN syntax:

USE pubs
Select pub_name
from publishers as p inner join
     titles as t
     on t.pub_id = p.pub_id
where t.type = 'business'

This next example retrieves the same data by using a subquery:

Select pub_name
from publishers as p
Where p.pub_id in (
      select t.pub_id
      from titles as t
      where t.type = 'business')

When you execute these two SQL statements, you will see a different number of rows returned. This will probably confuse you just a little, until you take a closer look. You will see that the subquery version returns only unique values. To reproduce this exactly in the join version, you would need to include the DISTINCT keyword.

Note

Going back to the topic of performance, both SQL statements using joins and those using subqueries can return the same data, but when you need to use the DISTINCT keyword to eliminate duplicates, you are actually slowing down the processing. So, if you don't really care about duplicates, the join SQL statements are faster.


..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset