Chapter 41. Speeding up your queries with index covering

Alex Kuznetsov

When a nonclustered index contains all the columns from one table needed to satisfy a query, that index is called covering with respect to that query, and that query is covered by that index. In many cases, creating a covering index is a quick and easy way to boost the performance of a query. But covering indexes aren’t free—they use additional storage, slow down most updates, and may cause concurrency issues such as lock contention and deadlocks. This means that speeding up some selects may and usually will cause some modifications to run slower. Usually, you don’t need to optimize standalone queries; in most cases, you should aim to improve the system’s overall productivity. To accomplish that, you need to find some balance to provide a system with selects that run faster and updates that still perform reasonably well. You’ll see several examples demonstrating these concepts in this chapter. The first example will demonstrate how a covering index speeds up selects.

Index covering speeds up selects

First, we need to discuss the difference between clustered and nonclustered indexes. Clustered indexes store key columns on their root and intermediate pages, and all columns on their leaf pages. In fact, the leaf level of a clustered index is the table’s data. Nonclustered indexes also store key columns on their root and intermediate pages, but on their leaf-level pages they store all the index key column values, plus bookmarks that allow the database engine to find the corresponding rows in the table.

Also, nonclustered indexes may store other columns listed in the INCLUDE clause, which will be discussed later. For example, the table [AdventureWorks].[Production].[WorkOrder] has a nonclustered index on column ScrapReasonID. (I’m using the SQL Server 2005 version of the AdventureWorks database.) This index stores ScrapReasonID on all its levels as well as WorkOrderID on its leaf level. For this index, WorkOrderID column is a bookmark—it uniquely identifies rows in the table. Consider the following query:

SELECT [DueDate],SUM([OrderQty]) AS SumQty
FROM [AdventureWorks].[Production].[WorkOrder]
GROUP BY [DueDate]

The optimizer can choose to scan the whole clustering index (the whole table [AdventureWorks].[Production].[WorkOrder]) to satisfy this query, unless there’s an index that would allow the query to run faster. You can create a covering index as follows:

CREATE NONCLUSTERED INDEX [IX_WorkOrder_DueDate] ON [Production].[WorkOrder]
(
[DueDate], [OrderQty]
)

The index IX_WorkOrder_DueDate is several times smaller than the table [AdventureWorks].[Production].[WorkOrder]. Clearly, scanning this index will require fewer reads. The optimizer will choose to scan the index rather than a significantly bigger table that the index is created on.


Note

Starting with SQL Server 2005, you can also use an INCLUDE clause for further optimization, as follows:

CREATE NONCLUSTERED INDEX [IX_WorkOrder_DueDate_with_INCLUDE] ON
[Production].[WorkOrder]
(
[DueDate]
)INCLUDE([OrderQty])

All the tables in the AdventureWorks database are smallish, so you won’t notice much difference between IX_WorkOrder_DueDate and IX_WorkOrder_DueDate_with_INCLUDE. For really big tables, the index IX_WorkOrder_DueDate_with_INCLUDE may give noticeably better performance. The INCLUDE clause will be discussed in more detail at the end of this chapter.


Note

For such comparisons, it’s convenient to use two copies of the same database—the original database exactly as it exists in production and the modified one with a covering index.


Back up your AdventureWorks database, create a new database named AdventureWorks_Copy, and restore the backup of AdventureWorks against AdventureWorks_Copy. You can either do it all via SSMS or you can run the script in listing 1, changing the folder from C:Temp if necessary.

Listing 1. Restore the AdventureWorks database to AdventureWorks_Copy
USE master
GO
BACKUP DATABASE AdventureWorks TO
DISK=N'C:TempAdventureWorksFullBackup.bak'
GO
RESTORE DATABASE AdventureWorks_Copy FROM
DISK=N'C:TempAdventureWorksFullBackup.bak'
WITH REPLACE, MOVE 'AdventureWorks_Data' TO
'C:TempAdventureWorks_Copy_Data.mdf',
MOVE 'AdventureWorks_Log' TO 'C:TempAdventureWorks_Copy_Log.mdf'

Highlight the following query in Management Studio:

SELECT [DueDate],SUM([OrderQty]) AS SumQty
FROM [AdventureWorks].[Production].[WorkOrder]
GROUP BY [DueDate]

SELECT [DueDate],SUM([OrderQty]) AS SumQty
FROM [AdventureWorks_Copy].[Production].[WorkOrder]
GROUP BY [DueDate]

Press Ctrl-L to see the execution plans. Note that the optimizer expects the covered query to run several times faster, because the covered query is expected to perform several times fewer reads. Is the optimizer correct in its estimate? Compare real execution costs and see for yourself. Run the following commands:

SET STATISTICS IO ON
SET STATISTICS TIME ON

Run both selects and see the difference in reads and CPU cycles—with a covering index, the database engine performs several times fewer reads and uses less CPU. Note that, to satisfy a query, the database engine not only reads data, it also acquires locks to implement the current isolation level, unless you run the query under READ UNCOMMITTED isolation level.


Note

Understanding isolation levels is essential. Because a good description of isolation levels would be substantially bigger than this chapter, it’s beyond the scope of this discussion. I encourage you to read about isolation levels in Books Online or on the MSDN web site: http://msdn.microsoft.com/en-us/library/ms189122.aspx.


Acquiring the locks necessary to implement an isolation level uses up CPU cycles, more locks, and more CPU for higher isolation levels. Usually the covered query needs to acquire fewer locks, because it needs to read fewer pages. You can increase the isolation level to SERIALIZABLE, rerun both selects, and see for yourself that for this query, the advantages of the covering index are more pronounced for higher isolation levels.

Let’s discuss range queries and compare a covering index versus a noncovering one. Let’s create a noncovering index, as shown in listing 2.

Listing 2. Creating a noncovering index
USE [AdventureWorks]
GO
CREATE NONCLUSTERED INDEX [IX_WorkOrder_DueDate] ON
[Production].[WorkOrder]
(
[DueDate]
)

Let’s run two range queries as in listing 3, and then see the execution plans.

Listing 3. Running two range queries
SELECT [DueDate],[OrderQty]
FROM [AdventureWorks].[Production].[WorkOrder]
WHERE [DueDate] = '20040630'

SELECT [DueDate],[OrderQty]
FROM [AdventureWorks_Copy].[Production].[WorkOrder]
WHERE [DueDate] = '20040630'

To satisfy the first query, the optimizer uses index seeks and bookmark lookups (bookmark lookups are also known as clustered key lookups). The second query uses index seeks only—this is a much simpler and cheaper plan. This is why the optimizer expects the cost of the first query to be approximately 99 times more than the cost of the second one, as shown in figure 1.

Figure 1. Index seek plan

Don’t expect to get exactly 99 percent when you run the queries on your server—your mileage may vary, depending on your version of SQL Server. For example, I’m sometimes getting different benchmark results running the same query against the same data on SQL Server 2000 and SQL Server 2005. Yet you should get similar numbers—the second query is dramatically more efficient. The optimizer thinks so, and the real execution costs confirm that the optimizer is right.

If the range is wide enough, the noncovering index is no longer used for the range query—the whole clustering index (the whole table) is scanned instead. But only a portion of a covering index is scanned even if the range is wide. You can run the two queries in listing 4 and see for yourself.

Listing 4. Comparing covering and noncovering index performance
SELECT [DueDate],[OrderQty]
FROM [AdventureWorks].[Production].[WorkOrder]
WHERE [DueDate] BETWEEN '20040630' AND '20040830'

SELECT [DueDate],[OrderQty]
FROM [AdventureWorks_Copy].[Production].[WorkOrder]
WHERE [DueDate] BETWEEN '20040630' AND '20040830'

As you have seen, a covered query runs faster, because

  • It may be satisfied by a simpler plan (this is true for many but not all queries).
  • It needs to perform fewer reads.
  • It may need to acquire less locks (this is true for many but not all queries).

Some rules of thumb about indexes aren’t true for covering indexes

In general, blanket statements are rarely true in database programming. In particular, some rules of thumb are true for noncovering indexes, but not true for covering indexes. For instance, the rule “an index can only be used if the condition in the WHERE clause is index friendly” doesn’t apply to covering indexes—although the following condition isn’t index friendly:

SELECT DISTINCT [LastName],[FirstName]
FROM [Person].[Contact]
WHERE [LastName] LIKE '%quist'

A covering index on (LastName, FirstName) can still be used to satisfy this query, although it’ll require an index scan, not an index seek, and as such will take longer and more page reads than a query that can seek. You don’t even have to use the leftmost column of your covering index in your WHERE clause—a covering index on (FirstName, LastName) can also be used to satisfy this select just as well.

The rule that you need to have the most selective column first in your index isn’t true for covering indexes, either. For instance, even though ContactID is more selective than FirstName, the following query runs faster against an index on (FirstName, ContactID) than against an index on (ContactID, FirstName):

SELECT DISTINCT [ContactID]
FROM [Person].[Contact]
WHERE [FirstName] = N'Jane'

Covering indexes usually slow down modifications

Covering indexes usually slow down modifications. This is a no-brainer. All indexes usually slow down modifications, including covering indexes. The one word I feel that I need to explain is usually—I added it to the title of this section automatically, because blanket statements are rarely true in database programming. In the example in listing 5, a covering index speeds up a modification.

Listing 5. Update statement performance improved with a covering index
BEGIN TRANSACTION

UPDATE [AdventureWorks].[Production].[WorkOrder]
SET [OrderQty] = [OrderQty]+1
WHERE [DueDate] = '20040630' AND [OrderQty] = 12345678

UPDATE [AdventureWorks_Copy].[Production].[WorkOrder]
SET [OrderQty] = [OrderQty]+1
WHERE [DueDate] = '20040630' AND [OrderQty] = 12345678

ROLLBACK

As you’ve seen, in some rare cases a covering index may speed up an update, if it makes it easier to locate the rows being modified.

One index should cover many queries

Suppose that you have a table with a clustered index and two nonclustered indexes. Inserting one row into such a table means that at least three pages need to be modified or added (maybe more than three if there’s a page split). The more nonclustered indexes you add on a table, the more pages a single-row insert must modify or insert. Similar reasoning is true for multi-row modifications, including updates and deletes—their real execution costs increase as the number of indexes grows. This is why it’s important to keep the number of indexes low for tables that are frequently modified.

For example, consider two separate indexes on the [Person].[Contact] table, each covering a different SELECT statement as described in listing 6.

Listing 6. Creating two covering indexes
CREATE NONCLUSTERED INDEX [IX_Contact_Covering1] ON [Person].[Contact]
(
[LastName], [FirstName], [EmailAddress]
)
GO
-- covers the following query:
SELECT [LastName], [FirstName], [EmailAddress] FROM [Person].[Contact]
WHERE [LastName] LIKE 'Lars%'
GO

CREATE NONCLUSTERED INDEX [IX_Contact_Covering2] ON [Person].[Contact]
(
[LastName], [FirstName], [Phone]
)
GO
-- covers the following query:
SELECT [LastName], [FirstName], [EmailAddress] FROM [Person].[Contact]
WHERE [LastName] LIKE 'Lars%'
GO

You might instead consider creating only one index, as follows:

CREATE NONCLUSTERED INDEX [IX_Contact_Covering] ON [Person].[Contact]
(
[LastName], [FirstName], [Phone], [EmailAddress]
)

This index will cover both selects and will slow down modifications less severely than the two previous indexes would.

One index can both cover queries and implement uniqueness

If you create a primary key or unique constraint, SQL Server automatically creates a unique index to implement the constraint. Note, though, that you don’t have to create a constraint to enforce uniqueness. You can manually create a unique index, and this index is just as good as a constraint for most practical purposes—you can even have a foreign key constraint refer to a unique index instead of a primary key or unique constraint. Why is this relevant? Because you can have one index to both enforce uniqueness of a smaller set of columns and to cover a query involving a bigger set of columns.

For example, suppose that the EmailAddress column in Person.Contact isn’t nullable (which at the time of this writing isn’t true in AdventureWorks), and that you want to enforce the uniqueness of EmailAddress. Suppose that you also need an index to cover the following query:

SELECT [LastName], [FirstName], [EmailAddress]
FROM [Person].[Contact] WHERE [EmailAddress] LIKE 'lars%'

One index can be used to accomplish both goals, as follows:

CREATE UNIQUE NONCLUSTERED INDEX [IX_Contact_Unique_Covering] ON
[Person].[Contact]
(
[EmailAddress]
)
INCLUDE([LastName], [FirstName])

Note that this index is different from a unique index on all three columns (EmailAddress, LastName, FirstName). A unique index on all three columns would allow two persons with different names to have the same EmailAddress.

Keeping the number of indexes small is essential in most cases—it allows for quick selects without slowing modifications too much. Having one index serve many purposes allows you to keep the number of indexes small.

Summary

As you’ve seen, index covering is a simple and powerful way to speed up selects, because it allows the database engine to satisfy queries with fewer reads and locks. Although covering indexes may slow down modifications, in most cases this impact isn’t prohibitively high. Usually you can manage to utilize index covering to keep your selects fast and your modifications reasonably performant too.

About the author

Alex Kuznetsov has been working with databases for more than a decade. He leads a team of database developers and optimizes database performance for a proprietary trading firm. Also he enjoys blogging on http://sqlblog.com and is currently writing a book on defensive database programming.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset