Sometimes the old ways are still the best- even in technology. Long before there was an SQL Server, database professionals made advantageous use of the “covering” index, so-called because the columns used in the index cover all the requirements of a particular query of special interest. Let’s take a look at a class covering indexes using Learning Tree’s Bigwind sample database.
In our hypothetical (and none too realistic) scenario we routinely wish to obtain a list of employees whose birthdays fall within a specified range. The SQL looks like this:
SELECT LastName, FirstName, Birthdate
WHERE Birthdate BETWEEN ‘19630101’ AND ‘19630401’
Here is the actual query plan:
Not surprisingly, we see a clustered index scan. Because there is no index on the Birthdate column, SQL Server has no alternative except to scan all the table rows looking for birthdates that match the criterion. In this particular database, scanning the employee’s table requires 298 logical reads.
Note that there is an SQL Server “missing index” warning here. We will discuss missing indexes and these warnings in the near future.
Our first response might be to create an index on the Birthdate column.
CREATE INDEX IX_Birthdate ON Employees(Birthdate)
We see our first response was reasonable and that SQL Server has indeed used the new index.
What we are seeing here is, of course, a classic query mechanism; use an index seek to find the keys for the desired rows, and then use a key lookup to obtain the row data for those keys. In this example, the classic index seek reduces the number of page reads to 119, already not too shabby.
But we can do better.
The data pages SQL Server must read for this query include many columns that the SELECT statement doesn’t require, things like Address, HomePhone, and PostalCode. We can create another index, this time with a twist.
CREATE INDEX IX_BirthdateNames ON Employees(Birthdate, LastName, FirstName)
Note that the column list contains all the columns appearing in the original SELECT statement. It is critical that the column or columns appearing in the WHERE clause be listed first in the column list for index creation. The remaining columns are just there for the data and do not play a role in the seek process. When we rerun the SELECT statement, we see that SQL Server does not do any table access at all.
Since all the columns for the query are present in the index, SQL Server need only do a seek on the index to satisfy the query requirements. The conventional index reduced the page reads from 298 to 119. This new index, a classic-style covering index, reduced the total number of page reads to 2.
In the classic covering index illustrated here, the intermediate nodes contain data values for Birthdate, Lastname, and Firstname, even though only Birthdate serves any purpose in the intermediate index nodes. We only need Lastname and Firstname for output, and for this purpose, it is sufficient that these values be in the leaf nodes of the index. In 2005, Microsoft added the INCLUDE statement to the create index options. In this example, using the INCLUDE statement would have put the Lastname and Firstname values in the leaf nodes where they are needed, and omitted them from the intermediate index nodes where they are only taking up space. We will discuss INCLUDE in the next blog, but we don’t worry about it here, because, well, it’s hard to improve a query that reads only 2 pages.
For many years, database administrators have used covering indexes to achieve what are, in many cases, stunning improvements in performance. Covering indexes are, however, a focused enhance technique, not a general one. That is to say, covering indexes are generally created specifically to improve the performance of one, or perhaps a small set, of queries. As is the case with virtually all indexes, the price for this performance comes in the form of slower writes.