Television’s Mythbusters seem to have a lot of fun; it’s interesting that so many myths seem to involve blowing stuff up on camera. SQL myth-busting may not be as visually dramatic, but for people who are interested in query performance, it is much more important.
There are a lot of people who think that it is always better for SQL Server to cache query plans in memory and that it is always better to use an index to seek the rows you want from a table. Today we’ll take a look at the reality of index use in queries, and the implications that has for query plan caching.
Once again we will start with Learning Tree’s Bigwind database, awful by design. We’ll run a very simple query to obtain all the customers from a particular region.
SELECT * FROM Customers
WHERE Region = 'Delaware'
If we examine the procedure cache using sys.syscacheobjects, we see two entries for this query.
The first entry is the ad hoc plan as sent from the client, and the second is an auto-parameterized plan prepared by SQL Server so that the plan can be reused just by plugging in new values for the variable “@1”.
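You can see both entries yourself with a query along these lines; it uses the sys.syscacheobjects compatibility view mentioned above, and the LIKE filter is simply one convenient way to narrow the output to our statement:

SELECT cacheobjtype, objtype, usecounts, sql
FROM sys.syscacheobjects
WHERE sql LIKE '%Customers%Region%'

The ad hoc entry shows the literal 'Delaware' in its sql column, while the auto-parameterized entry shows the placeholder @1 in its place.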
We observe that SQL Server did a clustered index scan to get the results, so perhaps we can speed this query up by creating an index. Since we want to make measured comparisons, we will note that the estimated cost for this plan is 0.706 and the query requires 936 logical page reads. Now we create an index:
CREATE INDEX IX_Region ON Customers(Region)
Running the query again, we see a new query plan with a lower cost of 0.126; the logical reads are down to 130. However, if we examine the procedure cache, there is only the ad hoc plan. The auto-parameterized plan is now gone!
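As an aside, the logical-read figures quoted here are easy to capture yourself: enable I/O statistics for the session before running the query, and the read counts appear on the Messages tab.

SET STATISTICS IO ON
SELECT * FROM Customers
WHERE Region = 'Delaware'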
It is worth noting that in this query plan, 97% of the total cost was spent on key lookups into the clustered index to retrieve the remaining columns for the Delaware rows. Perhaps scanning the clustered index is not so bad after all.
What happened? SQL Server will not auto-parameterize a query plan unless the optimal plan is the same for all possible values of the parameter. Now that there is an index, there are some regions, like Delaware, for which the optimal plan uses the index, and there are other regions, like California, for which using the index would be a poor choice.
Examining the query cache reveals ad hoc plans but, as we would expect, no auto-parameterized plan.
If anyone is unconvinced that SQL Server made the right decision to avoid the index when searching for California rows, we can force it to use the index with a table hint.
SELECT * FROM Customers WITH(INDEX(IX_Region))
WHERE Region = 'California'
The logical reads required for this query have skyrocketed to 4188! Clearly, for California, scanning the clustered index makes more sense than using the index! This raises an interesting and important question: when does an index stop being useful?
To determine whether an index is useful for a particular query, SQL Server must have an advance estimate of how many rows the query will select from the table. This is why the query optimizer must be able to consult statistics on the distribution of data in an index. SQL Server creates and maintains statistics on all indexes. We can view this statistical information using DBCC SHOW_STATISTICS.
In this particular instance, rather than view the DBCC report directly, we can paste the output into Excel, where we can sort the regions by the number of rows in each region.
I added two columns to the Excel table: one to indicate whether SQL Server used the index for that region, and another with a formula showing the fraction of the total table rows that region represents. In this case, we see that Maryland, with 208 rows, had the most rows of any region for which the index was used, and Tennessee, with 247, had the fewest rows of any region for which the index was not used.
For this index, the cutoff between using the index and not using it fell somewhere between 1.6% and 1.9% of the rows in the table (working backward from those percentages, the Customers table holds roughly 13,000 rows). These numbers will vary for other indexes and other tables, since they depend on more than just the number of rows. The depth of the clustered index, for example, depends on factors like the width of the index keys and does not scale linearly as the table grows. Nevertheless, the lesson is clear: once you are requesting more than a relatively small percentage of the rows in a table, an index will probably not be useful.
One of the most important decisions that the SQL Server query optimizer makes on a routine basis is whether or not to use an existing table index when generating a query plan. In many cases, the cost of key lookups far exceeds any potential benefit of the index. It is index statistics that allow the optimizer to calculate with fair precision whether a query will benefit from index use.
Learning Tree offers a course to help you learn more ways to optimize your SQL Server queries.