Using Covering Indexes to Avoid Hashes and Sorts

In the previous installment, we saw that covering indexes can reduce the number of page reads required to satisfy specific queries, thereby improving performance for those queries. We’ll turn our attention now to some more subtle applications of covering indexes. For this we will use the ContosoRetailDW sample database, which is considerably larger than the Adventureworks sample and which provides opportunities for some fairly realistic analytical queries.

We’ll start with a straightforward look at total sales, grouped by date.

SELECT DateKey, SUM(SalesAmount) AS Total

FROM FactSales

GROUP BY DateKey

Running this query reveals the following query characteristics:

SQL Server Execution Times:

CPU time = 1469 ms, elapsed time = 547 ms.

Logical Read of FactSales table: 19212

We know from our previous experience that, if we are will to spend the extra hard drive space, we can improve this query with a covering index.

CREATE INDEX IX_DateKey_SalesAmount_ProductKey ON FactSales(DateKey) INCLUDE (SalesAmount, Productkey)

Note that the covering index includes the ProductKey column, which is not necessary for this query, but which will help us illustrate a point very shortly.

SQL Server Execution Times:

CPU time = 593 ms, elapsed time = 322 ms.

Logical Read of FactSales table: 12829

As expected, the covering index provides for fewer page reads and faster execution time. For this query, SQL Server chooses a parallel plan:

covering indexes

A parallel plan is not at all surprising; it’s easy to tally the totals of different groups on different threads at the same time and then reassemble the results into a single rowset.

Now let’s run a different query.

SELECT ProductKey, SUM(SalesAmount) AS Total

FROM FactSales

GROUP BY ProductKey

If we look at the query plan, we see that, because we included ProductKey, this query has made use of the covering index created for the previous query.

covering indexes

SQL Server Execution Times:

CPU time = 1140 ms, elapsed time = 582 ms.

The query optimizer knew that it could take advantage of the covering index to achieve fewer page reads, which, of course, is the primary goal of the covering index. However, we also note that in this case the query plan includes a hash match operator. The cost of this operator is a bit disturbing, 42% of the total estimated cost.

Using Covering Indexes to Control Sort Order

The reason the hash match operator becomes clear if we consider what SQL Server must do when processing this query, particularly if a parallel plan is to be used. The rows read from the clustered index must be split across several threads (in this case three). Because we are summing by groups, for any individual ProductID all the rows for that productID should be processed on the same thread, and the query engine divvies them up using a hashing algorithm.

We can do better.

If SQL Server can read rows sorted by the column used to partition the rows across threads (i.e. the GROUP BY column), then no hashing should be required. We’ll create a second covering index.

CREATE INDEX IX_ProductKey_SalesAmount_DateKey ON FactSales(Productkey) INCLUDE (SalesAmount, DateKey)

DateKey is not necessary for this query, just as ProductKey was not necessary for the previous one. This was done solely for experimental purposes. By keeping the sizes of the two different covering indexes very similar, we can clearly observe the effects of the index ordering. Note that, in contrast with the first index, this one sorts in order of ProductKey. SalesAmount and DateKey just come along for the ride.

We see that the hash match operator has disappeared from the query plan, and there is a modest improvement in performance.

covering indexes

SQL Server Execution Times:

CPU time = 1063 ms, elapsed time = 476 ms.

In fact, except for the extra column put in for testing purposes, the query is now almost fully optimized, since 91% of the query cost is associated with the simple reading of the row data, and our query is not reading any rows that do not contribute directly to the result.

Conclusion

By far, the most important quality of the covering index is its ability to dramatically reduce the number of page reads required by a particular query. However, if we remain alert to the presence of potentially costly operators, we can identify opportunities where a covering index provides greater benefit by reducing or eliminating the need for sorting and hashing operations.

Type to search blog.learningtree.com

Do you mean "" ?

Sorry, no results were found for your query.

Please check your spelling and try your search again.