Optimize Your SQL Server Queries with Column Statistics

Have you ever done something that your mother told you to never ever do? Well, we’re going to do that today, except with SQL Server. We’re going to get rid of SQL Server column statistics with the goal of learning what column statistics are and how they help SQL Server arrive at superior query plans.

What Are Column Statistics?

Column statistics are nothing more than information about with frequency with which values appear in a particular column (or columns) in a table. Unlike index statistics, which are created when a column is indexed, column statistics are independent of indexes and, by default, are created automatically any time a column appears in a WHERE clause or is used in a JOIN condition.

Automatically created column statistics are easy to identify since they always begin with the prefix _WA_SYS. You can observe the column statistics in a particular database by querying the sys.sysindexes view.

SELECT OBJECT_NAME(id), name, * FROM sys.sysIndexes WHERE

name LIKE ‘_WA_Sys%’ and id > 100

The “id > 100” in the WHERE clause restricts our results to user tables; we want to ignore system tables in our results.

Here are the results for the “Bigwind” sample database.which Learning Tree created for their performance tuning class; it has the same simple structure as the familiar Northwind sample, but has lots more rows as well as lots more poor design choices. (It’s supposed to be a poor performing database, after all.)

We can drop statistics with the same syntax used to drop indexes.

DROP STATISTICS Customers._WA_Sys_00000009_03317E3D

If we have a list of statistics we would like to drop, we can let SQL write the DROP statements for us.

SELECT ‘DROP STATISTICS [‘ + OBJECT_NAME(id) + ‘].’ + name FROM sys.sysIndexes WHERE

name LIKE ‘_WA_Sys%’ and id > 100

If we drop existing column statistics and turn off the automatic creation of statistics, we can observe what the query optimizer does when it has no statistics to help. Turning off the automatic creation of statistics can be done with the ALTER DATABASE statement:

ALTER DATABASE Bigwind

SET AUTO_CREATE_STATISTICS OFF

Now let’s run a test query:

SELECT c.CustomerID, CompanyName

FROM Customers c

JOIN Orders o

ON c.CustomerID = o.CustomerID

JOIN [Order Details] od

ON o.OrderID = od.OrderID

JOIN Products p

ON od.ProductID = p.ProductID

WHERE c.Region=‘Delaware’

AND

od.unitprice < $6000

We get the following query plan:

query plan created without statistics

We notice at the right-hand side two warning triangles indicating that there are missing statistics. A typical execution time for this query without statistics was

SQL Server Execution Times:

CPU time = 156 ms, elapsed time = 317 ms.

Now we turn the auto-creation of statistics back on with

ALTER DATABASE Bigwind

SET AUTO_CREATE_STATISTICS ON

We can confirm that auto create is in fact on by checking the results of

SELECT Name,database_id, is_auto_create_stats_on FROM sys.databases

is_auto_create_stats_on

Re-running the test query we see an entirely different query plan, and the no statistics warnings have disappeared.

query plan with statistics

The execution time has been substantially reduced:

SQL Server Execution Times:

CPU time = 31 ms, elapsed time = 46 ms.

Guessing how many rows will be obtained from the tables in a query is usually referred to as “cardinality estimation”, but it is still just guessing. Without any help, the guess of the optimizer is just a shot in the dark. However, when column statistics are present, the query optimizer can make a much more educated guess when it is deciding on the best sequence for join execution and the best physical mechanisms for those joins.

Manually Creating Statistics

The auto create feature does a pretty good job left to itself, but it will only create statistics on a single-column at a time. Just as with indexes, it is possible to create statistics on multiple columns, If you think this might be of benefit, you will have to define the statistics manually and create them with the CREATE STATISTICS statement. The greatest difficulty is determining which sets of columns are the best candidates One of the things to look for is sets of columns that often appear together in a WHERE clause, but which are not specific enough for an index to be useful. In other words, a WHERE clause containing these columns as the filter criteria will return enough rows that the SQL Server optimizer will generally use a scan and therefore an index would be without value.

Conclusion

While index statistics often receive greater attention by administrators looking to make sure their database is well optimized, column statistics are critical if the optimizer is to avoid bad query plans.

Type to search blog.learningtree.com

Do you mean "" ?

Sorry, no results were found for your query.

Please check your spelling and try your search again.