SQL Server 2014 New Features: Cardinality Estimation in the Query Optimizer

Is it Better or Worse?

An optometrist checks vision
Better or worse?

What is Cardinality Estimation

A fancy-sounding phrase, “cardinality estimation” basically means “guessing how many”. The ability of the query optimizer to select a superior query plan from among many possibilities requires the estimation of how many rows will be extracted from the participating tables and how these rows will be joined. (We always talk about the “one-to-many” relation, but the optimzer needs to ask “one to how many?”) Of course, this is why SQL Server needs to maintain statistics about the row data contained in its tables. Traditionally, the selectivity of a column value like ShipCountry was considered independently of some other column like ProductName. However, our customers in Brazil might be more likely to buy Guaraná Fantástica than our customers in Norway. The assumption of independence might be reasonable for purposes of practicality, but its likely to be, strictly speaking, false.

One of the new features of SQL Server 2014 is the consideration of correlations between column values in the estimation of cardinality for a query plan. We hope, of course, that this will tend to provide better plans and help our queries to run faster, However, the new estimation methods might in some cases yield inferior plans for queries when compared with earlier editions of SQL Server. In general, making a change that makes things worse is called regression.

Some SQL Query Optimization Experiments

To examine query regression in SQL Server 2014 we will need a few simple tools. If you are not a right-clicker by nature, you can check the current compatibility level of a database with the following query:

SELECT compatibility_level FROM sys.databases WHERE name = ‘ContosoRetailDW’

Of course, this value can also be found in the database properties dialog of in the Management Studio.

A Trap (albeit a tiny one)

Remember that if you restore to SQL Server 2014 a database backup made with a previous version of SQL Server the database will have its compatibility level set to the older version. To see the effects of the new engine, the compatiblity level must be set to 120.

For purposes of comparison, we can switch back and forth between compatibility level 120 (i.e. 2014) and an earlier value using the ALTER DATABASE statements.

ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 100

or

ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 120

The query and accesory queries used in this demonstration can be downloaded here. I’ll not clutter the space here with the actual query text, but I will say that it is an analytic-style query executed on the ContososRetailDW sample database. Of course, one can never draw general conclusions by looking at results for a single query.

Let’s run the query using compatibility level 100. As usual, we can right-click on the SELECT node of the query plan to see the subtree cost.

Query Cost100

Now we can try it again after setting the database to the newer compatibility level of 120.

Query Cost 120

In this case, it is also valuable to hit F4 and bring up the properties window for the SELECT node. This will allow us to confirm the version of the cardinality estimator used by the query engine.

QueryPlanNodeProperties

In this example, the numbers are not promising. The estimated query cost is 62.8562 for the new cardinality estimator and 47.9955 for the older version.

Another Trap (This time a really big one!)

We’ve become used to thinking that a smaller cost is better. However, you cannot directly compare costs across different engine versions! Let’s remember the purpose of cost. The optimizer compares costs between potential plans and goes with the cheapest. But the absolute value of cost has no meaning. Sadly, we cannot look into the operator and observe this process. It’s entirely possible, though, that the 2014 engine looked at the plan used by the earlier engine and assigned it a cost greater than 62.8562.
Remember that cost is useful only insofar as it correlates with actual execution performance. The decision made by the 2014 engine may yield a higher cost, but that can be an improvement if the relation between the costs of various plans is more closely related to actual query execution performance.

STATISTICS IO and STATISTICS TIME

Let’s examine the query again using our old friends STATISTICS IO and STATISTICS. The total page reads for the 2014 optimization is 19111, slightly less than the total of 19756 for the earlier optimizer. More interesting is the cpu and total time. Here is the time for the earlier query engine:

SQL Server Execution Times:
CPU time = 4549 ms, elapsed time = 1589 ms.

(How can the cpu time be greater than the elapsed time? If you said parallelism you get today’s gold star. The properties of the SELECT node in the query plan confirms a parallelism of 4 for this query.)

How about the 2014 optimizer?

SQL Server Execution Times:
CPU time = 3046 ms, elapsed time = 3119 ms.

Curiously, the 2014 optimizer didn’t bother with parallelism this time. However, the cpu time is significantly less than for the earlier version. The 2014 query executes faster, even though the arbitrary cost value at first glance seems worse.

How to Manage the Query Optimizer

In the previous tests, we changed the database compatibility level to observe changes. We really don’t want an all-or-nothing solution; we would like to be able to control the optimizer for individual queries. SQL Server 2014 provides some new query hints to allow us to do just that.

OPTION (QUERYTRACEON 2312)

Sets the optimizer to 2014 (i.e. compatibility level 120) for the lifetime of the query in in which this hint appears. Conversely, if you have chosen 120 as your compatibility level for the databases.

OPTION (QUERYTRACEON 9481)
Will temporarily revert to the older version of the cardinality estimator.

Conclusion

By using a more sophisticated algorithm for cardinality estimation, the 2014 query engine can often pick a superior execution plan. We must look at actual execution measures to determine if a query has actually performed better with the plan chosen by the new engine. If we discover a query has, in fact, regressed, we can use OPTION (QUERYTRACEON 9481) to temporarily tell the optimizer to use old-style cardinality estimation for generating the query plan.
The ContosoRetailDW database used in this example can be downloaded from Microsoft.

To learn more about SQL Server 2014, have a look at our new, 3-day hands-on course – SQL Server 2014 Performance Enhancements.  Attend in class from one of our many centers or online from home or office using our AnyWare platform.

Type to search blog.learningtree.com

Do you mean "" ?

Sorry, no results were found for your query.

Please check your spelling and try your search again.