In the previous installment, we looked at a very simple logical subquery that SQL Server executed as a physical join. We shall now take a look at a somewhat more complicated query and see what SQL Server does with it. In this query, we would like to get a listing of all Northwind products that have a list price greater than the average list price for their category. Since we are not dealing with a single overall average, this will have to be a correlated subquery.
FROM Products m
WHERE m.UnitPrice > (SELECT AVG(s.UnitPrice) FROM Products s WHERE s.CategoryID=m.CategoryID)
It is not surprising that the query execution plan is considerably more complex than in the previous example.
We see an operator we have not yet encountered, the Table Spool. A table spool represents an internal data structure that can be accessed repeatedly to supply rows to the query engine. In this example, the operator feeds the engine lists of products and unit prices, first for one CategoryID value, then for the next CategoryID value, and so forth. As we have said, SQL Server avoids a physical subquery whenever possible, and in this case it goes through the products table, calculates the average for each CategoryID, and then performs a join that includes the criteria specifying that the individual product unit price must be greater than the average price for its category.
If we position the mouse pointer over the first table spool icon, we can obtain more information.
We see that the table spool has executed nine times. In addition, we observe that there are nine rebinds. This refers to the fact that the spool is parameterized and during the course of the query execution, the parameter value has been reset, or, if you prefer, a new parameter has been “bound” to the spool. Of course, the value is the CategoryID. The Microsoft documentation says that in a case like this the number of rebinds should equal the number of different values, but in my experience it seems to be the number of values+1. There are eight different categories in the Northwind Categories table.
Spools can be eager or lazy. These concepts not only come up in SQL Server, but they can be found in most modern programming environments. An “eager” operator does its work the moment is defined. In contrast, a “lazy” operator postpones its work until later. In this example, that means that even though the table spool operator is defined first at the right-hand side of the query plan, the actual averages will not be calculated until further downstream when another operator actually requires the results.
SQL Server will avoid a physical subquery whenever possible. In this example, if SQL Server interpreted the SELECT statement literally, it would have calculated 77 averages, one for each product. However, SQL Server chose to set up a table spool so it need only calculate the average 8 times, one for each category, and then use these eight values in the filtering of product rows from the table.
In the next installment, we will see the unpleasant performance we get when SQL Server cannot find an alternative, and must execute an actual physical subquery.