Subqueries, Real and Apparent

Part III – A Look at a “Physical” Subquery

Perhaps I am being a little loose in my terminology, since there really isn’t a query operator called “subquery”. When I say “physical subquery” I am referring to those correlated subqueries where the subquery portion must be executed again and again, once for every row returned by the outer query in which the subquery resides. This can be identified in a query plan as a table spool with a number of rebinds equal to the number of rows.

We will use for our illustration the calculation of a moving average, because not only is this a routinely performed task, but many moving average examples found on the internet suffer from the same performance problems we will examine here. For our sample data, we will use the daily Dow Jones Industrial Average from 1885 to 2006. A script to build a table containing this data may be found here.

I’m sure you are aware why moving averages are desirable when exploring data. Moving averages smooth out short-term fluctuations, making it easier to see trends of moderate length. Here we can observe this effect in a DJIA plot limited to the years from 2000 to 2006.

Classical SQL for calculating a moving average looks something like this:

SELECT Date, DJIA,

(SELECT AVG(DJIA) FROM DowJones dj2

WHERE dj2.Date BETWEEN DATEADD(dd,-5,dj1.Date) AND DATEADD(dd,+5,dj1.Date) AND dj2.DJIA >0) AS MovingAverage

FROM DowJones dj1

WHERE DJIA > 0

ORDER BY Date

For each row in the outer query, which excludes weekends and holidays when the DJIA was not calculated, the subquery calculates an average over the range starting with five days preceding the current row’s date to five days after.

By way of contrast, we can also calculate a moving average using SQL Server’s windowing functions.

SELECT Date, DJIA,

AVG(DJIA) OVER (ORDER BY Date

ROWS BETWEEN 5 PRECEDING AND 5 FOLLOWING) AS MovingAverage

FROM DowJones

WHERE DJIA > 0

ORDER BY Date

The logic of the two SELECT statements is the same. The main difference, we shall see, is that SQL Server is designed to specifically recognize structures precisely defined in a windowing function, but may fail to recognize a logically equivalent query when written in the classic way without windowing functions.

We notice in the upper plan, that of the classic correlated subquery, there is an “index spool” operator. It is not a table spool because SQL Server has actually copied data into a index structure in tempdb for the purposes of this query. Notice also that this is an “eager” operator. If we look at the details of this operator, we see multiple executions, one, in fact, for every row.

The cost would have us believe that the windowing function is roughly 100 times better. The difference in measured execution time is more modest, but still the windowing function is better than twice as fast. SET STATISTICS IO ON shows us the IO, and we can see from another perspective that SQL Server has written data to tempdb and has had to scan that data 33,526 times during the execution of the query.

(33526 row(s) affected)

Table ‘Worktable’. Scan count 33526, logical reads 158405, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘DowJones’. Scan count 5, logical reads 584, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 343 ms, elapsed time = 740 ms.

(33526 row(s) affected)

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘DowJones’. Scan count 1, logical reads 292, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 141 ms, elapsed time = 337 ms.

Conclusion

Clearly, we could have done a much better job of tuning this particular moving average query. But the primary goal here was to show that subqueries that require repeated row access do pay a performance penalty. Windowing functions will generally perform better than the classic correlated subquery; the same is true of running totals, another common requirement in analytical queries.

Type to search blog.learningtree.com

Do you mean "" ?

Sorry, no results were found for your query.

Please check your spelling and try your search again.