Armelle G, a student in Learning Tree’s SQL Server Transact-SQL Programming course asks “Is CROSS APPLY only used with T-SQL Functions?”
Great question Armelle! Many SQL Server folks would answer “yes” but the fact is that CROSS APPLY can also be used with subqueries. CROSS APPLY allows you to create the equivalent of correlated derived tables in T-SQL.
Let’s start by looking at a very common query pattern using the classic Northwind sample database. We would like to see the top 2 most expensive products by category. It would be trivial to obtain the top 2 products overall, but the TOP operator will not help us when we want to group by category, at least not directly. TOP is a row limiter, and if we ask for TOP 2 we will get the first two rows yielded by the query, regardless of what the rest of the query is trying to do. However, we can break out the ordering of products by price into a separate derived table.
In the following example, note that the WHERE clause in the second query correlates with the current CategoryID value from the first SELECT statement of the query. We would simply not be allowed to do this if we were performing a classic JOIN between the first SELECT and the derived table.
Note something else as well: in a classic INNER JOIN, the order of the two tables being joined makes no difference. However, when using CROSS APPLY, the dependent correlated part of the query must come second. This is true whether the second part is a derived table or a table-valued function.
FROM Categories c
(SELECT TOP 2 p.ProductName
FROM Products p
WHERE p.CategoryID = c.CategoryID
ORDER BY p.UnitPrice DESC) AS t2;
Readers will likely already be wondering how this query technique compares with using the windowing function RANK. We could have solved the original query question as follows:
WITH PriceRanks AS (
, RANK() OVER(PARTITION BY p.CategoryID ORDER BY p.UnitPrice DESC) AS PriceRank
FROM Products p
JOIN Categories c
ON c.CategoryID = p.CategoryID)
FROM PriceRanks pr
WHERE pr.PriceRank <3;
If we test the performance and we find a curious result. The RANK query incurs 158 logical reads in contrast with 18 logical reads for the CROSS APPLY query. Normally, this would look like a big win for CROSS APPLY. But in terms of total query cost, the RANK function is about three times better in terms of overall performance, making CROSS APPLY a big loser. The culprit appears to the be large sort time in the CROSS APPLY version. What would happen if we supplied an appropriate index?
Let’s create an index on UnitPrice descending, but ensure that our new index covers our query, more precisely, the index contains all the columns required from the products table, since we still need category name.
CREATE INDEX IX_UnitPrice ON Products(UnitPrice DESC) INCLUDE (CategoryID,ProductName)
Comparing our queries again, we find that the tables have turned. Both queries use the new index, but the RANK query still requires a sort while the CROSS APPLY query is able to take full advantage of the descending order of the new index. The net result is that with the index, the CROSS APPLY query performs four times better than the RANK query, rather than three times worse.
Using CROSS APPLY to implement correlated derived tables should be another tool in your SQL toolbox. And remember, when considering performance, a query technique by itself can never be considered to be better or worse without also considering the entire database environment in which the query runs.
Please do not hesitate to drop me a note if you have any comments or questions. Next time we’ll take a look at using row constructors to break up continuous data into histogram bins for analysis.