In my last post we looked at what are clustered columnstore indexes, their benefits and how they have changed in SQL Server 2014.
Now, let’s learn how to create one. Since we would like to make some performance measurements on a clustered columnstore index, we will use a sample database of sufficient size to pose some performance challenges, ContosoRetailDW. This sample can be downloaded as an SQL Server backup file from https://www.microsoft.com/en-us/download/details.aspx?id=18279.
Before creating the clustered columnstore index, we might wish to run a few queries for comparison purposes, recording both IO and calculated query cost. The script file for the queries used can be downloaded here.
You can create a clustered columnstore index using the Management Studio GUI if you wish. If you right click on the indexes folder for the FactSales table of Contoso, you will be informed that the existing keys and indexes prevent you from doing so. After deleting the offending objects, we can return to the dialog, which will generate the following SQL script.
CREATE CLUSTERED COLUMNSTORE INDEX [ClusteredColumnStoreIndex-20150107-105938] ON [dbo].[FactSales] WITH (DROP_EXISTING = OFF)
Note that there is no mention of columns, since they are all included in a clustered index, and no mention of a clustering key, since that is not supported.
After creating the clustered columnstore index, we can compare the performance of some queries.
First we’ll look at a classic “needle-in-the-haystack” seek:
SELECT * FROM FactSales WHERE SalesKey = 42
The cost of this query was 0.0032831 with the original clustered index seek before dropping the indexes and creating the clustered columnstore index. Afterwards, the cost was 1.74666. Clearly much worse, but remember it’s much worse on a low-cost query of a type not often run on data warehouse data.
Next we will take a look at a “typical” data warehouse query, that is if any query can be said to be typical.
SELECT YEAR(DateKey), SUM(SalesAMount)
GROUP BY YEAR(DateKey)
This query is “typical” in the sense that in a classic table scenario, all rows would be scanned and most of the work would be done grouping the data by year. For this query, the cost was 21.6781 with the classic clustered index and dropped to 2.12821 with the clustered columnstore index. It should be no surprise that the clustered columnstore index performs better here, since this is the sort of query for which the columnstore indexes were created to begin with.
Lastly, let’s look at a query that is kind-of “in the middle”.
SELECT SUM(SalesAMount) FROM FactSales
WHERE DateKey >= ’12/1/2007′ AND DateKey<’12/2/2007′
This query is summing over rows, but only one column. However, the query is not summing over all rows, but rather a relatively small subset. This is the sort of query that has benefited from conventional B-tree index lookups in a classic database design. The classic cost for this query was 17.3621 while the cost with the clustered columnstore index was 0.726925. We might be forgiven for showing a little surprise here. Despite the narrow date range in the WHERE clause, the clustered columnstore index still substantially outperformed the classic database structure.
Although our small set of test queries cannot be regarded as a representative sample, it appears that the clustered columnstore index incurs a modest cost on queries not often run on a data warehouse and provides substantial performance on query types run frequently. Even queries that benefit from classic lookups seem to fare well under the new columnstore index regime.
For more, have a look at Learning Tree’s new 3-day course – SQL Server 2014 for Performance Enhancements.