Columnstore indexes were introduced into SQL Server in 2012, but in a sense the columnstore index is older than that, since the fundamental structure was first used for the tabular storage model of Power Pivot (and then used in the Analysis Services tabular model). In SQL Server 2012, however, columnstore indexes were always nonclustered, in the sense that the columnstore index always stored a copy of data from the physical table, whether the physical table was a heap or a conventional clustered index.
This has changed in SQL Server 2014. Columnstore indexes can now be clustered, but only in the sense that the columnstore index itself constitutes the physical table. The actual in-memory columnstore technology is the same for both clustered and nonclustered columnstore indexes. Unlike conventional clustered indexes, columnstore indexes are not B-trees and are not useful for retrieving small numbers of rows. Microsoft has also provide a means for updating columnstore indexes, which did not exist in SQL Server 2012. Despite this new ability, columnstore indexes remain suitable only for databases that are updated infrequently, perhaps as nightly batch jobs. Data warehouses, of course, are the primary beneficiaries of the columnstore index.
Columnstore indexes, both clustered and nonclustered, gain their primary performance advantage from their organization, which permits SQL Server to satisfy the requirements of a query without reading any unnecessary columns data. When data pages are organized as row stores, as SQL Server reads data pages it is reading entire rows, which very likely includes column data not required for a particular query. However, in columnstore indexes, each data page contains only entries for one column, and so each page read contains no extraneous column data. Of course, the remarkable compression achieve also contributes to the improved performance.
Columnstore indexes are, therefore, not indexes in the conventional sense, that is to say they are of no value in the conventional “needle-in-a-haystack” search where we are seeking one row or a small number of rows among the many many rows of a large table. Columnstore indexes provide the greatest benefit to the sorts of queries that are run quite often on data warehouse tables. Large number of rows are read, but as they are read they are aggregated into sums and counts.
An interesting consequence of the structure of columnstore indexes is that they make many queries easy for SQL Server to parallelize.
While the most important change that comes with the introduction of clustered columnstore indexes is the support for updates, the restrictions are notably severe as well. Clustered columnstore indexes cannot be used to support uniqueness on a key, and, furthermore. If you create a clustered columnstore index you cannot have any other indexes. Does this mean that we must give up enforcing uniqueness of keys to use clustered columnstore indexes? Well, there is a workaround, and we will discuss that in part 3.
Clustered columnstore indexes manage inserts, updates, and deletes by storing them in an intermediate data structure called the deltastore. The deltastore accumulates changes until a large enough number of changed rows have accumulate to support efficient copying of the new data into the clustered columnstore index. Rebuilding the clustered columnstore index will have the net effect of copying the deltastore rows into the newly rebuilt columnstore pages.
There are several catalog views that will provide useful information about the columnstore indexes.
Of these, sys.column_store_row_groups is probably the most useful, since it permits the administrator to determine if the size of the deltastore might warrant a REBUILD.
The introduction of clustered columnstore indexes in SQL Server 2014 addresses the problem of updating active tables with a columnstore index, but at a substantial cost. Clustered columnstore indexes cannot be used to enforce uniqueness of keys, and the existence of a clustered columstore index precludes the possibility of using additional indexes.
In Part 2 we will look at the T-SQL code for creating columnstore indexes.
For more, have a look at Learning Tree’s new 3-day course – SQL Server 2014 for Performance Enhancements.