SQL Server 2016 introduces updatable nonclustered columnstore indexes. The complexity of that phrase should not mask the simplicity and importance of its application. Updatable nonclustered columnstore indexes make possible what Microsoft calls “Real-Time Operational Analytics.” It would be hard to overstate the importance of making real-time transactional data available immediately for data mining and advanced analytics.
Databases are generally designed for transaction processing, that is, managing the data for the minute-by-minute running of the business, or they are designed for data analysis. Many of the architectural features ideal for analysis are simply not suitable for transaction processing. Indeed, the acronym OLAP, for Online Analytical Processing, was coined in the 1990s to emphasize the distinction from OLTP, a much older acronym for Online Transaction Processing. In many organizations, complex ETL jobs are required to move data from transactional sources to the data warehouse. Of course, this means that there can be a long time interval between the time a transaction takes place and the time that data appears in the data warehouse for study and analysis. How can the updatable nonclustered index mitigate the need for this process in many applications?
The nonclustered columnstore index was introduced as a purely analytic tool in SQL Server 2012. The combination of sophisticated compression algorithms and storing the data in a column-oriented format as opposed to the traditional row-oriented format, provided impressive performance improvement not only for analytic queries, but queries in general. In 2012, however, the nonclustered columnstore index could not be updated at all; it could only be dropped and recreated. This meant that the benefits of the columnstore index could not be used outside of the data warehouse. By removing this severe restriction in SQL Server 2016, the transactional database and the analytical database have moved closer together.
Of course, a columnstore index organizes data pages by column, so that any individual data page contains only data from a single column. This benefits aggregate queries greatly, as SQL Server can cruise through the column data it is aggregating without reading data from unneeded columns as would be done for a query reading conventional data pages. Clearly, this benefits many queries of the types commonly used by analysts. However, the benefit to the columnstore index does not end there. The data are actually stored in a highly compressed format, using the algorithms developed for the xVelocity database engine used in tabular mode SSAS. The remarkable performance gains provided by this compression provide benefit for almost all queries executed against a table with a columnstore index.
With the advent of SQL Server 2016, it becomes possible to enjoy the high read performance of columnstore indexes on a table required for transaction processing. Since you are still allowed only one nonclustered columnstore index on a table, it is wise to include virtually all columns that might ever be required by an analytical query. The table with the nonclustered columnstore index is the same as always, with whatever clustered index and constraints are appropriate. The table is updated as always, but any analytical queries being processed using the columnstore index immediately provide results including the new data. This raises a very important question; How does the presence of a nonclustered columnstore index impact the write performance for transactional table?
To get a very rough idea about the potential impact of a nonclustered columnstore index on write performance, I created a T-SQL batch that wrote 100 rows into a 12,000 row table. The new rows, were unrelated to order of the rows of the table itself (when there was a clustered index). I then measured execution time for the inserts when the table was a heap, when the table had a clustered index, when a conventional nonclustered index was added, and after the creation of a nonclustered columnstore index. The experiment was repeated to obtain a typical range of values.
Range of execution times of 100 insertions
Clustered index on primary key
Additional conventional nonclustered index
With nonclustered columnstore index added to clustered and conventional nonclustered indexes
As would be expected, the addition of indexes slows down write performance, but the slowdown observed with the nonclustered columnstore index was unremarkable. There was no attempt here to establish any sort of statistically significant averages, since one set of updates on one table is unlikely to be representative at all. However, it is fair to say that there is no evidence that a nonclustered columnstore index is in any way worse than a conventional index.
Microsoft suggests that a slightly more complex table architecture might prove valuable. In SQL Server, it has become possible to add a filter to the definition of a nonclustered columnstore index. It is therefore possible to create a nonclustered columnstore index that covers all but, say, the most recent month. Analytical queries could achieve a performance benefit by utilizing the columnstore index for “most” of the rows; the most recent month’s day would have to be obtained from the table itself or from a conventional covering index. The benefit of this technique comes when new rows are added or rows from the most recent month are updated. In this, the most common case, the columnstore index would not have to be updated at all.
The updatable nonclustered columnstore index introduced in SQL Server 2016 provides new options for database architecture that bring the classical data warehouse and the transaction processing system closer together to provide real-time operational analytics.