SQL Server 2008 R2 and Sql Server 2012 brought with them lots of new features to please data warehouse admins, BI developers, and analysts. Lest the OLTP folks felt left out, Microsoft introduced in 2014 in-memory optimization technology, code-named Hekaton, along with other features designed to please database administrators concerned with transaction processing. It’s the beginning of a new year and the latest set of Community Technology Previews (CTP) of SQL include new toys for admins and new toys for analysts. Some improvements benefit both admins and analysts that may actually please everyone! Let’s take a quick overview of some the newest and most interesting new features for SQL Server 2016.
In-memory optimized tables were introduced in SQL Server 2014, the the first implementation left some DBAs wanting more. SQL Server 2016 is addressing many of these wishes, all of which enhance the scalability and throughput of transactional systems. Data integrity for in-memory tables can now be enforced with constraints and natively compiled AFTER triggers. Clumsy code to enforce uniqueness can be dispensed with in 2016’s in-memory optimized tables; unique indexes are now supported. Administrators with a penchant for hardcore coding will be pleased that natively compiled stored procedures and functions no longer require an explicit EXECUTE AS statement. EXECUTE AS CALLER, which has always been the default for T-SQL, is now the default for SQL Server native code modules as well. Other new features targeted at OLTP systems include nullable index keys, support for binary large objects (LOB), and natively compiled table-valued functions.
Sometimes the practical importance of new features is not immediately obvious from a simple description. Non-clustered columnstore indexes were introduced in SQL Server 2012 and provided stunning performance for analytical queries. But, since they were not updateable, their value was mostly limited to data warehouses. Now, non-clustered columnstore indexes are updateable, which provides the foundation for what Microsoft terms “Operation Analytics”, which is essentially real-time analytics from transactional databases.
Many data warehouses must implement complex ETL to grab updated rows from a transactional database and bring them into the data warehouse. Such tasks are almost always implemented as batch jobs, which puts a limitation on analysts who wish to examine the most recent data. Operational analytics replaces this cumbersome architecture, potentially even eliminating the data warehouse, for some systems. The idea is simple. The transactional data is written to a transactional table as always. But now there is a non-clustered columnstore index on the transactional table. Tools such as the Analysis Services can then rapidly read data from the columnstore index to satisfy real-time analytical queries.
Non-clustered columnstore indexes can even be created on in-memory tables, but only at the time of table creation. They cannot be added later.
FOR JSON AUTO can be used in subqueries, which can then be used in the SELECT column list to create standard rowsets where one of the output columns is JSON.
The rapid explosion of unstructured and semi-structured data had presented both problems and opportunities to database administrators. Clusters of commodity computers running Hadoop provide immense and scalable storage, but integration with existing systems and client applications can be challenging. Polybase provides a means for administrators and systems analysts to query Hadoop and cloud-based data (read: Azure) with the sophisticated and well-established toolset of T-SQL. Perhaps more importantly, Polybase achieves this without making any modifications of the Hadoop system or, for that matter, requiring much knowledge of Hadoop. Polybase is a Java application that creates and employees three SQL Server databases to manage the necessary information to connect to Hadoop and query Hadoop’s distributed data store. Polybase can also manage importing and exporting data between SQL Server and Hadoop.
The statistical programming language R has become the de facto standard for the new age of what some folks call “data science”. A snowball effect is happening right now; the more people use R, the more R tools are developed, which makes R even more attractive to other potential new users. Microsoft has shown its respect for this open-source toolset by building into SQL Server the potential for executing R scripts. As a part of these efforts, Microsoft has acquired the firm Revolution Analytics, a software company providing analytical tools for R, as well as its own enhanced R distribution.
Using R from within T-SQL requires installing “Advanced Analytics Extensions” as one of the SQL Server database engine installation options. One must then install the Revolution R distribution. In addition, you must install Revolution R Enterprise on the server. There is also a client version of Revolution R Enterprise for installation on what Microsoft terms the “data science workstation”. SQL Server 2016 is not configured by default to run external scripts, to the new new configuration option “external scripts enabled” must be set to 1. This will require a server restart. At runtime, R scripts are executed by a new extended stored procedure “sp_execute_external_script”. Curiously, the name of the scripting language, in this case “R”, is an input argument, suggesting the possibility that other scripting languages may be supported in the future (Matlab perhaps?). Although right now “R” seems to be the only option, the preliminary documentation says that a script can be executed in any language registered with “xxx.language”. Presumably, this is just a placeholder in the documentation for a future tool. By the way, Googling “xxx.language” is not helpful.
SQL Server 2016 introduces new features of great interest to administrators of OLTP systems and to data scientists. In-memory optimized tables, introduced in 2014, have new and greatly enhanced features and some of the shortcomings have been successfully addressed. Updateable columnstore indexes provide the means for greater integration of OLTP systems and analytic systems. The new features of SQL Server 2016 also illustrate Microsoft’s commitment to open-source analytic tools such as R, as well as, distributed “big data” stores including Hadoop.