The folks at Microsoft have created quite a buzz with the integration of R scripting into SQL Server. In recent years, the open-source statistical program R has established itself as the de-facto statistics and analysis package for big data. But it has always been possible to query SQL Server from R using good ol’ ODBC. Why such a big fuss now? The classic client server model implemented with ODBC means that potentially very large datasets must be assembled, passed across a network, and manipulated in the memory of an analyst’s workstation. The introduction of what Microsoft calls “Advanced Analytic Extensions” into SQL Server 2016 means that datasets generated by T-SQL can be constructed in a server-side sandbox process generated specifically for this purpose. The larger the dataset, the greater the performance gains to be expected.
To achieve this, Microsoft has acquired Revolution Analytics, a software firm specializing in enterprise-scale R development. Revolution software achieves scalability by distributing R calculations across many nodes, superficially similar to Apache Spark, but the Revolution software is not Java-based but rather compiled C and C++ built on MPI, the Message Passing Interface. We will not focus on the distributed aspects of Revolution here, but some appreciation of the Revolution architecture necessary to understand why the SQL Server integration has the form it does.
The machine running SQL Server must have installed a Revolution Enterprise Node; this is the processing unit for distributed systems. An analyst working with SQL Server advanced analytics will also need a node on his or her workstation. Of course, the node will have to be managed, and for this an R installation with an interface will be necessary. In principle, R code could be written and tested anywhere, but an R installation of the server will be necessary to install desired R packages into the Enterprise Node.
By supporting only the custom Revolution Analytics implementation of R, Microsoft ensures a smooth installation process.
If you did not select the Advanced Analytics Extensions feature when you installed your SQL Server CTP you must rerun the SQL Server installation program and add it.
This installs a new service called Launchpad, which is responsible for the creating the sandbox process in which the Enterprise Node will run.
Revolution R Open 3.2.2 is a very standard-looking implementation of R for Windows. Note that the version is 3.2.2, which means that you will received warning messages whenever you install a package compiled with R 3.2.3. Generally these messages can be ignored. Revolution R Open 3.2.2 can be downloaded from https://www.microsoft.com/en-us/download/details.aspx?id=49525.
The installation program for the Revolution R node that will run the R scripts passed by SQL Server can be downloaded from https://www.microsoft.com/en-us/download/details.aspx?id=49505. This installation file is about 81.3 MB and is only a server node. A different installation must be performed on the workstations of analysts who will be using SQL Server. The client installation will not be discussed here – our immediate goal is simply to run some R code on SQL Server.
Running external scripts is now a server configuration setting, and must be enabled before the R analytic extensions can be used.
sp_configure ‘External Scripts Enabled’, 1
In my experience, a restart of SQL Server will be necessary for this setting to take effect.
A little trap: Stopping the SQL Server service will stop the Launchpad services required for R. However, restarting SQL Server will not restart the Launchpad. The Launchpad is not present in the Configuration Manager; you must go to the Windows services applet.
The pieces have now been installed, but they need to be hooked together. This will require, among other things, the installation of extended stored procedures that will link R scripts in T-SQL to the Enterprise Node. A command line program for this purpose is installed in the Revolution R Open installation and can be executed with the following command line. The script will also create a database role in master called db_rrole with permissions for these procedures. Be sure to run this script as administrator!
After completing a software installation, I want to run some code, not read the documentation. Although this has gotten me into trouble in the past, we will now run some R code in a Transact SQL batch. How about some R to output “Hello, World!’? Cliche, I know, but it does what all Hello, World programs should do – convince you that coding is easier than installing the platform.
@language = N’R’,
@script = N’R_data_frame <- InputDataSet’,
@input_data_1 = N’SELECT ”Hello, World!”’,
@output_data_1_name = N’R_data_frame’
WITH RESULT SETS((MyColumn char (13)))