The introduction of Microsoft R Server with SQL Server 2016 was tremendously exciting for data scientists and analysts working with “Big Data”. When SQL Server 2017 was released, many were shocked to see that R Server was gone. They were relieved, however, to discover that R Server had not disappeared; it had become “Machine Learning Server” to emphasize that the server now supports Python and is no longer limited to R.
Both R and Python were originally conceived in an era when nobody had multicore CPUs and 1 gigabyte of RAM seemed unimaginably large. Today, however, both hardware and the demands of data analysis have outstripped these core languages. Machine Learning Server, or just “ML Server”, provides a set of tools to help R and Python address the expanding needs of modern data science.
ML Server is, well, a server. Analysts sitting at their workstations are not limited by the speed and available memory of their local computers; they simply start a session on the server. Furthermore, when Microsoft acquired Revolution Analytics, it also acquired a set of tools that facilitate the analysis of data volumes too large to fit into memory all at once.
Both R and Python are fundamentally single-threaded languages, meaning their basic processing engines cannot take advantage of today’s multicore processors. Newer software packages from third-party vendors and open-source developers retrofit some multithreading capability onto R and Python. ML Server makes multithreading available in a way that is largely transparent to analysts. All work done by ML Server executes in some compute context, and parallel-processing-enabled compute contexts already know how to divvy up tasks across multiple cores. Indeed, tasks can be divided among multiple ML Servers deployed on the local area network. These machines can communicate using HTTP or the faster but more technically complex MPI (Message Passing Interface).
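To make the idea of divvying up a task across cores concrete, here is a minimal sketch in plain Python. It does not use ML Server or any of its packages; it relies only on the standard library's concurrent.futures module, and the function names (split, partial_sum_of_squares, sum_of_squares) are hypothetical, chosen purely for illustration. The pattern it shows, splitting the data, letting each worker process its own slice, and then combining the partial results, is the general approach that a parallel compute context automates on the analyst's behalf.

```python
from concurrent.futures import ProcessPoolExecutor


def partial_sum_of_squares(values):
    """Work done by one worker on its own slice of the data."""
    return sum(v * v for v in values)


def split(values, n_parts):
    """Divide a list into n_parts contiguous slices of near-equal size."""
    size, extra = divmod(len(values), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` slices each take one additional element.
        end = start + size + (1 if i < extra else 0)
        parts.append(values[start:end])
        start = end
    return parts


def sum_of_squares(values, n_workers=4, executor_cls=ProcessPoolExecutor):
    """Split the data, process each slice in its own worker, combine the results."""
    with executor_cls(max_workers=n_workers) as pool:
        partials = pool.map(partial_sum_of_squares, split(values, n_workers))
        return sum(partials)


if __name__ == "__main__":
    # Illustrative data only; each worker process handles one quarter of it.
    data = list(range(1_000_000))
    print(sum_of_squares(data))
```

The executor class is a parameter so that the same split-and-combine logic can run on a different back end, which loosely mirrors how switching compute contexts changes where the work runs without changing the analysis code.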
ML Server also provides some substantial “turn-key” data science solutions that allow you to perform tasks such as sentiment analysis or neural network classification by feeding your data to Microsoft’s packages, without having to write any program code yourself.
Deploying Machine Learning Servers is surprisingly easy. Microsoft provides a PowerShell application that sets up the infrastructure enabling an ML Server to communicate with the workstations of analysts and other ML Servers on the LAN. This PowerShell script, runAdminUtils.ps1, appears in the Windows menu system as the shortcut “Machine-Learning-Admin-Util”. This application can also be used to monitor and test ML Servers. In my experience, it is necessary to monitor server health and occasionally restart a moribund ML Server.
The RxInSqlServer compute context, more often referred to by its alias “sqlserver”, is what caught the eye of analysts when R Server was first released in 2016. While this context enables SQL Server to execute R and Python scripts, and enables R and Python code to be run from within T-SQL stored procedures, it should be emphasized that this is most definitely not how one would normally approach the analysis of SQL Server data. Most analysis is still done in what might now be called the “classic” way, which is to import data from SQL Server and analyze it elsewhere.
The sqlserver compute context is best suited for tasks that need to be done locally, that is to say on the same physical machine running SQL Server. There are several situations in which this might arise. A fraud-detection algorithm might need to be executed as new rows are added. Similarly, a retail recommender algorithm might be applied to new purchase items being recorded in a database. Another potential use for the sqlserver context might be the construction of XDF files.
XDF files were introduced to provide a means of operating on “chunks” of data from a file piece-by-piece rather than having to read the entire file all at once. For particularly large amounts of data, it might be desirable to copy data from SQL Server into an XDF file locally, rather than consume network bandwidth by attempting to create the file on a different computer. The creation and use of XDF files is one of the topics covered in depth in Learning Tree’s course 8489, Analyzing Big Data with Microsoft R.
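The piece-by-piece idea can be illustrated without XDF at all. The sketch below is plain Python using only the standard library, with a small in-memory CSV standing in for a file too large to load; the names iter_chunks and chunked_mean and the “amount” column are hypothetical. It is not the XDF format or ML Server's API, only an illustration of the principle: hold one chunk in memory at a time and carry running totals forward.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows from an iterator."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_mean(file_obj, column, chunk_size=1000):
    """Compute the mean of one numeric column without loading every row at once."""
    reader = csv.DictReader(file_obj)
    total = 0.0
    count = 0
    for chunk in iter_chunks(reader, chunk_size):
        # Only this chunk is held in memory; the running totals carry state forward.
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no rows to summarize")
    return total / count


# Hypothetical sample data standing in for a much larger file.
sample = io.StringIO("amount\n10\n20\n30\n40\n")
print(chunked_mean(sample, "amount", chunk_size=3))  # prints 25.0
```

A mean works well here because it can be rebuilt from per-chunk sums and counts; statistics that need all the data at once, such as an exact median, require a different strategy.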
Microsoft Machine Learning Server is a substantial component of Microsoft’s commitment to enterprise-grade analytics. Analysts’ workstation tools, like RStudio, Excel, Visual Studio, and Power BI Desktop, can utilize ML Server to cope with the ever-increasing volume and velocity demands of big data.
AUTHOR: Dan Buskirk