SQL Server 2016: Parallel Processing for Microsoft R Server Part I


Part I: Setting Up the System

In several previous blogs, we have talked about installing Microsoft R Server as a part of a SQL Server 2016 installation, and we have called attention to the fact that the SQL Server 2016 installation disk provides for a standalone installation of R Server. What we have not discussed is that this standalone installation provides you with all the tools (well, almost all the tools) to get started with parallel programming in R.

To be sure, if you are serious about parallel programming in a Windows environment you should build a Microsoft HPC cluster. However, if you just wish to explore parallel programming with R, you can get started immediately using MPI, the Message Passing Interface.

If you have not already installed R Server, you can install it from the same setup program that installs SQL Server itself. On the installation tab, near the bottom, there is a selection for a standalone installation of R Server.

Installation Dialog

The installation process for R Server installs Microsoft’s implementation of MPI. It also installs a launch service for MPI processes. By default, this service is set for manual start. To work with MPI, this service must be running on all participating machines. You might wish to simply set the service start mode to ‘automatic’.

MPI Launch Service

Once the installation is complete, you will need to go to the folder where R lives. Copy the full path and use it to update the Windows ‘Path’ environment variable. As long as you are there, you may as well create a desktop shortcut for yourself. Whether you create a shortcut or not, start the R GUI with administrative privileges so you can install the Rmpi package for everyone, not just you.

R Server Installation Directory

Once the R GUI starts, we will want to install Rmpi, the R package that provides an interface between R and the MPI libraries. You must choose a mirror site from which to download if you have not already done so, and then choose Packages | Install Package(s)… and select Rmpi from the list.
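If you prefer the R console to the menus, the same installation can be sketched as below. The helper names is_installed and install_if_missing are my own, not part of R or Rmpi; install.packages( ) itself still needs network access and a configured mirror, and the session should still be elevated.

```r
# Returns TRUE when a package is present in any library on .libPaths().
# system.file() returns "" for a package that is not installed.
is_installed <- function(pkg) {
  nzchar(system.file(package = pkg))
}

# Hypothetical helper: install a package only when it is missing.
install_if_missing <- function(pkg, repos = getOption("repos")) {
  if (is_installed(pkg)) return(invisible(FALSE))
  install.packages(pkg, repos = repos)
  invisible(TRUE)
}

# Run this from an elevated R session on each machine in the cluster:
# install_if_missing("Rmpi")
```

The check is handy when you repeat the setup on several machines and cannot remember which ones you have already done.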

You can test MPI on multiple processes running on a single machine, but the real benefit lies in distributing processing over multiple machines. The process we have just described will have to be repeated for each machine that will participate in your R cluster.

Configuring Rmpi

You will need to choose a working directory for your R MPI studies; you must use the same folder on all machines participating in the cluster. I have used my folder in the Windows “Users” folder, but you are free to choose wherever you like. While we won’t be making use of it here, in general it is convenient to choose a folder that can be shared among machines as a UNC folder. MPI does not make use of this, but it is convenient if you wish to copy R scripts to all the machines participating in the cluster, as well as copy input and output data files.

Once you have chosen your folder, you must go to the Rmpi folder in the R library folder. There you will find a file named Rprofile. This file contains information that will be read by the Rmpi library at runtime to provide necessary configuration information. Copy this file to your chosen working directory. After you have copied this file, you must rename it to “.Rprofile”. The Windows File Explorer does not seem to want you to create a filename with a leading dot, so you will have to rename the file at the command line with: rename Rprofile .Rprofile
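The copy and rename can also be done in one step from R itself, which sidesteps the File Explorer problem. This is only a sketch: install_rmpi_profile is a name I made up, and it assumes the Rprofile template sits at the top level of the installed Rmpi package, as described above.

```r
# Copy Rmpi's Rprofile template into `workdir` under the name ".Rprofile".
# `src` defaults to the template inside the installed Rmpi package (assumed location).
install_rmpi_profile <- function(workdir,
                                 src = system.file("Rprofile", package = "Rmpi")) {
  if (!nzchar(src) || !file.exists(src)) {
    stop("Rprofile template not found; is Rmpi installed?")
  }
  dest <- file.path(workdir, ".Rprofile")
  # overwrite = FALSE protects an .Rprofile you may already have in that folder
  ok <- file.copy(src, dest, overwrite = FALSE)
  if (!ok) stop("could not copy to ", dest, " (does it already exist?)")
  invisible(dest)
}

# Example, using whatever folder you chose as the working directory:
# install_rmpi_profile(getwd())
```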

Since we will want to be launching R while in this directory, you should add “C:\Program Files\Microsoft SQL Server\130\R_SERVER\bin\x64” to your Path environment variable.
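A quick way to confirm the Path change took effect is to ask R where it finds the executables. The sketch below uses Sys.which( ) from base R; the helper name missing_tools is hypothetical, and the default tool names simply mirror the two programs used in this post.

```r
# Report which of the named executables cannot be found on the search path.
# Sys.which() returns "" (or NA on some systems) for a tool it cannot locate.
missing_tools <- function(tools = c("Rterm", "mpiexec")) {
  found <- Sys.which(tools)
  names(found)[is.na(found) | !nzchar(found)]
}

# An empty result means both programs are reachable from the command prompt:
# missing_tools()
```

Remember that a command prompt opened before you edited the Path will not see the change, so open a new one before testing.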

Testing the Installation

When I have finished an installation, I don’t want to read more, I want to run something. Let’s start up four MPI R processes to confirm our installation was successful. Open a command prompt in the same folder you chose as your working directory, in other words, the folder containing the file “.Rprofile”. If you have read other blogs on the web about R and MPI, you will likely have seen references to loading the Rmpi library in R and then calling mpi.spawn.Rslaves( ). Forget it. This is a Linux thing and will not work in Windows. Because of some differences in how threads and processes are spawned, in Windows we do not start MPI from R, we start R from MPI. At the command prompt, type

mpiexec -n 4 "Rterm.exe" --no-save -q

This example creates four R terminal processes on the local machine. It takes a while to start, since you are waiting for much of R to be loaded. You should then see something like this:

mpiexec command line

MPI has created four processes. We are working at an R prompt in the master, and in this case there are three slaves.

We can send commands to the slaves like so:

> result <- mpi.remote.exec(sample(1:100,1))
> result
  X1 X2 X3
1 63 27 43
>

In this example, we obtained a result variable that is a data frame containing the results of the sample( ) function from each of the three slaves.
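Once the results are back on the master, they are just an ordinary data frame, so base R is all you need to work with them. The sketch below uses a hand-built stand-in for the data frame shown above, so it runs without MPI; the variable names values and largest are mine.

```r
# Stand-in for the data frame returned by mpi.remote.exec() in the session above
result <- data.frame(X1 = 63, X2 = 27, X3 = 43)

# Flatten the one-row data frame into a named numeric vector, one entry per slave
values <- unlist(result)

# Name of the column (that is, the slave) that drew the largest number
largest <- names(values)[which.max(values)]
```

In a live session your numbers will differ, of course, since each slave draws its own random sample.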

When we are finished, we want to exit gracefully.

mpi.close.Rslaves()

mpi.quit()

Conclusion

With a little attention to detail, it is not hard to work with Microsoft R Server and MPI. In the next installment we will pursue the next logical step and create our slave processes on remote servers.
