In Part 1 we saw how to configure a server with Microsoft R Server to use the Rmpi library for parallel processing. In Part 2, we will look at how to launch R Server processes on multiple machines, creating a flexible R cluster.
The R Server installation adds the MPI-related services to the "Allowed apps and features" list of Windows Firewall. If you would like to run rterm.exe or rscript.exe (you almost certainly will), you must add these programs to that list yourself.
OK. MPI launch service running? Windows Firewall settings correct? Great. We’re ready to go. Running the following line at the command line should start multiple R processes on a remote host.
mpiexec -hosts 1 192.168.206.134 3 "Rterm.exe" --no-save -q
After the -hosts parameter comes the number of remote hosts, in this case 1. Following that is the name or address of the remote host, and after that the number of processes to launch on that host, in this example 3. Parameters after the executable file name belong to that executable, not to mpiexec.
Here we add on another host:
mpiexec -hosts 2 192.168.206.133 3 192.168.206.134 4 "Rterm.exe" --no-save -q
In this example, we specify two hosts. As before, we create three processes on the first, but now we also create four processes on the second.
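Once the session is up, a quick sanity check from the master's R prompt might look like the sketch below. It assumes the Rmpi .Rprofile has already loaded Rmpi on every process and put all ranks other than 0 into slave mode, so that mpi.remote.exec() can reach them. The range 1:100 and the draw count of 5 are arbitrary illustrative values, not taken from the original session.

```r
# Run at the master's R prompt (rank 0) inside the mpiexec session.
# Assumes the Rmpi .Rprofile has already loaded Rmpi on every rank.

# Total number of R processes in the communicator (master plus slaves).
mpi.comm.size()

# Ask each slave which machine it is running on.
mpi.remote.exec(mpi.get.processor.name())

# Have each slave draw 5 integers from 1..100 (illustrative values only).
mpi.remote.exec(sample(1:100, 5))
```

If the host names reported by the slaves include both machines, the -hosts argument was interpreted as intended.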
In the previous example, we used the sample() function to select integers from a finite range. Now let's use the runif() function to obtain uniformly distributed floating-point numbers.
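A call of roughly this shape would do it. This is a minimal sketch, assuming the session was started with the two-host mpiexec command above and that the slaves are waiting for commands as set up by the Rmpi .Rprofile.

```r
# Run at the master's R prompt (rank 0).
# Each slave evaluates runif(10) locally and sends its result back;
# mpi.remote.exec() collects one result per slave.
mpi.remote.exec(runif(10))
```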
Notice that we have obtained six sets of random numbers, one set from each slave (the seventh process is the master). Each set contains 10 random numbers.
If you will routinely be using the same hosts with the same number of processes, you can create a configuration file containing the host names and save the trouble of typing them at the command line. Creating such a file in Notepad is simple: it has one line per host, each consisting of the host name or address followed by an optional number of processes.
192.168.206.133 3 # this is a comment
The mpiexec command looks like this:
mpiexec -machinefile hosts.cfg "Rterm.exe" --no-save -q
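If you would rather generate the machine file from R than type it in Notepad, something like the following sketch could be used. It reuses the file name hosts.cfg and the addresses and process counts from the two-host example, and it assumes the file should be written to the current working directory. The variable names are illustrative only.

```r
# Hypothetical host list: names are host addresses, values are process counts.
hosts <- c("192.168.206.133" = 3, "192.168.206.134" = 4)

# Build one "host nprocs" line per host, the layout described above.
machine_lines <- paste(names(hosts), hosts)

# Write the machine file into the current working directory.
writeLines(machine_lines, "hosts.cfg")
```

Run this in an ordinary R session before calling mpiexec, since the file has to exist when the command is launched.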
As always, we want to quit gracefully.
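With Rmpi, a graceful shutdown is typically a two-step affair, sketched below. It assumes the commands are typed at the master's prompt and that the slaves are still idle and listening.

```r
# Run at the master's R prompt (rank 0).

# Tell every slave to leave its command loop and exit.
mpi.close.Rslaves()

# Finalize MPI and quit the master's R session.
mpi.quit()
```

Closing the slaves first avoids leaving orphaned Rterm processes on the remote hosts.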
Running multiple R processes on multiple hosts requires several factors to be correctly orchestrated. The Rmpi .Rprofile file must be in the current directory, the Launch MPI service must be running, and Rterm and Rscript must be added to the list of allowed applications in Windows Firewall. Of course, once the system is configured to run multiple R processes communicating via MPI, the real work is just beginning. The API provided by Rmpi must be used to develop R scripts specifically crafted to take advantage of this parallelism.