Introducing Microsoft R Server


Earlier this year Microsoft released Microsoft R Server. This is essentially a rebranding of Revolution R Enterprise, the product Microsoft gained when it acquired Revolution Analytics in April 2015.

However, the fact that Microsoft is backing the product makes a big difference to many potential corporate users. And with Microsoft embracing R across the company, more investment in Microsoft R Server is surely in the cards.

What is Microsoft R Server?

Microsoft describes it as “R for the enterprise.” It’s essentially a suite of products and services comprising:

  • Microsoft R Open—a stable build of open source R with high-performance math libraries
  • DevelopR—Windows IDE for developing R applications
  • DistributedR—cluster computing framework for big data analytics
  • ScaleR—package of R functions that support statistical analysis and machine learning on big data
  • ConnectR—provides facilities to connect to a range of big data sources
  • DeployR—web services SDK for integrating R with other services

Using Microsoft R Server

As you’d expect from Microsoft, the installation is fairly straightforward. You have to install Microsoft R Open before installing Microsoft R Server. On Windows, both are standard MSI installers.

Running Revolution R Enterprise 8.x (64-bit) launches the R Productivity Environment (RPE), an IDE similar to the excellent RStudio. One major difference is that output such as plots is displayed in floating windows rather than in docked panes. Microsoft is also developing R Tools for Visual Studio, which may become its primary IDE for R.

RPE screenshot

RPE can run all the standard R commands and packages. One benefit of using Microsoft R Server (or Microsoft R Open) even for everyday R work is that Microsoft has replaced R’s reference BLAS and LAPACK math libraries with the multithreaded Intel Math Kernel Library (MKL). As a result, R functions that depend on core linear algebra, such as matrix multiplication, can run considerably faster than in open source R, particularly on multicore machines.
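A quick way to see the effect is to time a large matrix multiplication, which R hands off to whichever BLAS library it is linked against. The sketch below uses only base R; the 1000 x 1000 size is an arbitrary choice, and the timings depend entirely on your hardware, so run the same script under open source R and under Microsoft R Open to compare.

```r
# Time a dense matrix multiplication. %*% is delegated to the BLAS library,
# so the same script exposes the difference between reference BLAS and MKL.
set.seed(42)                      # reproducible random matrix
n <- 1000                         # matrix size (arbitrary choice)
m <- matrix(rnorm(n * n), nrow = n)

timing <- system.time(product <- m %*% m)
elapsed <- timing[["elapsed"]]    # wall-clock seconds

cat(sprintf("%d x %d matrix multiplication took %.2f seconds\n", n, n, elapsed))
```

On Microsoft R Open, the RevoUtilsMath package should also provide setMKLthreads() for changing the number of threads MKL uses, which helps show how much of any speedup comes from multithreading.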

Big data analytics using Microsoft R Server

The first thing to note about working with big data in Microsoft R Server is that you can’t just run your standard R scripts and expect them to be magically distributed across a cluster. Instead, ScaleR provides a set of R functions designed to operate on a cluster. Many of the common statistical and machine learning techniques are already implemented, and more functions are likely to be added over time.

On Hadoop, probably the most common big data framework, ScaleR includes:

  • rxSummary—basic summary statistics
  • rxLinMod—fits a linear model
  • rxLogit—fits a logistic regression model
  • rxGlm—fits a generalized linear model
  • rxKmeans—performs k-means clustering
  • rxDTree—fits a classification or regression tree (using an algorithm developed by Ben-Haim and Yom-Tov)
  • rxDForest—fits a classification or regression decision forest
  • rxBTrees—fits a classification or regression decision forest using a stochastic gradient boosting algorithm
  • rxPredict—calculates predictions from models fitted with functions such as rxLinMod, rxLogit, rxGlm, rxDTree, rxDForest and rxBTrees

In a Hadoop compute context these functions run on the cluster, yet using them is as simple as calling any other R function.

There are also general data manipulation functions, as well as functions for controlling Hadoop jobs and interacting with the HDFS file system.
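As an illustration of the HDFS helpers, the hypothetical function below copies a local file into an HDFS directory and then lists that directory. It assumes RevoScaleR (installed with Microsoft R Server) and a working Hadoop client on the same machine; the function name, its arguments and the /data default are illustrative only. rxHadoopMakeDir, rxHadoopCopyFromLocal and rxHadoopListFiles are RevoScaleR functions that issue the corresponding Hadoop file system commands.

```r
# Hypothetical helper: stage a local data file in HDFS so that an RxTextData
# data source can read it. Requires RevoScaleR from Microsoft R Server.
uploadToHdfs <- function(localFile, hdfsDir = "/data") {
  if (!requireNamespace("RevoScaleR", quietly = TRUE)) {
    stop("RevoScaleR (part of Microsoft R Server) is not installed")
  }
  if (!file.exists(localFile)) {
    stop("local file not found: ", localFile)
  }
  RevoScaleR::rxHadoopMakeDir(hdfsDir)                  # create the target directory
  RevoScaleR::rxHadoopCopyFromLocal(localFile, hdfsDir) # copy the file into HDFS
  RevoScaleR::rxHadoopListFiles(hdfsDir)                # list the directory contents
}
```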

Analyses using Microsoft R Server and Hadoop generally proceed as follows:

  1. Start Microsoft R Server
  2. Specify the Hadoop NameNode
  3. Create a compute context for Hadoop
  4. Create a data source
  5. Summarize your data
  6. Fit a model to your data
  7. Make predictions using the model

All the steps are covered in the RevoScaleR Hadoop Getting Started Guide. In this article we’ll briefly cover creating a data source and analyzing the data.
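Steps 2 and 3 are not part of the walk-through that follows. As a rough sketch, assuming R is running on a cluster edge node so that no SSH settings are needed, they come down to creating an HDFS file system object that knows the NameNode and making an RxHadoopMR compute context active with rxSetComputeContext. The host name "master.local" and port 8020 are placeholders for your own cluster’s values, and the wrapper function itself is hypothetical.

```r
# Hypothetical wrapper for steps 2 and 3. Assumes RevoScaleR (Microsoft R
# Server) on a Hadoop edge node; nameNode and port are placeholder values.
connectToHadoop <- function(nameNode = "master.local", port = 8020) {
  if (!requireNamespace("RevoScaleR", quietly = TRUE)) {
    stop("RevoScaleR (part of Microsoft R Server) is not installed")
  }
  # Step 2: an HDFS file system object pointing at the NameNode
  hdfs <- RevoScaleR::RxHdfsFileSystem(hostName = nameNode, port = port)

  # Step 3: a Hadoop MapReduce compute context; once it is set, rx functions
  # such as rxSummary and rxLinMod run on the cluster instead of locally
  cluster <- RevoScaleR::RxHadoopMR(nameNode = nameNode, port = port)
  RevoScaleR::rxSetComputeContext(cluster)

  list(fileSystem = hdfs, computeContext = cluster)
}
```

The returned fileSystem can be passed as the fileSystem argument of RxTextData, and rxSetComputeContext("local") switches execution back to the local machine.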

Creating a Hadoop data source

Start by creating an HDFS file system object:

hdfs <- RxHdfsFileSystem()

Then create a text data source that points to the sales data in HDFS, treating "?" as a missing value:

dataSource <- RxTextData(file="/data/sales", missingValueString="?", fileSystem=hdfs)

Summarizing the data

To summarize the sales and profit figures, use:

rxSummary(~sales+profit, data=dataSource)

Fitting a model to the data

To model profit as a linear function of sales, use:

salesProfitLinearModel <- rxLinMod(profit~sales, data=dataSource)

Calculating predictions

Create a data source for the new sales data you want to make predictions for:

predictionDataSource <- RxTextData(file="/data/newSales", missingValueString="?", fileSystem=hdfs)

Predict profit levels from the new sales data using the linear model:

rxPredict(salesProfitLinearModel, data=predictionDataSource)
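If you want to try the same workflow before you have access to a cluster, the base R functions summary(), lm() and predict() play the roles of rxSummary, rxLinMod and rxPredict on an in-memory data frame. The five-row data set below is made up purely for illustration, and the variable names are hypothetical.

```r
# Made-up data standing in for the sales data in HDFS (illustrative only)
salesData <- data.frame(
  sales  = c(100, 150, 200, 250, 300),
  profit = c(12, 19, 25, 31, 38)
)

# Counterpart of rxSummary: summary statistics for sales and profit
print(summary(salesData[, c("sales", "profit")]))

# Counterpart of rxLinMod: model profit as a linear function of sales
salesProfitLinearModel <- lm(profit ~ sales, data = salesData)

# New observations containing sales figures only
newSales <- data.frame(sales = c(175, 400))

# Counterpart of rxPredict: predicted profit for the new sales figures
predictedProfit <- predict(salesProfitLinearModel, newdata = newSales)
print(predictedProfit)  # 21.8 and 50.6 for the toy data above
```

Apart from the data source, the main difference is scale: the rx functions process data in chunks, so unlike lm() they are not limited to data that fits in memory.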

Summary

As you can see from this brief introduction, if you’re comfortable using R, Microsoft R Server gives you a direct route to big data analytics. If you’ve been doing statistical analysis and machine learning in R at the workstation level, the functions in ScaleR shouldn’t contain any surprises.

If you’re interested in big data analytics or R, you may wish to consider Learning Tree’s courses on these topics.
