Using a Pipeline Operator in R


I find myself programming more and more in a functional style these days. Obviously, R, F# and Scala encourage it—but I’m a heavy user of LINQ in C# and my JavaScript has been going that way for a while too.

When programming in languages other than F#, I yearn for the pipeline operator (|>). The pipeline operator lets you pass an intermediate result on to the next function. So, if you wish to apply the functions h, g and then f, in that order, to a value x, instead of writing

f(g(h(x)))

you can write

x |> h |> g |> f

Personally, I think this is easier to read. It more naturally represents the order of execution. Apply h to x, then apply g to that result and, finally, apply f.
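
To make the equivalence concrete, here is a minimal sketch in R. The operator %|>% and the functions h, g and f are hypothetical, defined purely for illustration; this is not how any real pipeline package is implemented.

```r
# A hypothetical forward-pipe operator: pass the left-hand value to the
# function on the right. User-defined %op% operators are left-associative,
# so a chain is evaluated from left to right.
`%|>%` <- function(x, f) f(x)

# Three toy functions, assumed only for this example
h <- function(x) x + 1
g <- function(x) x * 2
f <- function(x) x^2

x <- 3
nested <- f(g(h(x)))            # inside-out: h first, then g, then f
piped  <- x %|>% h %|>% g %|>% f  # left-to-right: same order of execution

print(identical(nested, piped))
```

Both forms compute the same value; only the reading order differs.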

The value the pipeline operator brings in terms of readability becomes even more obvious when we look at a less abstract example. Let’s say that we want to sum the squares of the odd numbers between 1 and 10. In F#, without the pipeline operator, this might be coded as

Seq.sum (Seq.map (fun x -> x * x) (Seq.filter (fun x -> x % 2 = 1) [| 1..10 |]))

However, with the pipeline operator, we can write

[| 1..10 |]
|> Seq.filter (fun x -> x % 2 = 1) 
|> Seq.map (fun x -> x * x) 
|> Seq.sum

This more cleanly separates the various stages of the calculation.

Of course, we could have written

let numbers = [| 1..10 |]
let oddNumbers = Seq.filter (fun x -> x % 2 = 1) numbers
let squaredOddNumbers = Seq.map (fun x -> x * x) oddNumbers
let result = Seq.sum squaredOddNumbers

but that doesn’t feel very functional.

I should note that |> is one of a pair of pipeline operators in F#. Strictly speaking, it is the pipe-forward operator. There is also a pipe-backward operator (<|), but pipe-forward is far more common.

The pipeline operator has become the way I think about code so, when I have to return to R, I find the transition jarring. However, as usual, it turns out there’s an R package that resolves my dilemma: magrittr.

The magrittr package introduces a pipeline operator for R—%>%. Let’s go straight to an example. Using magrittr, the calculation we made above, translated to R, is

1:10 %>% subset(. %% 2 == 1) %>% . ^ 2 %>% sum

The period (.) is a placeholder for the value(s) piped through from the left of the operator.
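
As a rough sketch of how the placeholder behaves (assuming magrittr is installed; the variable names are my own), there are three cases worth seeing side by side. If no dot appears, the piped value becomes the first argument. If the dot is a direct argument, the value goes where the dot is. If the dot only appears inside a nested expression, the value is still passed as the first argument as well.

```r
library(magrittr)

# No dot: the left-hand side becomes the first argument,
# i.e. sqrt(c(4, 9, 16))
roots <- c(4, 9, 16) %>% sqrt

# Dot as a direct argument: the left-hand side is placed there,
# i.e. seq(1, 10, by = 2)
steps <- 2 %>% seq(1, 10, by = .)

# Dot only inside a nested expression: the left-hand side is still
# passed as the first argument, i.e. subset(1:10, 1:10 %% 2 == 1)
odds <- 1:10 %>% subset(. %% 2 == 1)

print(roots)
print(steps)
print(odds)
```

The third case is what makes the subset step in the one-liner above work.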

For a more realistic example, we can process the mtcars dataset, which ships with every standard R installation. We’ll make use of the excellent dplyr package to manipulate the dataset; dplyr is designed to work with the pipeline operator.

Let’s find the average MPG for all the cars that have under 100 horsepower.

library(dplyr)
library(magrittr)

mtcars %>% subset(hp < 100) %>% summarise(mean.mpg = mean(mpg))

The answer should be approximately 26.8.
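
As a cross-check, the average MPG of the cars under 100 horsepower can also be computed without a pipeline, using only base R. This is just a sketch for comparison; the variable names are my own.

```r
# mtcars is available by default via the datasets package
low.power <- mtcars[mtcars$hp < 100, ]  # keep cars with under 100 horsepower
mean.mpg  <- mean(low.power$mpg)        # average fuel economy of those cars

print(round(mean.mpg, 1))
```

The piped version expresses the same two steps in the same order, but without having to name the intermediate data frame.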

This has been a brief introduction to the pipeline operator and how you can use it in R. Frankly, I’m all for anything that helps improve the readability of R code without lowering oneself to the level of assigning everything to an intermediate variable.

If you’d like to learn more about R, Learning Tree has two courses that may be of interest.
