pipeR provides a Pipe operator and a Pipe function based on a set of syntax that supports piping a value to the first argument of a function, to . in an expression, through a formula as a lambda expression, for side effects, and with assignment. The syntax is designed to make pipelines highly readable.
The following code is written in the traditional nested style. It bootstraps the mpg values in the built-in dataset mtcars and plots their density function estimated with a Gaussian kernel:
plot(density(sample(mtcars$mpg, size = 10000, replace = TRUE),
kernel = "gaussian"), col = "red", main="density of mpg (bootstrap)")
The code is deeply nested and can be hard to read and maintain. With the Pipe operator, it can be reorganized as:
mtcars$mpg %>>%
sample(size = 10000, replace = TRUE) %>>%
density(kernel = "gaussian") %>>%
plot(col = "red", main = "density of mpg (bootstrap)")
The code becomes much cleaner, more readable and more maintainable.
%>>%
The Pipe operator %>>% basically pipes the left-hand side value forward to the right-hand side expression, which is evaluated according to its syntax.
Many R functions are pipe-friendly: they take some data as the first argument and transform it in a certain way. This arrangement allows operations to be streamlined by pipes, that is, one data source can be passed to the first argument of a function, get transformed, and be passed to the first argument of the next function. In this way, a chain of commands is connected, and it is called a pipeline.
On the right-hand side of %>>%, whenever a function name or call is supplied, the left-hand side value is always passed as the first unnamed argument of that function.
More specifically, the Pipe operator %>>% by default inserts the object on the left-hand side as the first argument of the function on the right-hand side. In other words, x %>>% f(a=1) is transformed to and evaluated as f(.,a=1), where . takes the value of x. It accepts both a function call, e.g. plot() or plot(col="red"), and a function name, e.g. log or plot.
rnorm(100) %>>%
plot
rnorm(100) %>>%
plot(col="red")
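As a quick check of the first-argument rule, the sketch below compares piped calls with their ordinary nested equivalents. It assumes pipeR is installed; the vector x and the result names are illustrative only.

```r
library(pipeR)

x <- c(1.234, 5.678, 9.1011)

# Bare function name: x %>>% sqrt is evaluated as sqrt(x)
by_name <- x %>>% sqrt

# Call with extra arguments: x is inserted as the first argument
by_call <- x %>>% round(digits = 1)

# Both forms agree with the nested calls
stopifnot(identical(by_name, sqrt(x)))
stopifnot(identical(by_call, round(x, digits = 1)))
```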
Sometimes the value on the left is needed at multiple places. One can use . to represent it anywhere in the function call.
rnorm(100) %>>%
plot(col="red",main=sprintf("Number of points: %d",length(.)))
You can write commands in a chain (or pipeline) like
rnorm(10000,mean=10,sd=1) %>>%
sample(size=100,replace=FALSE) %>>%
log %>>%
diff %>>%
plot(col="red",type="l")
*Notice: a function name qualified by a namespace must be followed by parentheses, as in x %>>% base::mean().
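A minimal sketch of the namespace rule, assuming pipeR is installed. The input vectors and the names avg and trimmed are made up for illustration.

```r
library(pipeR)

# Namespace-qualified function: keep the parentheses
avg <- c(1, 2, 3, 4) %>>% base::mean()

# Extra arguments still follow the piped value
trimmed <- c(1, 2, 3, 100) %>>% base::mean(trim = 0.25)

stopifnot(avg == 2.5)
stopifnot(identical(trimmed, base::mean(c(1, 2, 3, 100), trim = 0.25)))
```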
. in an expression
Not all functions are pipe-friendly in every case: you may find that some functions do not take the data produced by a pipeline as the first argument. In this case, you can enclose your expression in {} or () so that %>>% will use . to represent the value on the left.
mtcars %>>%
{ lm(mpg ~ cyl + wt, data = .) }
mtcars %>>%
( lm(mpg ~ cyl + wt, data = .) )
rnorm(100) %>>%
{ plot(.) }
rnorm(100) %>>%
{ plot(., col="red") }
rnorm(100) %>>%
{ sample(., size=length(.)*0.5) }
mtcars %>>% {
lm(mpg ~ cyl + disp, data=.) %>>%
summary
}
rnorm(100) %>>% {
par(mfrow=c(1,2))
hist(.,main="hist")
plot(.,col="red",main=sprintf("%d",length(.)))
}
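The block form can also be used to compute a value from several statements. The sketch below assumes pipeR is installed and uses a made-up input vector; the value of the block is the value of its last expression, as with any braced expression in R.

```r
library(pipeR)

stats <- c(2, 4, 6) %>>% {
  m <- mean(.)                          # . is the piped vector
  c(mean = m, range = max(.) - min(.))  # last expression is the result
}

stopifnot(isTRUE(all.equal(stats, c(mean = 4, range = 4))))
```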
It can be confusing to see multiple . symbols in the same context. In some cases, they may represent different things in the same expression. Even though the expression usually still works, it may not be a good idea to keep it that way. Here is an example:
mtcars %>>%
(lm(mpg ~ ., data = .)) %>>%
summary
The code above works correctly even though the two dots in the second line have different meanings. The . in the formula mpg ~ . represents all variables other than mpg in the data frame mtcars; the . in data = . represents mtcars itself. One way to reduce the ambiguity is to use a lambda expression that names the piped object on the left of ~ and specifies the expression to evaluate on the right.
%>>% will assume that a lambda expression follows when the next expression is enclosed in parentheses (). The lambda expression can take the following forms:
- expr, where . is by default used to represent the piped object.
- x ~ expr, where expr will be evaluated with x representing the piped object.
The previous example can be rewritten with a lambda expression like this:
mtcars %>>%
(df ~ lm(mpg ~ ., data=df)) %>>%
summary
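The two lambda forms can be compared side by side on a small input. This is only a sketch, assuming pipeR is installed; the symbol v and the result names are arbitrary choices for illustration.

```r
library(pipeR)

# Default form: . represents the piped object
mean_dot <- 1:10 %>>% (sum(.) / length(.))

# Named form: v represents the piped object
mean_named <- 1:10 %>>% (v ~ sum(v) / length(v))

stopifnot(mean_dot == 5.5, mean_named == 5.5)
```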
In a pipeline, one may be interested not only in the final outcome but sometimes also in intermediate results. Printing, plotting or saving an intermediate result must be done as a side effect so that it does not break the mainstream pipeline. For example, calling plot() to draw a scatter plot returns NULL, so calling plot() directly in the middle of a pipeline would break the pipeline by changing the subsequent input to NULL.
A one-sided formula that starts with ~ indicates that the right-hand side expression is evaluated only for its side effect: its value is ignored, and the input value is returned instead.
mtcars %>>%
subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
(~ cat("rows:",nrow(.),"\n")) %>>% # cat() returns NULL
summary
mtcars %>>%
subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
(~ plot(mpg ~ wt, data = .)) %>>% # plot() returns NULL
(lm(mpg ~ wt, data = .)) %>>%
summary()
With ~, side-effect operations can be easily distinguished from the mainstream pipeline.
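To see the difference that ~ makes, the following sketch runs the same cat() call with and without it. It assumes pipeR is installed, and the input vector is made up for illustration.

```r
library(pipeR)

# With ~ the step runs only for its side effect and passes its input on
kept <- c(3, 1, 2) %>>%
  (~ cat("length:", length(.), "\n")) %>>%
  sort

# Without ~ the value of cat(), which is NULL, becomes the result
lost <- c(3, 1, 2) %>>%
  (cat("length:", length(.), "\n"))

stopifnot(identical(kept, c(1, 2, 3)))
stopifnot(is.null(lost))
```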
An easier way to print an intermediate value is to use the (? expr) syntax, like asking a question.
mtcars %>>%
(? ncol(.)) %>>%
summary
In addition to printing and plotting, one may need to save an intermediate value to the environment by assigning the value to a variable (symbol).
If one needs to assign the value to a symbol, just insert a step like (~ symbol), then the input value of that step will be assigned to symbol in the current environment.
mtcars %>>%
(lm(formula = mpg ~ wt + cyl, data = .)) %>>%
(~ lm_mtcars) %>>%
summary
If the value to be saved is not the input itself but some transformation of it, use = to specify a lambda expression that tells what is to be saved (thanks @yanlinlin82 for the suggestion).
mtcars %>>%
(~ summ = summary(.)) %>>% # side-effect assignment
(lm(formula = mpg ~ wt + cyl, data = .)) %>>%
(~ lm_mtcars) %>>%
summary
An easier way to save an intermediate value that is to be piped further is to use the (symbol = expression) syntax.
mtcars %>>%
(~ summ = summary(.)) %>>% # side-effect assignment
(lm_mtcars = lm(formula = mpg ~ wt + cyl, data = .)) %>>% # continue piping
summary
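The two assignment forms can be contrasted on a small numeric input. This sketch assumes a pipeR version that supports both forms as described above; the symbols n, sq and total are chosen here for illustration.

```r
library(pipeR)

total <- 1:5 %>>%
  (~ n = length(.)) %>>%  # side-effect assignment: saves n, passes 1:5 on
  (sq = .^2) %>>%         # saves sq and pipes the squared values forward
  sum

stopifnot(n == 5)
stopifnot(identical(sq, c(1, 4, 9, 16, 25)))
stopifnot(total == 55)
```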
x %>>% (y) extracts the element named y from object x, where y must be a valid symbol name and x can be a vector, list, environment, anything else for which [[]] is defined, or an S4 object.
mtcars %>>%
(lm(mpg ~ wt + cyl, data = .)) %>>%
(~ lm_mtcars) %>>%
summary %>>%
(r.squared)
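Element extraction can be checked against ordinary $ access. The sketch below assumes pipeR is installed; the list and the names b_value and r2 are illustrative.

```r
library(pipeR)

# Extract a list element by name
b_value <- list(a = 1, b = 2) %>>% (b)

# Extract a component of a summary object at the end of a pipeline
r2 <- mtcars %>>%
  (lm(mpg ~ wt, data = .)) %>>%
  summary %>>%
  (r.squared)

stopifnot(b_value == 2)
stopifnot(isTRUE(all.equal(r2, summary(lm(mpg ~ wt, data = mtcars))$r.squared)))
```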
To evaluate an expression within the piped object when it is a list or environment, with() can be helpful.
list(a = 1, b = 2) %>>%
with(a+2*b)
But this method does not work for vectors or S4 objects.
Pipe() creates a Pipe object whose built-in symbols are designed for building pipelines.
For more details, view the vignette written for Pipe.