knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)

library(rlang)
fail <- function() "\u274c"
pass <- function() "\u2705"

There are many different ways that magrittr could implement the pipe. The goal of this document is to lay out the variations and the pros and cons of each approach. It is primarily aimed at the magrittr developers (so we don't forget important considerations), but will be of interest to anyone who wants to understand pipes better or to create their own pipe that makes different tradeoffs.

Code transformation

There are three main options for how we might transform a pipeline into base R expressions. Here they are illustrated with x %>% foo() %>% bar():
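As a rough sketch of what the three transformations might look like, the following uses only base R. The bodies of foo() and bar(), the variable names stage1 and stage2, and the use of delayedAssign() for the lazy variant are assumptions made for illustration, not necessarily what magrittr generates.

```r
# Hypothetical stages, defined only so the sketch runs
foo <- function(x) x + 1
bar <- function(x) x * 2
x <- 1

# Nested: each stage becomes an argument of the next call
nested <- bar(foo(x))

# Eager: each stage is evaluated immediately and rebinds `.`
eager <- local({
  . <- x
  . <- foo(.)
  bar(.)
})

# Lazy: each stage is bound as a promise and only forced when the
# following stage actually uses it
lazy <- (function() {
  delayedAssign("stage1", x)
  delayedAssign("stage2", foo(stage1))
  bar(stage2)
})()
```

All three forms compute the same value here; they differ in when each stage is evaluated and in which intermediate bindings exist along the way.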

(There is a fourth option, which uses eager evaluation, but uses a unique variable name for each stage. This has no advantages compared to the eager pipe so we will not consider it further.)

We'll first explore the desired properties we might want a pipe to possess and then see how each of the three variants does.

Desired properties

These are the properties that we might want a pipe to possess, roughly ordered from most important to least important.

Nested pipe

Eager pipe

Lazy pipe

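# Lazily bind `name` in `env`: `value` is captured unevaluated and only
# evaluated (in `env`) the first time `name` is used
`%<~%` <- function(name, value, env = caller_env()) {
  name <- ensym(name)
  value <- enexpr(value)

  env_bind_lazy(env, !!name := !!value, .eval_env = env)
}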
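To make the effect of a lazy binding concrete, here is a minimal base R sketch of the same idea. It assumes delayedAssign() as a stand-in for the rlang helper; lazy_bind() and state are hypothetical names used only for this illustration.

```r
# Hypothetical base R analogue of a lazy binding operator
lazy_bind <- function(name, value, env = parent.frame()) {
  name <- as.character(substitute(name))
  expr <- substitute(value)
  # Inline the captured expression so it becomes the promise's code
  do.call(delayedAssign, list(name, expr, eval.env = env, assign.env = env))
}

# Track whether the bound expression has run yet
state <- new.env()
state$forced <- FALSE

lazy_bind(y, {
  state$forced <- TRUE
  10
})

forced_before <- state$forced  # binding exists, expression not yet run
total <- y + 1                 # first use of `y` forces the promise
forced_after <- state$forced
```

The expression bound to y only runs when y is first used, which is the property the lazy pipe relies on: a stage that is never needed is never computed.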

Execution environment

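Once the pipe has been transformed to a regular R expression, it must be evaluated. There are three options for where that evaluation could take place: in a closure environment, in a new environment, or in the current environment. Each is covered in its own section below.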

This choice affects functions that work with the current environment (like assign(), get(), ls()) or the current context (like return()). The following two functions illustrate the primary differences:

f <- function() {
  x <- 20
  10 %>% assign("x", .)
  x
}

g <- function() {
  10 %>% return()
  return(20)
}

To discuss implementation challenges with a concrete example, we'll take the following values:

double <- function(x) x * 2
increment <- function(x) x + 1
x <- 1:10

And implement the following simple pipe:

x %>% double() %>% increment() %>% double()

Using the eager transformation:

pipe <- expr({
  . <- double(.)
  . <- increment(.)
  double(.)
})

Note that we assume the input to the pipe is called ., not x. This is a small simplification that makes implementation of the transformer a little easier.
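As a rough illustration of why a fixed input name helps, the eager block above could be generated mechanically from a list of stage calls. eager_block() is a hypothetical helper, not part of magrittr, and the sketch assumes every stage already contains the . placeholder. It uses base R only, with eval() standing in for the evaluation step.

```r
double <- function(x) x * 2
increment <- function(x) x + 1

# Hypothetical transformer: turn a list of stage calls into an eager block
eager_block <- function(stages) {
  n <- length(stages)
  # Every stage but the last is wrapped as `. <- stage`
  assigns <- lapply(stages[-n], function(stage) call("<-", quote(.), stage))
  # The last stage is left bare so its value is the value of the block
  as.call(c(list(quote(`{`)), assigns, list(stages[[n]])))
}

stages <- list(quote(double(.)), quote(increment(.)), quote(double(.)))
generated <- eager_block(stages)

# Evaluate the generated block with `.` bound to the input
result <- eval(generated, list(. = 1:10))
```

Because the input is always called ., the transformer never has to inspect or rename the left-hand side of the pipeline; it only has to bind . before evaluating the block.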

Closure environment

To evaluate the pipe in a closure environment, we first create a function, using the pipe fragment as body and a single argument (.):

pipe_fun <- new_function(exprs(. = ), pipe)
pipe_fun

And then we call it with x:

pipe_fun(x)

Evaluating the pipe in this way makes it clear that building functions with the pipe is the general case, and providing an initial value is the special case. In other words:

x %>% double() %>% increment() %>% double()

# is shorthand for

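(. %>% double() %>% increment() %>% double())(x)

# assign() writes into the closure's own environment, so the caller's x
# is untouched and f_closure() returns 20
f_closure <- function() {
  x <- 20
  (function(.) {
    assign("x", .)
  })(10)
  x
}
f_closure()

# return() only exits the inner closure, so g_closure() also returns 20
g_closure <- function() {
  (function(.) {
    return(.)
  })(10)
  return(20)
}
g_closure()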

New environment

eval_bare(pipe, env = env(. = x))

f_new <- function() {
  x <- 20
  eval_bare(expr(assign("x", .)), env(. = 10))
  x
}
f_new()

g_new <- function() {
  eval_bare(expr(return(.)), env(. = 10))
  return(20)
}
g_new()

Current environment

At first glance, evaluating the pipe in the current environment is quite simple:

. <- x
eval_bare(pipe)
rm(.)

And that leads to:

f_current <- function() {
  x <- 20

  . <- 10
  eval_bare(expr(assign("x", .)))
  rm(.)

  x
}
f_current()

g_current <- function() {
  . <- 10
  eval_bare(expr(return(.)))
  rm(.)

  return(20)
}
g_current()

And this could be wrapped into a simple function so that we can ensure . is unbound even when an error occurs:

pipe_eval <- function(pipe, init, env = caller_env()) {
  env_bind(.env = env, . = init)
  on.exit(env_unbind(env, "."))

  eval_bare(pipe, env)
}

pipe_eval(pipe, x)

(This implementation will clobber any existing . but a more sophisticated implementation could restore any existing value on exit. Similarly, the cleanup after the lazy transformation would be more work (since it creates multiple variables), but it's not prohibitively hard.)
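A minimal sketch of the "restore any existing value" idea might look like the following. It is written in base R so it runs on its own: pipe_eval_restore() is a hypothetical name, and base eval() is used as a stand-in for eval_bare(), so it does not reproduce how eval_bare() treats return().

```r
# Hypothetical variant of pipe_eval() that puts back a pre-existing `.`
pipe_eval_restore <- function(pipe, init, env = parent.frame()) {
  had_dot <- exists(".", envir = env, inherits = FALSE)
  if (had_dot) {
    old_dot <- get(".", envir = env, inherits = FALSE)
  }

  # Cleanup runs on normal exit and on error
  on.exit({
    if (had_dot) {
      assign(".", old_dot, envir = env)
    } else if (exists(".", envir = env, inherits = FALSE)) {
      rm(list = ".", envir = env)
    }
  }, add = TRUE)

  assign(".", init, envir = env)
  eval(pipe, env)
}

# An existing `.` survives the call
. <- "outer"
res_kept <- pipe_eval_restore(quote(. * 2), 21)
dot_after <- .

# With no existing `.`, nothing is left behind
rm(.)
res_clean <- pipe_eval_restore(quote(. + 1), 1)
dot_left_behind <- exists(".", inherits = FALSE)
```

The same save-and-restore pattern would have to be repeated for each intermediate variable in the lazy transformation, which is the extra work the note above refers to.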

The main drawback to this approach is that eval_bare() currently loses the visibility flag. This can be fixed, but needs work in C.
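For readers unfamiliar with the visibility flag: R records whether a value should auto-print, and invisible() turns that off. The base R sketch below (quiet_double() is a hypothetical function) inspects the flag with withVisible() and shows that base eval() passes it through, which is the behaviour a pipe evaluator would ideally preserve.

```r
# Hypothetical function that returns its result invisibly
quiet_double <- function(x) invisible(x * 2)

# withVisible() exposes both the value and the visibility flag
plain <- withVisible(1 + 1)
direct <- withVisible(quiet_double(1))
via_eval <- withVisible(eval(quote(quiet_double(1))))

plain$visible     # TRUE: an ordinary value would auto-print
direct$visible    # FALSE: invisible() cleared the flag
via_eval$visible  # FALSE: base eval() keeps the flag intact
```

If the evaluator drops the flag, a pipeline ending in a function that returns invisibly would print its result at the console even though the function asked for it not to be printed.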



msmclear/magrittr documentation built on Dec. 31, 2019, 12:56 a.m.