README.md

Travis build
status Codecov test
coverage

fastpipe

DISCLAIMER : There was a big oversight and I'm a bit embarrassed about it, nested pipe calls such as cars %>% summarize_all(. %>% mean) fail. I'll see if I can fix this but meanwhile I must admit this is makes the package much less robust that I'd like it to be! %>>% works fine, nested or not. The moral of the story is there's a reason why one wants to avoid playing with global variables.

This package proposes an alternative to the pipe from the magrittr package.

It’s named the same and passes all the magrittr test so can easily be a drop-in replacement, its main advantages is that it is faster and solves most of the issues of magrittr.

Install with :

remotes::install_github("moodymudskipper/fastpipe")

Issues solved and special features

fastpipe passes all magrittr’s tests, but it doesn’t stop there and solves most of its issues, at least most of the open issues on GitHub.

# https://github.com/tidyverse/magrittr/issues/195
# Issue with lazy evaluation #195
gen <- function(x) {function() eval(quote(x))}
identical(
  {fn <- gen(1); fn()},
  {fn <- 1 %>% gen(); fn()})
#> [1] TRUE
# https://github.com/tidyverse/magrittr/issues/159
# Pipes of functions have broken environments
identical(
  {
    compose <- function(f, g) { function(x) g(f(x)) }
    plus1   <- function(x) x + 1
    compose(plus1, plus1)(5)},
  {
    plus2 <- plus1 %>% compose(plus1)
    plus2(5)
})
#> [1] TRUE
# Note : copy and pasted because didn't work with knitr

This was requested by Lionel and seems fairly reasonable as it’s is unlikely that a user will need to use both . and !!!. in a call.

letters[1:3] %>% rlang::list2(!!!.)
#> [[1]]
#> [1] "a"
#> 
#> [[2]]
#> [1] "b"
#> 
#> [[3]]
#> [1] "c"
iris %>% base::dim
#> [1] 150   5
iris %>% base:::dim
#> [1] 150   5
x <- list(y = dim)
iris %>% x$y
#> [1] 150   5
iris %>% head %<>% dim
#> Error in iris %>% head %<>% dim: A compound pipe should only be used at the start of the chain
. %<>% head
#> Error in . %<>% head: You can't start a functional sequence on a compound operator
c(a = 1, b = 2) %S>% data.frame(!!!.)
#>   a b
#> 1 1 2
1000000 %L>% rnorm() %L>% sapply(cos) %>% max
#> rnorm(.)   ...
#> ~  0.26 sec
#> sapply(., cos)   ...
#> ~  2.11 sec
#> [1] 1

Families of pipes and performance

We provide a benchmark below, the last value is given for context, to remember that we are discussing small time intervals.

`%.%` <- fastpipe::`%>%` # (Note that a *fastpipe*, unlike a *magrittr* pipe, can be copied)
`%>%` <- magrittr::`%>%`
bench::mark(check=F,
  'magrittr::`%>%`' =
    1 %>% identity %>% identity() %>% (identity) %>% {identity(.)},
  'fastpipe::`%>%`' =
    1 %.% identity %.% identity() %.% (identity) %.% {identity(.)},
  'fastpipe::`%>>%`' =
    1 %>>% identity(.) %>>% identity(.) %>>% identity(.) %>>% identity(.),
  base = identity(identity(identity(identity(1)))),
  `median(1:3)` = median(1:3)
)
#> # A tibble: 5 x 6
#>   expression            min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 magrittr::`%>%`   113.5us  250.8us     3104.   88.57KB     6.73
#> 2 fastpipe::`%>%`    44.8us   90.3us     9188.        0B     4.23
#> 3 fastpipe::`%>>%`    8.1us     19us    42288.    4.45KB     8.46
#> 4 base                1.3us    3.1us   297595.        0B    29.8 
#> 5 median(1:3)        24.9us   54.4us    14621.   26.61KB     4.32
rm(`%>%`) # reseting `%>%` to fastpipe::`%>%`

We see that our %>% pipe is twice faster, and that our %>>% pipe is 10 times faster (on a properly formatted call).

We’d like to stress that a pipe is unlikely to be the performance bottleneck in a script or a function. If optimal performance is critical, pipes are best avoided as they will always have an overhead.

In other cases %>>% is fast and robust to program and keep the main benefit of piping.

Implementation

The implementation is completely different from magrittr, we can sum it up as follow :

`%>>%`
#> # a fastpipe object
#> function (lhs, rhs) 
#> {
#>     rhs_call <- substitute(rhs)
#>     eval(rhs_call, envir = list(. = lhs), enclos = parent.frame())
#> }
#> <bytecode: 0x000000001abe5960>
#> <environment: namespace:fastpipe>

To support the two latter, we use global variables, stored in globals, a child environment of our package’s environment. It contains the values master, is_fs and is_compound. Outside of pipes master is always TRUE while the two latter are always FALSE. The alternative to global parameters was to play with classes and attributes but was slower and more complex.

Whenever we enter a pipe we check the value of globals$master. If it is TRUE we enter a sequence where :

fastpipe operators contain their own code while in magrittr’s current implementation they all have the main code and are recognized in the function by their name. Moreover the pipe names are not hardcoded in the functions, which prevents confusion like the fact that %$% still sometimes work in magrittr when only %>% is reexported. https://github.com/tidyverse/magrittr/issues/194

The pipes have a class and a printing methods, the functional sequences don’t.

`%T>>%`
#> # a fastpipe object
#> function (lhs, rhs) 
#> {
#>     rhs_call <- substitute(rhs)
#>     {
#>         eval(rhs_call, envir = list(. = lhs), enclos = parent.frame())
#>         lhs
#>     }
#> }
#> <bytecode: 0x000000001524e888>
#> <environment: namespace:fastpipe>
. %>% sin %>% cos() %T>% tan(.)
#> function (.) 
#> . %>>% sin(.) %>>% cos(.) %T>>% tan(.)

Benchmarks for functional sequences

An interesting and little known fact is that using functional sequences in functionals is much more efficient than using lambda functions calling pipes (though it will generally be largely offset by the content of the fonction), for instance : purrr::map(foo, ~ .x %>% bar %>% baz) is slower than purrr::map(foo, . %>% bar %>% baz). I tried to keep this nice feature but didn’t succeed to make it as efficient as in magrittr. The difference show only for large number of iteration and will probably be negligible in any realistic loop.

`%.%` <- fastpipe::`%>%`
`%>%` <- magrittr::`%>%`
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# functional sequence with 5 iterations, equivalent speed
library(rlang) # to call `as_function()`
bench::mark(check=F,
  rlang_lambda =
  vapply(1:5, as_function(~.x %>% identity %>% identity() %>% (identity) %>% {identity(.)}), integer(1)),
  fastpipe =
  vapply(1:5, . %.% identity %.% identity() %.% (identity) %.% {identity(.)}, integer(1)),
  magrittr =
  vapply(1:5, . %>% identity %>% identity() %>% (identity) %>% {identity(.)}, integer(1)),
  base = 
  vapply(1:5, function(x) identity(identity(identity(identity(x)))) , integer(1)),
  `median(1:3)` = median(1:3)
)
#> # A tibble: 5 x 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 rlang_lambda  718.9us   1.47ms      589.    63.2KB     6.85
#> 2 fastpipe      120.1us  226.1us     3352.    32.1KB     6.77
#> 3 magrittr      153.9us    370us     2479.    13.8KB     6.56
#> 4 base           12.1us   28.9us    28612.    10.1KB     5.72
#> 5 median(1:3)      28us   58.7us    13592.        0B     4.33

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# functional sequence with 1000 iterations
bench::mark(check=F,
  rlang_lambda =
  vapply(1:1000, as_function(~.x %>% identity %>% identity() %>% (identity) %>% {identity(.)}), integer(1)),
  fastpipe =
  vapply(1:1000, . %.% identity %.% identity() %.% (identity) %.% {identity(.)}, integer(1)),
  magrittr =
  vapply(1:1000, . %>% identity %>% identity() %>% (identity) %>% {identity(.)}, integer(1)),
  base = 
  vapply(1:1000, function(x) identity(identity(identity(identity(x)))) , integer(1))
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 4 x 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 rlang_lambda 322.91ms 358.17ms      2.79  277.39KB     5.58
#> 2 fastpipe      19.64ms  42.77ms     23.2     3.95KB     5.79
#> 3 magrittr       7.96ms  16.16ms     57.8     4.23KB     5.98
#> 4 base              2ms   3.64ms    202.     14.09KB     7.94

Breaking changes

The fact that the tests of magrittr are passed by fastpipe doesn’t mean that fastpipe will necessarily behave as expected by magrittr’s users in all cicumstances.

Here are the main such cases :

Notes

Attaching package: ‘dplyr’

The following object is masked _by_ ‘package:fastpipe’:

%>%

The current list of the packages subjects to this hook is : magrittr, dplyr, purrr, tidyr, stringr, forcats, rvest, modelr, testthat. However many more packages reexport magrittr’s pipe and the safest way to make sure you’re using fastpipe is to attach it after all other packages.



moodymudskipper/fastpipe documentation built on Dec. 12, 2019, 6:29 a.m.