knitr::opts_chunk$set(error=TRUE, comment=NA) library(oshka) knitr::read_chunk('../tests/helper/ersatz.R')
We will implement simplified versions of dplyr
and data.table
to illustrate
how to write programmable NSE functions with oshka
. The implementations are
intentionally limited in functionality, robustness, and speed for the sake of
simplicity.
dplyr
The interface is as follows:
group_r <- function(x, ...) {...} # similar to dplyr::group_by filter_r <- function(x, subset) {...} # similar to dplyr::filter summarize_r <- function(x, ...) {...} # similar to dplyr::summarise `%$%` <- function(x, y) {...} # similar to the magrittr pipe
<<summarize_r>> <<summarize_r_l>> <<fo_dplyr_extra>>
Our functions mimic the corresponding dplyr
ones:
CO2 %$% # built-in dataset filter_r(grepl("[12]", Plant)) %$% group_r(Type, Treatment) %$% summarize_r(mean(conc), mean(uptake))
Most of the implementation is not directly related to oshka
NSE, but we will
go over summarize_r
to highlight how those parts integrate with the rest.
summarize_r
is just a forwarding function:
<<summarize_r>>
We use the eval
/bquote
pattern to forward NSE
arguments. We
retrieve summarize_r_l
from the current function frame with .()
, because
there is no guarantee we would find it on the search path starting from the
parent frame. In this case it happens to be available, but it would not be if
these functions were in a package.
We present summarize_r_l
in full for reference, but feel free to skip as we
highlight the interesting bits next:
<<summarize_r_l>>
The only oshka
specific line is the second one:
exps.sub <- expand(substitute(els), x, frm)
els
is the language captured and forwarded by summarize_r
. We run expand
on that language with our data x
as the environment and the parent frame as
the enclosure. We then compute the groups:
grps <- make_grps(x) # see appendix splits <- lapply(grps, eval, x, frm)
make_grps
extracts the grouping expressions generating by group_r
. These
have already been substituted so we evaluate each one with x
as the
environment and the parent frame as the enclosure. We use this to split our
data into groups:
dat.split <- split(x, splits, drop=TRUE)
Finally we can evaluate our expand
ed expressions within each of the groups:
# aggregate res.list <- lapply( dot_list(exps.sub), # see appendix function(exp) lapply(dat.split, eval, expr=exp, enclos=frm) ) list_to_df(res.list, grp.split, splits) # see appendix
dot.list
turns exps.sub
into a list of expressions. Each expression is then
evaluated with each data chunk as the environment and the parent frame as the
enclosure. Finally list_to_df
turns our lists of vectors into a data frame.
You can see the rest of the implementation in the appendix.
That single expand
line enables a programmable NSE:
f.exp <- quote(grepl("[12]", Plant)) s.exp <- quote(mean(uptake)) CO2 %$% filter_r(f.exp & conc > 500) %$% group_r(Type, Treatment) %$% summarize_r(round(s.exp))
Because %$%
uses expand
you can even do the following:
f.exp.b <- quote(filter_r(grepl("[12]", Plant) & conc > 500)) g.exp.b <- quote(group_r(Type, Treatment)) s.exp.b <- quote(summarize_r(mean(conc), mean(uptake))) exp <- quote(f.exp.b %$% g.exp.b %$% s.exp.b) CO2 %$% exp
data.table
We wish to re-use our ersatz dplyr
functions to create a data.table
-like
interface:
<<super_df>>
Again, we use the eval
/bquote
pattern to forward the NSE arguments to our
NSE functions filter_r
, group_r_l
, and summarize_r_l
. The pattern
is not trivial, but it only took six lines of code to transmogrify our
faux-dplyr
into a faux-data.table
.
After we add the super_df
class to our data we can start using it with
data.table
semantics, but with programmable NSE:
co2 <- as.super_df(CO2) co2[f.exp, s.exp, by=Type] exp.a <- quote(max(conc)) exp.b <- quote(min(conc)) co2[f.exp, list(exp.a, exp.b), by=list(Type, Treatment)][1:3,] exp.c <- quote(list(exp.a, exp.b)) exp.d <- quote(list(Type, Treatment)) co2[f.exp, exp.c, by=exp.d][1:3,]
Despite the forwarding layers the symbols resolve as expected in complex circumstances:
exps <- quote(list(stop("boo"), stop("ya"))) # don't use this g.exp <- quote(Whatever) # nor this local({ summarize_r_l <- function(x, y) stop("boom") # nor this max.upt <- quote(max(uptake)) # use this min.upt <- quote(min(uptake)) # and this exps <- list(max.upt, min.upt) g.exp <- quote(Treatment) # and this lapply(exps, function(y) co2[f.exp, y, by=g.exp]) })
And we can even nest our dplyr
and data.table
for an unholy abomination:
exp <- quote(data.frame(upt=uptake) %$% summarize_r(new.upt=upt * 1.2)) local({ exps <- list(quote(sum(exp$new.upt)), quote(sum(uptake))) g.exp <- quote(Treatment) lapply(exps, function(y) co2[f.exp, y, by=g.exp]) })
Ersatz dplyr
implementation:
## - Summarize ----------------------------------------------------------------- <<summarize_r>> <<summarize_r_l>> <<fo_dplyr_extra>>
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.