vadr package
The vadr package includes a facility called quasiquotation. This is a technique that helps you build expressions and other data structures by template substitution. If you have found metaprogramming (or "programming on the language") to be difficult in R, qq might help.
Base R has a very limited implementation of quasiquotation in the function bquote. The functions included in the vadr package are much more flexible, and are faster as well.
Quasiquotation is a language feature that descends from Lisp, and any language that claims to descend from the Lisp family ought to have a working implementation. For further examples and some history, see Alan Bawden's essay Quasiquotation in Lisp.
The expression function is used to produce mathematical formatting and symbol characters in plots, as in the x-axis tick labels here:
library(knitr)
opts_chunk$set(cache=TRUE, dev="svg", fig.width=5, fig.height=3, tidy=FALSE)
options(width=60)

par(las = 1, cex = 1, mai=c(1, 0, 0, 0))
curve(sin(x)/x, -3*pi, 3*pi, axes = FALSE, ylab="")
par(cex=0.7)
axis(1, at=seq(-5, 5, 2)*pi/2,
     labels=expression(over(-5, 2)*pi, over(-3,2)*pi, over(-1, 2)*pi,
                       over(3,2)*pi, over(3,2)*pi, over(5,2)*pi),
     padj=0.5)
expression is useful for one-shot annotations but is hard to generalize. It was kind of laborious to type in all those tick labels, and I might have made a typo. If you have a lot of trigonometric functions to plot you'll quickly tire of labeling ticks this way. What if we had a function to generate the expressions?
We might naively try something like the below.
numerator <- seq(-5, 5, 2)
denominator <- 2
symbol <- quote(pi)
at <- numerator * pi / denominator
labels <- mapply(numerator, FUN = function(num) {
  expression(over(num, denominator) * symbol)
})
labels
Unfortunately this doesn't work, because expression doesn't evaluate its operand. Every label comes out as the same literal, unevaluated template, so we get a bunch of this:
par(mai=c(1,0,0,0))
plot(c(-3*pi,3*pi), c(0,1), type="n", axes=FALSE, xlab="", ylab="")
axis(1, at=at, labels=labels, padj=0.5)
The thing is that expression will quote an expression as we wrote it literally, but what we need is to construct an expression out of bits of data we have. Facing this difficulty, many S users resort to something rebarbative using paste and parse (I'll omit the sapply loop for clarity):
numerator <- 5
parse(text=paste0("over(", numerator, ", ", denominator, ") * ", symbol))
This works for simple cases but there are some obvious problems with this approach. First, because the code is manipulated in strings, it's possible to create syntax errors (on top of just semantic errors.) If you are substituting strings into your code you have to worry about whether the strings themselves might contain quotes and escape them properly. Mainly, it quickly becomes hard to read. In examples much larger than this, parentheses and commas belonging to two logically distinct sets of code interleave themselves together via quotation marks, making it hard to see if either expression is well formed.
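To make the quoting hazard concrete, here is a small base R sketch. The label string and the use of paste as the target function are made up for illustration; the point is only that a string containing quotes breaks the pasted source text, while the same data passed to call is harmless.

```r
# A hypothetical label that happens to contain double quotes.
label <- 'the "best" fit'

# Pasting it into source text yields code that no longer parses.
code <- paste0('paste("', label, '")')
pasted <- tryCatch(parse(text = code), error = function(e) e)
inherits(pasted, "error")

# Building the call from data never goes through the parser,
# so the embedded quotes need no escaping.
built <- call("paste", label)
eval(built)
```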
A more principled way to build expressions is to use "computing on the language" to build expression objects out of data, like the following.
as.expression(call("*", call("over", numerator, denominator), symbol))
This uses R's ability to represent code as data and vice versa, which is one of the strengths of the language. An expression like a*b is really just a special type of list; quote( a * b ) is really the same thing as as.call( list(quote(`*`), quote(a), quote(b)) ). (Further, quote(a) is the same thing as as.name("a").)
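The equivalence can be checked directly in base R. This short sketch takes a quoted call apart, rebuilds it from its pieces, and compares the two:

```r
e <- quote(a * b)

# A call is list-like: the function name first, then the arguments.
parts <- as.list(e)
length(parts)

# Rebuilding the call from symbols gives an identical object.
built <- as.call(list(as.name("*"), as.name("a"), as.name("b")))
identical(e, built)

# quote() on a bare name is just another way to make a symbol.
identical(quote(a), as.name("a"))
```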
This avoids a round trip through the parser and the possibility of forming unparseable strings, but I can't say that this version is any easier to read than the last. Much of the resemblance between the expressions you want and the code you are writing is lost.
You can get a little closer using substitute:
as.expression(substitute(
  over(num, denominator) * symbol,
  list(num=numerator, denominator=denominator, symbol=symbol)))
The first argument to substitute does mirror the desired results better, but I often find constructing the second argument to be tedious; here it's more than half of the code and doesn't say much. I also dislike the fact that you can't tell what is getting substituted out while looking at the expression, only by referencing it to the named list. It's not apparent that num is being substituted while over is not, unless you cross-reference the expression against the list. (And if you don't provide the list, substitute may well pick up some unintended over that you have lying around.)
The biggest problem is that substitute only substitutes for symbols, and there are parts of R's syntax that are beyond its reach. For example, substitute doesn't touch function argument lists or argument names. This doesn't do what it looks like it wants to do:
substitute(
  function(argname=default) list( output_name=argname ),
  alist(argname=X, default=Y, output_name=Z))
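To see exactly which parts were left alone, the result can be taken apart in base R. This is an inspection sketch only; it relies on the fact that a function expression is a call whose second element is the formals list and whose third element is the body.

```r
res <- substitute(
  function(argname = default) list(output_name = argname),
  alist(argname = X, default = Y, output_name = Z))

# The formals are untouched: the argument is still called "argname"
# and its default is still the symbol `default`.
formal_names <- names(res[[2]])
formal_default <- res[[2]][["argname"]]

# In the body only the bare symbol was replaced; the argument name
# output_name was not.
body_expr <- res[[3]]
body_expr
```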
What we'd like is a way to build expressions of the form over(numerator, denominator) * symbol whose structure mirrors the desired result. That's what quasiquotation tries to do.
library(vadr)
as.expression(qq( over(.(numerator), .(denominator)) * .(symbol) ))
Here, qq acts like quote, doing nothing to its argument, except for sections delineated with .(), which are evaluated and their results pasted in. This comes close to the ideal where the structure of the desired output is mirrored by the code that produces it. The places where code is being substituted in are visually delineated with the .() without having to make reference to a parallel list of expressions.
bquote
The above could have been done using the base R function bquote, but qq has some extensions that make it much more flexible. Most importantly, it implements splicing when you use a marker with two dots, ..().
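For the single-value case, the base R version with bquote looks almost the same. This sketch assumes the same numerator, denominator and symbol values used earlier:

```r
numerator <- 5
denominator <- 2
symbol <- quote(pi)

# bquote() quotes its argument and evaluates only the .() sections.
label <- as.expression(bquote(over(.(numerator), .(denominator)) * .(symbol)))
label
```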
Splicing injects several arguments into an expression marked with ..(), where .() injects just one. You might think of it as the difference between c and list (which is actually what it compiles down to.)
x <- c(1, 2, 3)
qq(list("normal:", .(x), "splicing:", ..(x)))
This is something you can't do with bquote (while substitute only does splicing in a special case requiring you to get your data into a special, user-invisible DOTSXP object that can only live in the variable named ...).
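The c-versus-list analogy can be spelled out by hand in base R. This is a sketch of the idea only, not of what qq actually generates: wrapping a value with list keeps it as one argument, while concatenating with c spreads its elements out as separate arguments.

```r
x <- c(1, 2, 3)

# Like .(x): the whole vector becomes a single argument.
one_arg <- as.call(list(as.name("list"), "normal:", x))
length(one_arg)

# Like ..(x): each element becomes its own argument.
spliced <- as.call(c(list(as.name("list"), "splicing:"), as.list(x)))
length(spliced)

eval(one_arg)
eval(spliced)
```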
The other place where qq and friends improve over bquote is in the treatment of names. In R, unlike most Lisps, every argument to a call has a (possibly blank) name attached. So a quasiquotation operator that fits R will have to do something about names. With qq, any argument name, string or variable name that looks like ".()" will be parsed and have the evaluated results substituted in.
So, if you need to create a function that has a particular argument name:
make_function_of <- function(argname) {
  qe( function( `.(argname)` ) { `.(argname)` + 12 })
}
make_function_of("X")
Note the backticks: since the expression is appearing in a place where R's parser only allows a symbol name, we have to disguise the expression as a symbol name.
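The disguise can be examined in base R. The sketch below shows that a backticked `.(argname)` really is a single symbol, and how a quasiquoter could recover the hidden expression by parsing the symbol's name. This is only an illustration of the idea under that assumption, not vadr's actual code.

```r
argname <- "X"

# To the parser this is one symbol whose name happens to be ".(argname)".
disguised <- quote(`.(argname)`)
is.name(disguised)

# Parsing the name as text recovers the call .(argname) hidden inside it.
inner <- parse(text = as.character(disguised))[[1]]
is.call(inner)

# Evaluating the argument of .() gives the name to substitute in.
real_name <- as.name(eval(inner[[2]]))
real_name
```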
qq, qe, qqply and qeply
Note that qq returns a quoted expression while qe goes ahead and evaluates it. This convenience is more necessary in R than in Lisp because R distinguishes between language objects and data objects. So if you wanted to use substitution to make a call to "list", use qq, but use qe to make the list itself.
qq( c(`.(letters[17:26])` = ..(LETTERS[1:10])) )
qe( c(`.(letters[17:26])` = ..(LETTERS[1:10])) )
Quasiquotation is often described as a tool for writing code that writes code (or "metaprogramming.") But as the last example shows, it's also useful for building structured data. In a way quasiquotation is like an inverse of destructuring bind (which vadr also provides, in the bind operation.) While destructuring bind extracts data from a structure by writing code that mirrors the structure, quasiquotation lets you populate data into a structure by writing code that mirrors that structure.
Finally, there are two functions that combine quasiquotation with mapply, called qqply and qeply, which are convenient when you want to build expressions that contain parallel elements. For example, we can build our whole list of plot labels:
numerator <- seq(-5, 5, 2)
as.expression(
  qqply( over( .(num), .(denominator) ) * .(symbol) )(num=numerator)
)
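For comparison, roughly the same labels can be produced in base R by looping bquote over the numerators with lapply. This is a sketch of an equivalent, not how qqply is implemented:

```r
numerator <- seq(-5, 5, 2)
denominator <- 2
symbol <- quote(pi)

# One bquote() call per numerator; the list of calls is then
# converted to an expression vector suitable for axis labels.
labels <- as.expression(lapply(numerator, function(num) {
  bquote(over(.(num), .(denominator)) * .(symbol))
}))

length(labels)
labels[[6]]
```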
mkchain and macros
chain is a tool in the vadr package that builds a function by "chaining" the output of one call into the input of the next. (chain is like the -> macro in Clojure and the |> operator in Elixir.)
For example, say you wanted to know which baseball players have played for four or more teams in a given season, ordered from most recent to earliest. You could do this by a combination of ddply, then subset, then arrange:
library(plyr)
stint_counts <- ddply(baseball, c("id", "year"), summarize, stints=length(unique(team)))
many_stints <- subset(stint_counts, stints >= 4)
arrange(many_stints, desc(year))
I often find I'm creating a value just to send it off into another function, so I like having a way to say "ddply, then subset, then arrange," leaving out the name of the variable I would otherwise be carrying around, like this:
chain(baseball, ddply(c("id", "year"), summarize, stints=length(unique(team))), subset(stints >= 4), arrange(desc(year)))
Implementing chain is a good example of code that writes code. Here we want chain(data, foo, bar(5), baz(6,X)) to have the same effect as defining a function this way:
(function(X) {
  X <- foo(X)
  X <- bar(X, 5)
  baz(6, X)
})(data)
That is, I want to treat X like a null subject, taken to be implicit if it is not given explicitly. Here's a simplified implementation of chain:
chain <- macro(function(data, ...) {
  qq(
    (function(X) {
      ..( qqply( X <- .(chain_step(arg)) )(arg=list(...)) )
      X
    })(.(data))
  )
})
The outer call to qq gives the skeleton of the function, which is an immediately evaluated function expression binding the data to X. The inner call to qqply puts together the assignment statements. It calls out to chain_step which looks at each step and injects an X in if it is needed:
chain_step <- function(expr) {
  if (is.call(expr)) {
    if ("X" %in% all.names(expr)) {
      # "call(Y, X)" means itself
      expr
    } else {
      # "call(Y)" means "call(X, Y)"
      qq( .(expr[[1]])(X, ..(as.list(expr[-1]))) )
    }
  } else {
    # `symbol` means "symbol(X)"
    qq( .(expr)(X) )
  }
}
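For readers without vadr installed, the same three rules can be sketched in base R with as.call. The name chain_step_base is hypothetical and exists only for this illustration; it is a hand-written analogue, not part of the package.

```r
# Hypothetical base R analogue of chain_step (not part of vadr).
chain_step_base <- function(expr) {
  if (is.call(expr)) {
    if ("X" %in% all.names(expr)) {
      # X is mentioned explicitly: leave the step alone.
      expr
    } else {
      # Otherwise insert X as the first argument, keeping the rest.
      as.call(c(list(expr[[1]], quote(X)), as.list(expr)[-1]))
    }
  } else {
    # A bare symbol is treated as a function to call on X.
    as.call(list(expr, quote(X)))
  }
}

steps <- lapply(list(quote(foo), quote(bar(5)), quote(baz(6, X))),
                chain_step_base)
steps
```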
When called, chain constructs a whole new function:
expand_macros_q( chain(data, foo, bar(5), baz(6, X), sum) )
Note that the definition of chain is wrapped in macro. macro does the work of substituting to get the expression form of its arguments, and evaluating the resulting expression in the caller; this work is common to many nonstandard evaluation functions. What's more, macro remembers (or memoizes) each expression it transforms, so even a complicated lexical analysis can be cached and evaluated quickly the second time around. This does mean that macro functions cannot evaluate their arguments or look in their dynamic environment to decide what to do (but this is not as big an obstacle as you may think; you can always return an immediately-invoked function expression to complete what needs to happen at runtime.)
qq is fast
Despite having much added functionality over bquote, qq is substantially faster over repeated calls. This is because qq itself operates as a macro. The first time you call qq, it performs the lexical analysis of finding all the .() calls, parsing them and determining how to splice them in, but instead of directly substituting, qq compiles a new function just for that particular substitution. When code comes to the same call to qq again, that saved function is retrieved and executed quickly.
If you are curious to see what the compiled function looks like, vadr:::qq_internal is the hook into qq's compiler.
expr <- vadr:::qq_internal( quote( expression(over(.(num), .(denominator)) * .(symbol) ) ) )
print(expr)
num <- 5
print(eval(expr))
(Note that the compiled code uses literal values rather than calls to "list" etc. whenever it can. So it may not produce code that looks like it works. Here what prints out as list(over) is actually a literal list containing the symbol over, rather than a call to list with the argument over; the former is not evaluated while the latter is. Unfortunately R's pretty-printer does not distinguish these cases.)
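The idea of compiling a template can be illustrated with a toy version in base R. The function compile_template below is hypothetical and far simpler than vadr's compiler: it handles only .() (no splicing, no names hidden in backticks) and emits explicit quote and list calls rather than literal values. It walks the template once and returns code that, when evaluated, builds the substituted expression.

```r
# Hypothetical toy compiler, for illustration only (not vadr's).
compile_template <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("."))) {
    # .(x): emit x itself, so it is evaluated when the builder runs.
    return(e[[2]])
  }
  if (is.call(e)) {
    # Any other call: emit as.call(list(...)) over the compiled parts.
    parts <- lapply(as.list(e), compile_template)
    return(call("as.call", as.call(c(list(as.name("list")), parts))))
  }
  # Symbols and constants are emitted quoted, so they stay as they are.
  call("quote", e)
}

# The lexical analysis happens once, here.
builder <- compile_template(quote(over(.(num), .(denominator)) * .(symbol)))
builder

# Running the compiled code is then just ordinary evaluation.
num <- 5
denominator <- 2
symbol <- quote(pi)
result <- eval(builder)
result
```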
Because it can remember and skip over the lexical analysis it did the first time, qq is often faster than bquote.
library(microbenchmark)
x <- 3; y <- quote({x;y;z})
microbenchmark(
  times=250,
  bquote = bquote(function(a, b=.(x)) {foo; bar; .(y)}),
  qq = qq(function(a, b=.(x)) {foo; bar; .(y)}))
The compilation of the unquoter does take a bit of time on the first run-through (much of which is taken up by regexp matching looking for ".()" names), but the compiled form is faster than bquote. (Further optimizations of the compiled form are possible, too.)