Quasiquotation in the vadr package

The vadr package includes a facility called quasiquotation. This is a technique that helps you build expressions and other data structures by template substitution. If you have found metaprogramming or ("programming on the language") to be difficult in R, qq might help.

Base R has a very limited implementation of quasiquotation in the function bquote. The functions included in the vadr package are much more flexible, and are faster as well.

Quasiquotation is a language feature that descends from Lisp, and any language that claims to descend from the Lisp family ought to have a working implementation. For further examples and some history, see Alan Bawden's essay Quasiquotation in Lisp.

Motivating quasiquotation

Generating sophisticated plot labels

The expression function is used to produce mathematical formatting and symbol characters in plots, as in the x-axis tick labels here:

library(knitr)
opts_chunk$set(cache=TRUE, dev="svg", fig.width=5, fig.height=3, tidy=FALSE)
options(width=60)
par(las = 1, cex = 1, mai=c(1, 0, 0, 0))
curve(sin(x)/x, -3*pi, 3*pi, axes = FALSE, ylab="")
par(cex=0.7)
axis(1, at=seq(-5, 5, 2)*pi/2,
     labels=expression(over(-5, 2)*pi, over(-3,2)*pi,
                       over(-1, 2)*pi, over(3,2)*pi,
                       over(3,2)*pi, over(5,2)*pi),
     padj=0.5)

expression is useful for one-shot annotations but is hard to generalize. It was kind of laborious to type in all those tick labels, and I might have made a typo. If you have a lot of trigonometric functions to plot you'll quickly tire of labeling ticks this way. What if we had a function to generate the expressions?

We might naively try something like the below.

numerator <- seq(-5, 5, 2)
denominator <- 2
symbol <- quote(pi)
at <- numerator * pi / denominator
labels <- mapply(numerator, FUN = function(num) {
  expression(over(num, denominator) * symbol)
})
labels

Unfortunately doesn't work because "expression" doesn't evaluate its operand. We got a bunch of this:

par(mai=c(1,0,0,0))
plot(c(-3*pi,3*pi), c(0,1), type="n", axes=FALSE, xlab="", ylab="")
axis(1, at=at, labels=labels, padj=0.5)

The thing is that expression will quote an expression as we wrote it literally, but what we need is to construct an expression out of bits of data we have. Facing this difficulty, many S users resort to something rebarbative using parse and parse (I'll omit the sapply loop for clarity):

numerator <- 5
parse(text=paste0("over(", numerator , ", ", denominator, ") * ", symbol))

This works for simple cases but there are some obvious problems with this approach. First, because the code is manipulated in strings, it's possible to create syntax errors (on top of just semantic errors.) If you are substituting strings into your code you have to worry about whether the strings themselves might contain quotes and escape them properly. Mainly, it quickly becomes hard to read. In examples much larger than this, parentheses and commas belonging to two logically distinct sets of code interleave themselves together via quotation marks, making it hard to see if either expression is well formed.

A more principled way to build expressions is to use "computing on the language" to build expression objects out of data, like the following.

as.expression(call("*", call("over", numerator, denominator), symbol))

This uses R's ability to represent code as data and vice versa, which is one of the strengths of the language. An expression like a*b is really just a special type of list; quote( a * b ) is really the same thing as as.call( list(quote(*), quote(a), quote(b)) ). (Further, quote(a) is the same thing as as.name("a").)

This avoids a round trip through the parser and the possibilty of forming unparseable strings, but I can't say that this version is any easier to read than the last. Much of the resemblance between the expressions you want and the code you are writing is lost.

You can get a little closer using substitute:

as.expression(substitute(
    over(numerator, denominator) * symbol,
    list(num=numerator, denominator=denominator, symbol=symbol)))

The first argument to substitute does mirror the desired results better but I often find constructing the second argument to be tedious -- here it's more than half of the code and doesn't say much. I also dislike the fact that you can't tell what is getting substituted out while looking at the expression, only by referencing it to the named list. It's not apparent that num is being substituted while over is not, unless you cross-reference the expression against the list. (And if you don't provide the list, substitute may well pick up some unintended over that you have lying around.)

The biggest problem is that substitute is limited to only substituting for symbols, when there are more things in R's syntax that are beyond its reach. For example, substitute doesn't touch function argument lists or argument names. This doesn't doesn't do what it looks like it wants to do:

substitute( function(argname=default) list( output_name=argname ),
            alist(argname=X, default=Y, output_name=Z))

Building expressions with quasiquote

What we'd like is a way to build expressions of the form over(numerator, denominator) * symbol whose structure mirrors the desired result. That's what quasiquotation tries to do.

library(vadr)
as.expression(qq(
  over(.(numerator), .(denominator)) * .(symbol)  ))

Here, qq acts like quote, doing nothing to its argument, except for sections delineated with .(), which are evaluated and their results pasted in. This comes close to the ideal where the structure of the desired output is mirrored by the code that produces it.

The places where code is being substituted in are visually delineated with the .() without having to make reference to a parallel list of expressions.

Quasiquote improves over bquote

The above could be have been done using the base R function bquote, but qq has some extensions that make it much more flexible. Most importantly it implements splicing when you use a marker with two dots, ..().

Splicing

Splicing injects several arguments into an expression marked with ..(), where .() injects just one. You might think of it as the difference between c and list (which is actually what it compiles down to.)

x <- c(1, 2, 3)
qq(list("normal:", .(x), "splicing:", ..(x)))

This is something you can't do with bquote (while substitute only does splicing in a special case requiring you to get your data into a special, user-invisible DOTSXP object that can only live in the variable named ...).

Computed names

The other place where qq and friends improve over bquote is in the treatment of names. In R, unlike most Lisps, every argument to a call has a (possibly blank) name attached. So a quasiquotation overator that fits R will have to do something about names. With qq, any argument name, string or variable name that looks like ".()" will be parsed and have the evaluated results substituted in.

So, if you need to create a function that has a particular argument name:

make_function_of <- function(argname) {
  qe( function( `.(argname)` ) {
    `.(argname)` + 12
  })
}

make_function_of("X")

Note the backticks: since the expression is appearing in a place where R's parser only allows a symbol name, we have to disguise the expression as a symbol name.

qq, qe, qqply and qeply

Note that qq returns a quoted expression while qe goes ahead and evaluates it. This convenience is more necessary in R than in Lisp because R distinguishes between language objects and data objects. So if you wanted to use substitution to make a call to "list", use qq, but use qe to make the list itself.

qq( c(`.(letters[17:26])` = ..(LETTERS[1:10])) )
qe( c(`.(letters[17:26])` = ..(LETTERS[1:10])) )

Quasiquotation is often described is as as a tool for writing code that writes code (or "metaprogramming.") But as the last example shows, it's also useful for building structured data. In a way quasiquotation is like an inverse of destructuring bind (which vadr also provides, in the bind operation.) While destructuring bind extracts data from a structure by writing an code that mirrors the structure, quasiquotation lets you populate data into a structure by writing code that mirrors that structure.

Finally, there are two functions that combine quasiquotation with mapply, called qqply and qeply, which are convenient when you want to build expressions that contain parallel elements. For example, we can build our whole list of plot labels:

numerator <- seq(-5, 5, 2)
as.expression(
    qqply(
        over( .(num), .(denominator) ) * .(symbol)
    )(num=numerator)
)

Advanced Example: mkchain and macros

chain is a tool in the vadr package that builds a function by "chaining" the output of one call into the input of the next. (chain is like the -> macro in Clojure and the |> operator in Elixir.)

For example, say you wanted to know which baseball players have played for four or more teams in a given season, ordered most recently to last . You could do this by a combination of ddply and then subset then arrange:

library(plyr)
stint_counts <- ddply(baseball, c("id", "year"), summarize,
                      stints=length(unique(team)))
many_stints <- subset(stint_counts, stints >= 4)
arrange(many_stints, desc(year))

I often find I'm creating a value just to send it off into another function, so I like having a way to say "ddply, then subset, then arrange," leaving out the name of the variable I would otherwise carrying around, like this:

chain(baseball,
      ddply(c("id", "year"),
            summarize, stints=length(unique(team))),
      subset(stints >= 4),
      arrange(desc(year)))

Implementing chain is a good example of code that writes code. Here we want chain(data, foo, bar(5), baz(6,X)) to have the same effect as defining a function this way:

(function(X) {
  X <- foo(X)
  X <- bar(X, 5)
  baz(X, 6)
})(data)

That is, I want to treat X like a null subject, taken to be implicit if it is not given explicitly. Here's a simplified implementation of mkchain:

chain <- macro(function(data, ...) {
  qq(
      (function(X) {
        ..( qqply( X <- .(chain_step(arg)) )(arg=list(...)) )
        X
      })(.(data))
  )
})

The outer call to qq gives the skeleton of the function, which is an immediately evaluated function expression binding the data to X. The inner call to qqply puts together the assignment statements. It calls out to chain_step which looks at each step and injects an X in if it is needed:

chain_step <- function(expr) {
  if (is.call(expr)) {
    if ("X" %in% all.names(expr)) {
      #"call(Y, X)" means itself
      expr
    } else {
      #"call(Y)" means "call(X,Y)"
      qq( .(expr[[1]])(X, ...(as.list(expr[-1]))) )
    }
  } else {
    #`symbol` means "symbol(X)"
    qq( .(expr)(X) )
  }
}

When called, chain constructs a whole new function:

expand_macros_q( chain(data, foo, bar(5), baz(6, X), sum) )

Note that the definition of chain is wrapped in macro. macro does the work of substituting to get the expression form of its arguments, and evaluating the resulting expression in the caller; this work is common to many nonstandard evaluation functions. What's more, macro remembers (or memoizes) each expression it transforms, so even a complicated lexical analysis can be cached and evaluated quickly the second time around. This does mean that macro functions cannot evaluate their arguments or look in their dynamic environment to decide what to do (but this is not as big an obstacle as you may think; you can always return an immediately-invoked function expression to complete what needs to happen at runtime.)

qq is fast

Despite having much added functionality over bquote, qq is substantially faster over repeated calls. This is because qq itself operates as a macro. The first time you call qq, it performs the lexical analysis of finding all the .() calls, parsing them and determining how to splice them in, but instead of directly substituting, qq compiles a new function just for that particular substitution. When code comes to the same call to bquote again, that saved function is retrieved and executed quickly.

If you are curious to see what the compiled function looks like, vadr:::qq_internal is the hook into qq's compiler.

expr <- vadr:::qq_internal(
    quote( expression(over(.(num), .(denominator)) * .(symbol) ) ) )
print(expr)
num <- 5
print(eval(expr))

(Note that the compiled code uses literal values rather than calls to "list" etc. whenever it can. So it may not produce code that looks like it works. Here what prints out as list(over) is actually a literal list containing the symbol over, rather than a call to list with the argument over; the former is not evaluated while the latter is. Unfortunately R's pretty-printer does not distinguish these cases.)

Because it can remember and skip over the lexical analysis it did the first time, qq is often faster than bquote.

library(microbenchmark)
x <- 3;
y <- quote({x;y;z})
microbenchmark(
    times=250,
    bquote = bquote(function(a, b=.(x)) {foo; bar; .(y)}),
    qq = qq(function(a, b=.(x)) {foo; bar; .(y)}))

The compilation of the unquoter does take a bit of time on the first run-through (much of which is taken up by regexp matching looking for ".()" names), but the compiled form is faster than bquote. (Further optimizations of the compiled form are possible, too.)



crowding/vadr documentation built on May 14, 2019, 11:33 a.m.