Common Subexpression Elimination

library("rco")
library("microbenchmark")
library("ggplot2")
autoplot.microbenchmark <- function(obj) {
  levels(obj$expr) <- paste0("Expr_", seq_along(levels(obj$expr)))
  microbenchmark:::autoplot.microbenchmark(obj)
}
speed_up <- function(obj) {
  levels(obj$expr) <- paste0("Expr_", seq_along(levels(obj$expr)))
  obj <- as.data.frame(obj)
  summaries <- do.call(rbind, by(obj$time, obj$expr, summary))
  res <- c()
  for (i in seq_len(nrow(summaries) - 1) + 1) {
    res <- rbind(res, summaries[1, ] / summaries[i, ])
  }
  rownames(res) <- levels(obj$expr)[-1]
  return(res)
}

Common Subexpression Elimination

Background

Common Subexpression Elimination is an optimization that searches for instances of identical expressions, and replaces them with a single variable holding the computed value.

For instance, consider the following code:

a <- 1 / (8 + 8 + 1 + 9 * 1 ^ 8)
b <- (8 + 8 + 1 + 9 * 1 ^ 8) * 2

This code computes twice 8 + 8 + 1 + 9 * 1 ^ 8, this could be evaluated once, assigned to a new variable, and used twice. Like, for example:

cs_1 <- (8 + 8 + 1 + 9 * 1 ^ 8)
a <-  1 / cs_1
b <- cs_1 * 2

Example

Consider the following example:

code <- paste(
  "a <- b <- c <- 1",
  "for (i in 1:1000) {",
  "  a <- a ^ i ^ c",
  "  b <- b * i ^ c",
  "  c <- c + i ^ c",
  "}",
  sep = "\n"
)
cat(code)

Then, the automatically optimized code would be:

opt_code <- opt_common_subexpr(list(code))
cat(opt_code$codes[[1]])

And if we measure the execution time of each one, and the speed-up:

bmark_res <- microbenchmark({
  eval(parse(text = code))
}, {
  eval(parse(text = opt_code))
})
autoplot(bmark_res)
speed_up(bmark_res)

Implementation

The opt_common_subexpr will first detect different "environments", i.e., separate between function definitions and parent environment. Then, for each environment it will detect all those subexpressions in common between at least two expressions. If between two occurrences of the same subexpression, a variable involved in the subexpression is reassigned or a function is called (it can change the environment), then for these two occurrences the optimization is not performed. For all those remaining common subexpressions, the first common parent expression will be detected, a new variable called cs_# will be created in the parent expression, and replaced in each call to the subexpression.

To-Do

This can have an issue if the common function call returns random values.

For example:

r a <-rnorm(1) * 8 b <-rnorm(1) * 18

Will be wrongly optimized to:

r cs_1 <- rnorm(1) a <-cs_1 * 8 b <-cs_1 * 18

If the optimizer knows which functions modify their parent env, then function calls won't stop optimization



Try the rco package in your browser

Any scripts or data that you put into this service are public.

rco documentation built on April 17, 2021, 5:06 p.m.