knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

devtools::load_all()
mustashe::clear_stash()

set.seed(123)

mustashe

License: GPL v3

The goal of 'mustashe' is to save time on long-running computations by storing and reloading the resulting object after the first run. The next time the computation is run, instead of evaluating the code, the stashed object is loaded. 'mustashe' is great for storing intermediate objects in an analysis.

Installation

You can install the released version of 'mustashe' from CRAN with:

install.packages("mustashe")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("jhrcook/mustashe")

Loading 'mustashe'

The 'mustashe' package is loaded like any other, using the library() function.

library(mustashe)

Basic example

Below is a simple example of how to use the stash() function from 'mustashe'.

Let's say, for part of an analysis, we are running a long simulation to generate random data rnd_vals. This is mocked below using the Sys.sleep() function. We can time this process using the 'tictoc' package.

tictoc::tic("random simulation")
stash("rnd_vals", {
  Sys.sleep(3)
  rnd_vals <- rnorm(1e5)
})
tictoc::toc()

Now, if we come back tomorrow and continue working on the same analysis, the second time this process is run the code is not evaluated because the code passed to stash() has not changed. Instead, the stashed random values rnd_vals are loaded.

tictoc::tic("random simulation")
stash("rnd_vals", {
  Sys.sleep(3)
  rnd_vals <- rnorm(1e5)
})
tictoc::toc()
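To make the idea concrete, here is a minimal base R sketch of the same pattern: store a value alongside the text of the code that produced it, and reuse the value while that text is unchanged. The helper simple_stash() and its cache_dir argument are hypothetical names made up for this illustration; this is not how 'mustashe' is implemented, only an assumption-laden analogue.

```r
# Hypothetical helper, NOT part of 'mustashe': re-run `code` only when its
# text differs from the text stored with the cached value.
simple_stash <- function(var, code, cache_dir = tempdir()) {
  code_expr <- substitute(code)
  code_text <- paste(deparse(code_expr), collapse = "\n")
  cache_file <- file.path(cache_dir, paste0(var, ".rds"))

  if (file.exists(cache_file)) {
    cached <- readRDS(cache_file)
    if (identical(cached$code_text, code_text)) {
      return(cached$value) # Code unchanged: reuse the stored value.
    }
  }

  # Code is new or has changed: evaluate it and store the result.
  value <- eval(code_expr, envir = parent.frame())
  saveRDS(list(code_text = code_text, value = value), cache_file)
  value
}

# Use a fresh directory so earlier runs cannot interfere.
cache_dir <- tempfile("simple-stash-")
dir.create(cache_dir)

first <- simple_stash("rnd_vals", rnorm(5), cache_dir = cache_dir)
second <- simple_stash("rnd_vals", rnorm(5), cache_dir = cache_dir)
identical(first, second)
```

Because the second call finds the same code text on disk, it returns the stored draw instead of sampling again.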

Dependencies

A common problem with storing intermediates is that they have dependencies that can change. If a dependency changes, then we want the stashed value to be updated. This is accomplished by passing the names of the dependencies to the depends_on argument.

For instance, let's say we are calculating some value foo using x. (For the following example, I will use a print statement to indicate when the code is evaluated.)

x <- 100

stash("foo", depends_on = "x", {
  print("Calculating `foo` using `x`.")
  foo <- x + 1
})

foo

Now if x is not changed, then the code for foo does not get re-evaluated.

x <- 100

stash("foo", depends_on = "x", {
  print("Calculating `foo` using `x`.")
  foo <- x + 1
})

foo

But if x does change, then foo gets re-evaluated.

x <- 200

stash("foo", depends_on = "x", {
  print("Calculating `foo` using `x`.")
  foo <- x + 1
})

foo
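The foo example uses a single dependency, but the depends_on argument takes a character vector of names, so more than one object can be listed. The sketch below assumes that a change to any listed object triggers re-evaluation, in line with the behaviour described above; the objects y and total are made up for this illustration, and x is redefined here.

```r
library(mustashe)

x <- 1
y <- 2

# `total` is stashed with both `x` and `y` as dependencies.
stash("total", depends_on = c("x", "y"), {
  total <- x + y
})
total

# Changing only `y` should be enough for the code to be re-evaluated.
y <- 10

stash("total", depends_on = c("x", "y"), {
  total <- x + y
})
total
```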

Other API features

Functional interface

In the examples above, stash() does not return a value (actually, it invisibly returns NULL), instead assigning the result of the computation to an object named using the var argument. Frequently, though, a return value is desired. This behavior can be induced by setting the argument functional = TRUE.

b <- stash("b", functional = FALSE, {
  rnorm(5, 0, 1)
})
b
b <- stash("b", functional = TRUE, {
  rnorm(5, 0, 1)
})
b

Functions as dependencies

The stash() function can take other functions as dependencies. The body and formals components of the function object are checked to see if they have changed. (More information on the structure of function objects in R can be found in Hadley Wickham's Advanced R - Functions: Function components.)
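For readers unfamiliar with these components, the base R sketch below inspects them directly. The two function names are hypothetical and exist only for this comparison; the snippet shows what body() and formals() return, not how stash() compares them internally.

```r
# Two versions of the same function that differ only in a default value.
add_x_v1 <- function(y, x = 2) {
  y + x
}
add_x_v2 <- function(y, x = 5) {
  y + x
}

# The default values live in the formals.
formals(add_x_v1)$x
formals(add_x_v2)$x

# The bodies have the same text...
identical(deparse(body(add_x_v1)), deparse(body(add_x_v2)))

# ...but the formals differ, so the two functions are not equivalent.
identical(formals(add_x_v1), formals(add_x_v2))
```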

As an example, suppose you have a script with the following code. When it is run, the value 5 is stashed for a, with the function add_x() as a dependency.

add_x <- function(y, x = 2) {
  y + x
}

stash("a", depends_on = "add_x", {
  a <- add_x(3)
})
a

You continue working and change the function add_x() to use the default value of 5 instead of 2. This change will cause the code for a to be re-run, and a will be assigned the value 8. Note that the code in the code argument for stash() did not change; the code was re-run because a dependency changed.

add_x <- function(y, x = 5) {
  y + x
}

stash("a", depends_on = "add_x", {
  a <- add_x(3)
})
a

Using stash() in functions

Because of the careful management of R environments, stash() can be used inside of functions. In the example below, note that the stashed object will depend on the value of the magic_number object in the function.

magic_number <- 10
do_data_science <- function() {
  magic_number <- 5
  stash("rand_num", depends_on = c("magic_number"), {
    runif(1, 0, 10)
  })
  return(rand_num)
}

do_data_science()

Changing the value of the magic_number object in the global environment will not invalidate the stash.

magic_number <- 11
do_data_science()

Stashing results of sourcing an R script

It is also possible to stash the results of sourcing an R script. The contents of the script are an implicit dependency for the stash, so if the script changes, it will be re-sourced the next time around. It is also possible to include additional dependencies using the depends_on parameter in the same way as with a regular stash.

The natural behavior of the source() function is maintained by returning the last evaluated value in the script.

# Write a temporary R script.
temp_script <- tempfile()
write("print('Script to get 5 letters'); sample(letters, 5)", temp_script)

x <- stash_script(temp_script)
x
x2 <- stash_script(temp_script)
x2
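The source() behavior referred to above can be seen in base R alone: source() invisibly returns a list whose value element holds the last evaluated value of the script. The short sketch below uses a throwaway script and made-up object names, and does not involve 'mustashe' at all.

```r
# Write a small temporary script whose last expression is a sum.
script_path <- tempfile(fileext = ".R")
writeLines(c("vals <- c(1, 2, 3)", "sum(vals)"), script_path)

# Source it in a fresh environment so `vals` does not leak into the workspace.
res <- source(script_path, local = new.env())

# The last evaluated value of the script.
res$value
```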

Configuration

Using 'here' to create file paths

The 'here' package is useful for handling file paths in R projects, particularly when using an RStudio project. The main function, here::here(), can be used to create the file path for stashing an object by setting the 'mustashe' configuration option with the config_mustashe() function.

config_mustashe(use_here = TRUE)

This behavior can be turned off, too.

config_mustashe(use_here = FALSE)

Other options

Defaults for the verbose and functional (see above) arguments of stashing functions can also be configured. For example, you can have the functions run silently and return the result by default.

config_mustashe(verbose = FALSE, functional = TRUE)

Acknowledgements

Contributors

I would like to thank the contributors to this package for their additions of key features and bug squashing:

Attribution

The inspiration for this package came from the cache() feature in the 'ProjectTemplate' package. While the functionality and implementation are a bit different, this would have been far more difficult to do without referencing the source code from 'ProjectTemplate'.


Contact

Any issues and feedback on 'mustashe' can be submitted here. Alternatively, I can be reached through the contact form on my website or on Twitter @JoshDoesa.

unlink(".mustashe", recursive = TRUE)


jhrcook/mustasher documentation built on Oct. 10, 2022, 5:37 a.m.