bake: Tools for reproducible computations

reproducibility_tools    R Documentation

Tools for reproducible computations

Description

Archiving of computations and control of the random-number generator.

Usage

bake(
  file,
  expr,
  seed = NULL,
  kind = NULL,
  normal.kind = NULL,
  dependson = NULL,
  info = FALSE,
  timing = TRUE,
  dir = getOption("pomp_archive_dir", getwd())
)

stew(
  file,
  expr,
  seed = NULL,
  kind = NULL,
  normal.kind = NULL,
  dependson = NULL,
  info = FALSE,
  timing = TRUE,
  dir = getOption("pomp_archive_dir", getwd())
)

freeze(
  expr,
  seed = NULL,
  kind = NULL,
  normal.kind = NULL,
  envir = parent.frame(),
  enclos = if (is.list(envir) || is.pairlist(envir)) parent.frame() else baseenv()
)

Arguments

file

Name of the archive file in which the result will be stored or retrieved, as appropriate. For bake, this will contain a single object and hence be an RDS file (extension ‘rds’); for stew, this will contain one or more named objects and hence be an RDA file (extension ‘rda’).

expr

Expression to be evaluated.

seed, kind, normal.kind

optional. These set the state and kind of the RNG. The default, seed = NULL, leaves the RNG state unchanged. seed should be a single integer. See set.seed for more information.

dependson

arbitrary R object (optional). Variables on which the computation in expr depends. A hash of these objects will be archived in file, along with the results of evaluating expr. When bake or stew is called and file exists, the hash of these objects will be compared against the archived hash; recomputation is forced when these do not match. The dependencies should be specified as unquoted symbols; use a list if there are multiple dependencies. See the note below about avoiding using ‘pomp’ objects as dependencies.

info

logical. If TRUE, the “ingredients” of the calculation are returned as a list. In the case of bake, this list is the “ingredients” attribute of the returned object. In the case of stew, this list is a hidden object named “.ingredients”, located in the environment within which stew was called.

timing

logical. If TRUE, the time required for the computation is returned. This is returned as the “system.time” attribute of the returned object.

dir

Directory holding archive files; by default, this is the current working directory. This can also be set using the global option pomp_archive_dir. If it does not exist, this directory will be created (with a message).

envir

the environment in which expr is to be evaluated. May also be NULL, a list, a data frame, a pairlist or an integer as specified to sys.call.

enclos

relevant when envir is a (pair)list or a data frame. Specifies the enclosure, i.e., where R looks for objects not found in envir. This can be NULL (interpreted as the base package environment, baseenv()) or an environment.
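
The descriptions of envir and enclos match those of base R's eval, so the lookup rule can be illustrated with eval alone. The following is a minimal sketch that does not call freeze; the function name demo_lookup is hypothetical.

```r
# Illustration of envir/enclos lookup using base R's eval (no pomp required).
demo_lookup <- function() {
  b <- 10
  # 'a' is found in the list supplied as envir;
  # 'b' is not, so R looks for it in the enclosure, here this function's frame.
  eval(quote(a + b), envir = list(a = 1), enclos = environment())
}

demo_lookup()   # 11
```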

Details

On cooking shows, recipes requiring lengthy baking or stewing are prepared beforehand. The bake and stew functions perform analogously: a computation is performed and archived in a named file. If the function is called again and the file is present, the computation is not executed. Instead, the results are loaded from the archive. Moreover, via their optional seed argument, bake and stew can control the pseudorandom-number generator (RNG) for greater reproducibility. After the computation is finished, these functions restore the pre-existing RNG state to avoid side effects.

The freeze function doesn't save results, but does set the RNG state to the specified value and restore it after the computation is complete.
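
The set-then-restore behavior described above can be sketched in base R. This is an illustration of the idea under simple assumptions, not the code of freeze; the helper name with_seed is hypothetical.

```r
# Hypothetical helper illustrating "set the RNG state, evaluate, restore".
with_seed <- function(expr, seed) {
  expr <- substitute(expr)
  pf <- parent.frame()
  g <- globalenv()
  had_seed <- exists(".Random.seed", envir = g, inherits = FALSE)
  if (had_seed) old <- get(".Random.seed", envir = g)
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old, envir = g)   # put the earlier state back
    } else if (exists(".Random.seed", envir = g, inherits = FALSE)) {
      rm(list = ".Random.seed", envir = g)     # there was no state to restore
    }
  })
  set.seed(seed)
  eval(expr, envir = pf)
}

set.seed(11)
a  <- runif(2)
x1 <- with_seed(runif(3), seed = 5886730)
b  <- runif(2)
x2 <- with_seed(runif(3), seed = 5886730)

set.seed(11)
ab <- runif(4)

identical(x1, x2)        # TRUE: same seed, same draws
identical(c(a, b), ab)   # TRUE: the outer RNG stream was not disturbed
```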

Both bake and stew first test to see whether file exists. If it does, bake reads it using readRDS and returns the resulting object. By contrast, stew loads the file using load and copies the objects it contains into the user's workspace (or the environment of the call to stew).

If file does not exist, then both bake and stew evaluate the expression expr; they differ in the results that they save. bake saves the value of the evaluated expression to file as a single object. The name of that object is not saved. By contrast, stew creates a local environment within which expr is evaluated; all objects in that environment are saved (by name) in file. bake and stew also store information about the code executed, the dependencies, and the state of the random-number generator (if the latter is controlled) in the archive file. Re-computation is triggered if any of these things change.
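
The difference between the two archive styles can be sketched with base R alone. The functions bake_like and stew_like below are hypothetical, simplified analogues: they leave out the checks on code, dependencies, and RNG state described above, and they are not the package's implementation.

```r
# Hypothetical, simplified analogues of bake and stew (no pomp required).
bake_like <- function(file, expr) {
  if (file.exists(file)) return(readRDS(file))   # reuse the archived value
  pf <- parent.frame()
  val <- eval(substitute(expr), envir = new.env(parent = pf))
  saveRDS(val, file = file)                      # a single, unnamed object
  val
}

stew_like <- function(file, expr) {
  pf <- parent.frame()
  if (!file.exists(file)) {
    e <- new.env(parent = pf)
    eval(substitute(expr), envir = e)
    save(list = ls(e), file = file, envir = e)   # every object, by name
  }
  invisible(load(file, envir = pf))              # copy objects into the caller
}

f1 <- tempfile(fileext = ".rds")
f2 <- tempfile(fileext = ".rda")

v1 <- bake_like(f1, { x <- runif(1000); mean(x) })
v2 <- bake_like(f1, { x <- runif(1000); mean(x) })  # read from f1, not recomputed
identical(v1, v2)   # TRUE

nm <- stew_like(f2, { u <- 1:3; w <- u^2 })
nm                  # names of the objects created by the expression
w                   # 1 4 9, now present in the calling environment
```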

Value

bake returns the value of the evaluated expression expr. Other objects created in the evaluation of expr are discarded along with the temporary, local environment created for the evaluation.

The latter behavior differs from that of stew, which returns the names of the objects created during the evaluation of expr. After stew completes, these objects are copied into the environment in which stew was called.

freeze returns the value of the evaluated expression expr. However, freeze evaluates expr within the parent environment, so other objects created in the evaluation of expr will still exist after freeze completes.

bake and stew store information about the code executed, the dependencies, and the state of the random-number generator in the archive file. In the case of bake, this is recorded in the “ingredients” attribute (attr(.,"ingredients")); in the stew case, this is recorded in an object, “.ingredients”, in the archive. This information is returned only if info=TRUE.

The time required for execution is also recorded. bake stores this in the “system.time” attribute of the archived R object; stew does so in a hidden variable named .system.time. The timing is obtained using system.time.
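
The two storage conventions (an attribute on the returned value versus a hidden variable) can be illustrated in base R. This sketch only mimics the conventions named above; it does not read a real archive file.

```r
# Base-R illustration of the two conventions (no pomp required).

# bake-style: the timing travels as an attribute of the value.
tm <- system.time(val <- mean(runif(1000)))
attr(val, "system.time") <- tm
class(attr(val, "system.time"))   # "proc_time"

# stew-style: the timing is a variable whose name starts with a dot,
# so a plain ls() does not list it.
e <- new.env()
assign("x", 1, envir = e)
assign(".system.time", tm, envir = e)
ls(e)                      # "x"
ls(e, all.names = TRUE)    # also lists ".system.time"
```

After a call to stew, the same all.names = TRUE argument is what makes hidden objects such as .ingredients and .system.time visible in a listing of the calling environment.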

Avoid using ‘pomp’ objects as dependencies

Note that when a ‘pomp’ object is built with one or more C snippets, the resulting code is “salted” with a random element to prevent collisions in parallel computations. As a result, two such ‘pomp’ objects will never match perfectly, even if the codes and data used to construct them are identical. Therefore, avoid using ‘pomp’ objects as dependencies in bake and stew.
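
A small base-R sketch shows why a salted object defeats dependency matching. Everything here is a hypothetical stand-in: make_salted mimics an object carrying a random element, and fingerprint uses serialized bytes in place of the hash that is archived. Depending on the underlying inputs rather than on the built object is one possible workaround, shown at the end.

```r
# Hypothetical stand-ins; not pomp objects and not pomp's hashing.
fingerprint <- function(deps) serialize(deps, connection = NULL)

# An object "salted" with a random element when it is built.
make_salted <- function(code) {
  salt <- paste(sample(letters, 16, replace = TRUE), collapse = "")
  list(code = code, salt = salt)
}

p1 <- make_salted("x <- 1;")
p2 <- make_salted("x <- 1;")   # same inputs, different salt

# The fingerprints differ (barring a vanishingly unlikely salt collision),
# so a dependency check on these objects would always force recomputation.
identical(fingerprint(p1), fingerprint(p2))

# The inputs themselves give a stable fingerprint.
identical(fingerprint(p1$code), fingerprint(p2$code))   # TRUE
```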

Compatibility with older versions

With pomp version 3.4.4.2, the behavior of bake and stew changed. In particular, older versions did no dependency checking, and did not check to see whether expr had changed. Accordingly, the archive files written by older versions have a format that is not compatible with the newer ones. When an archive file in the old format is encountered, it will be updated to the new format, with a warning message. Note that this will overwrite existing archive files! However, there will be no loss of information.

Author(s)

Aaron A. King

Examples

## Not run: 
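  ## first call: the expression is evaluated and the result is archived
  bake(file="example1.rds",{
    x <- runif(1000)
    mean(x)
  })

  ## same expression: the archived result is returned without recomputation
  bake(file="example1.rds",{
    x <- runif(1000)
    mean(x)
  })

  ## the expression has changed, so the computation is repeated
  bake(file="example1.rds",{
    a <- 3
    x <- runif(1000)
    mean(x)
  })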

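  a <- 5
  b <- 2

  ## recomputation is forced if a or b changes between calls
  stew(file="example2.rda",
    dependson=list(a,b),{
      x <- runif(10)
      y <- rnorm(n=10,mean=a*x+b,sd=2)
    })

  ## x and y have been copied into the calling environment
  plot(x,y)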

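  ## freeze sets the RNG state for its own expression and then restores it,
  ## so the three runif(2) calls here give the same draws as those in the
  ## block that follows
  set.seed(11)
  runif(2)
  freeze(runif(3),seed=5886730)
  runif(2)
  freeze(runif(3),seed=5886730)
  runif(2)

  ## the same stream without the interleaved freeze calls
  set.seed(11)
  runif(2)
  runif(2)
  runif(2)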


## End(Not run)

pomp documentation built on Sept. 13, 2024, 1:08 a.m.