cache_exec: Cache the execution of an expression in memory or on disk

View source: R/cache.R

cache_execR Documentation

Cache the execution of an expression in memory or on disk

Description

Caching is based on the assumption that if the input does not change, the output will not change. After an expression is executed for the first time, its result will be saved (either in memory or on disk). The next run will be skipped and the previously saved result will be loaded directly if all external inputs of the expression remain the same, otherwise the cache will be invalidated and the expression will be re-executed.

Usage

cache_exec(expr, path = "cache/", id = NULL, ...)

Arguments

expr

An R expression to be cached.

path

The path to save the cache. The special value ":memory:" means in-memory caching. If it is intended to be a directory path, please make sure to add a trailing slash.

id

A stable and unique string identifier for the expression to be used to identify a unique copy of cache for the current expression from all cache files (or in-memory elements). If not provided, an MD5 digest of the deparsed expression will be used, which means if the expression does not change (changes in comments or white spaces do not matter), the id will remain the same. This may not be a good default is two identical expressions are cached under the same path, because they could overwrite each other's cache when one expression's cache is invalidated, which may or may not be what you want. If you do not want that to happen, you need to manually provide an id.

...

More arguments to control the behavior of caching (see ‘Details’).

Details

Arguments supported in ... include:

  • vars: Names of local variables (which are created inside the expression). By default, local variables are automatically detected from the expression via codetools::findLocalsList(). Locally created variables are cached along with the value of the expression.

  • hash and extra: R objects to be used to determine if cache should be loaded or invalidated. If (the MD5 hash of) the objects is not changed, the cache is loaded, otherwise the cache is invalidated and rebuilt. By default, hash is a list of values of global variables in the expression (i.e., variables created outside the expression). Global variables are automatically detected by codetools::findGlobals(). You can provide a vector of names to override the automatic detection if you want some specific global variables to affect caching, or the automatic detection is not reliable. You can also provide additional information via the extra argument. For example, if the expression reads an external file foo.csv, and you want the cache to be invalidated after the file is modified, you may use extra = file.mtime("foo.csv").

  • keep: By default, only one copy of the cache corresponding to an id under path is kept, and all other copies for this id is automaitcally purged. If TRUE, all copies of the cache are kept. If FALSE, all copies are removed, which means the cache is always invalidated, and can be useful to force re-executing the expression.

  • rw: A list of functions to read/write the cache files. The list is of the form list(load = function(file) {}, save = function(x, file) {}). By default, readRDS() and saveRDS() are used. This argument can also take a character string to use some built-in read/write methods. Currently available methods include rds (the default), raw (using serialize() and unserialize()), and qs (using qs::qread() and qs::qsave()). The rds and raw methods only use base R functions (the rds method generates smaller files because it uses compression, but is often slower than the raw method, which does not use compression). The qs method requires the qs package, which can be much faster than base R methods and also supports compression.

Value

If the cache is found, the cached value of the expression will be loaded and returned (other local variables will also be lazy-loaded into the current environment as a side-effect). If cache does not exist, the expression is executed and its value is returned.

Examples

# the first run takes about 1 second
y1 = xfun::cache_exec({
    x = rnorm(1e+05)
    Sys.sleep(1)
    x
}, path = ":memory:", id = "sim-norm")

# the second run takes almost no time
y2 = xfun::cache_exec({
    # comments won't affect caching
    x = rnorm(1e+05)
    Sys.sleep(1)
    x
}, path = ":memory:", id = "sim-norm")

# y1, y2, and x should be identical
stopifnot(identical(y1, y2), identical(y1, x))

yihui/xfun documentation built on May 19, 2024, 2:23 p.m.