The cache collection of functions streamlines the use of save() and load() to manage objects during scripting.

Unlike other vignettes in the package that demonstrate functions in a category in order, this vignette demonstrates some use cases.

library(shrt)

Setting up a disk cache

A disk cache is a location on disk that holds computed objects so that they can be retrieved without recomputing them from scratch. The default location of the cache is the working directory; it can be retrieved using the function cachedir() without arguments,

cachedir()

Adding an argument sets up a new directory,

cachedir("cachedata")
cachedir()

It is also possible to add a prefix to each file in the cache directory.

cacheprefix("testing")
cacheprefix()

To stop using the prefix, set it to the empty string.

cacheprefix("")
cacheprefix()

Writing to and loading from a cache

Consider a simple test object abc,

abc = letters[1:3]
abc

Saving and loading this object from disk can be performed using save and load, and old files can be removed using file.remove.

abc.file = file.path("cachedata", "abc.Rda")
save(abc, file=abc.file)

At this point, the object exists on disk. It is possible to remove it from the working environment, and reload it from disk,

rm(abc)
load(abc.file)
abc
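One detail of the base R approach is worth seeing in isolation: load restores objects under their saved names and returns only those names, not the values. The following is a minimal sketch using base R alone; the temporary file path and the names sandbox and loaded.names are chosen here for illustration.

```r
# Base R only: save an object, then load it into a fresh environment
# so that nothing in the workspace is overwritten.
tmp.file = file.path(tempdir(), "abc.Rda")   # temporary path, for illustration
abc = letters[1:3]
save(abc, file=tmp.file)

# load() returns (invisibly) the names of the restored objects
sandbox = new.env()
loaded.names = load(tmp.file, envir=sandbox)
loaded.names

# the value itself is retrieved from the environment by name
restored = get("abc", envir=sandbox)
restored

file.remove(tmp.file)
```

This is why a statement such as x = load(file) stores a name rather than the saved object.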

With shrt cache management, saving and reloading can be simplified via the functions savec and loadc,

savec(abc)
rm(abc)
abc = loadc("abc")
abc

The file path need not be constructed or specified, but it can be retrieved if necessary using cachefile,

cachefile("abc")
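For intuition, a cache file path can be thought of as a combination of the cache directory, an optional prefix, and the object name. The sketch below uses base R only. The helper sketch_cachefile, the way the prefix is joined to the name, and the .Rda extension are assumptions made for illustration; the output of cachefile may differ.

```r
# Hypothetical helper, not part of shrt:
# build a file path from a directory, a prefix, and an object name
sketch_cachefile = function(name, dir=".", prefix="") {
  file.path(dir, paste0(prefix, name, ".Rda"))
}

plain.path = sketch_cachefile("abc", dir="cachedata")
plain.path
prefixed.path = sketch_cachefile("abc", dir="cachedata", prefix="testing_")
prefixed.path
```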

In the above example, using loadc requires assigning the value of the object into a variable. This may be useful to explicitly use a different object name. The package also provides an alternative function assignc that loads from cache and performs the assignment in one go.

rm(abc)
assignc("abc")
abc

When the object is no longer needed, it can be removed from the environment using rm, or from both the environment and the cache using rmc.

before = file.exists(cachefile("abc"))
rmc(abc)
after = file.exists(cachefile("abc"))
c(before, after)

Again, the interface to rmc requires specifying the object, but not the file path in cache.
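An interface of this kind, which accepts the object itself rather than a string, can be built in base R by capturing the argument expression. The following is a rough sketch of that idea, not the implementation of rmc; the helper sketch_rm, its dir argument, and the .Rda naming are hypothetical.

```r
# Hypothetical helper, not part of shrt: remove an object and its file,
# given the object itself rather than a string or a file path
sketch_rm = function(x, dir=tempdir()) {
  caller = parent.frame()
  name = deparse(substitute(x))      # recover the variable name, e.g. "xyz"
  path = file.path(dir, paste0(name, ".Rda"))
  if (file.exists(path)) {
    file.remove(path)
  }
  # remove the variable from the caller's environment, if present
  if (exists(name, envir=caller, inherits=FALSE)) {
    rm(list=name, envir=caller)
  }
  invisible(path)
}

xyz = 1:3
xyz.file = file.path(tempdir(), "xyz.Rda")
save(xyz, file=xyz.file)
sketch_rm(xyz)
c(exists("xyz", inherits=FALSE), file.exists(xyz.file))
```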

Pipelines with cache

Cached objects can be useful to speed up a long-running script or pipeline. Functions assignc and makec provide tools for this purpose. For this use case, consider generating an object using a function.

myfun = function(x, rev=FALSE) {
  result = 1:x
  if (rev) {
    result = rev(result)
  }
  result
}
fwd5 = myfun(5)
fwd5

Within a pipeline, there might be a module to compute fwd5 that loads it from disk when available, or computes it and saves it otherwise.

## base R
fwd5.file = file.path("cachedata", "fwd5.Rda")
if (!file.exists(fwd5.file)) {
  fwd5 = myfun(5)
  save(fwd5, file=fwd5.file)
} else {
  fwd5 = load1(fwd5.file)
}
fwd5

The snippet always ends with a representation of fwd5. This is explicit in the two statements beginning with fwd5 = along the two branches of the if-else block.

An alternative implementation exploits the capability of assignc to report whether or not a cache representation exists in a manner that is consistent with an if statement.

## shrt v.1
if (!assignc("fwd5")) {
  fwd5 = myfun(5)
  savec(fwd5)
}
fwd5

Here, when the object exists in cache, fwd5 obtains a value through a side-effect of assignc. When the cached file does not exist, the object is computed and saved within the if block.

Although this is simpler than the original version, it still requires a manual call to savec to record the object in cache. It is possible to avoid that using makec. This function requires the name of the target object and a generator function; it performs all the cache maintenance automatically.

## shrt v.2
makec("fwd5", myfun, 5)
fwd5

The generator can take more than one argument. For example, to generate the reverse sequence taking advantage of myfun's optional second argument,

makec("rev5", myfun, 5, rev=TRUE)
rev5
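The compute-or-load pattern, including the forwarding of extra arguments to the generator, can be sketched in base R. This is only an illustration of the general idea under stated assumptions, not the implementation of makec; the helper sketch_make, its dir argument, and its return values are hypothetical.

```r
# Hypothetical helper, not part of shrt: create a variable from a cached file,
# or compute it with a generator and record it for next time
sketch_make = function(name, generator, ..., dir=tempdir()) {
  caller = parent.frame()
  path = file.path(dir, paste0(name, ".Rda"))
  if (file.exists(path)) {
    load(path, envir=caller)           # side effect: creates the variable
    return(invisible("loaded"))
  }
  value = generator(...)               # extra arguments are passed on
  assign(name, value, envir=caller)
  save(list=name, file=path, envir=caller)
  invisible("computed")
}

first = sketch_make("seq3", seq_len, 3)    # no file yet, so the value is computed
rm(seq3)
second = sketch_make("seq3", seq_len, 3)   # the file exists, so the value is loaded
c(first, second)
seq3

file.remove(file.path(tempdir(), "seq3.Rda"))
```

The caller supplies only a name and a generator; whether the value came from disk or from a fresh computation does not affect the code that follows.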

Logging

By default, function assignc does not produce any log messages. Thus, the following snippet is silent.

abc = 5
assignc("abc")

It is possible to activate messages by setting a verbosity level greater than 1.

verbose = 2
assignc("abc")

Turning off logging is achieved by resetting verbosity.

verbose = FALSE
assignc("abc")

Notes

It is worth stressing that assignc and makec perform different actions depending on the state of the current environment. The default order of preference is as follows:

suppressWarnings(rmc(fwd5))
suppressWarnings(rmc(rev5))
suppressWarnings(rmc(abc))
qq = file.remove("cachedata")


tkonopka/shrt documentation built on March 5, 2020, 2:51 p.m.