fun.club | R Documentation |
This is a workflow manager which controls the generation of R objects, their caching in memory and their storage on disk. It automatically tracks object dependencies, so that if one object is invalidated, e.g. by modifying its generating function, it is deleted together with all objects that depend on it. Later, when referenced, it is automatically regenerated, always with the most recent generating functions. All this is done behind the scenes, and the interface is transparent for the user; see the examples.
Many fun.clubs can be open at the same time, provided they all point to different physical directories.
Functions are considered equivalent if they are deparse()d into the same character string. This means, in particular, that code outside the functions is not checked: e.g. if the function object calls another function that is not in fun.club, and this function changes, the objects will not be deleted.
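The equivalence rule above can be illustrated in base R. This is only a sketch of the comparison described in the text, not the library's internal code; the names helper, f_old, f_same, f_new and same_text are hypothetical.

```r
# Base R illustration only; it does not call fun.club.
helper <- function(x) x + 1          # code living outside the function of interest

f_old  <- function(n) helper(n) * 2
f_same <- function(n) helper(n) * 2  # retyped with identical code
f_new  <- function(n) helper(n) * 3  # body changed

# Hypothetical comparison: two functions are "the same" if their deparsed text matches
same_text <- function(f, g) identical(deparse(f), deparse(g))

same_as_retyped <- same_text(f_old, f_same)  # TRUE: same character strings
same_as_changed <- same_text(f_old, f_new)   # FALSE: the body differs

# Changing `helper` does not change the text of `f_old`, so a purely
# text-based comparison cannot notice that its results may now differ.
before <- deparse(f_old)
helper <- function(x) x + 100
after  <- deparse(f_old)
unnoticed <- identical(before, after)        # TRUE
```

The last comparison shows why a change in a function outside fun.club goes undetected: the text of the calling function stays the same.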
The package does not impose any limitation on function object names: any R name can be used (note, however, that all variable names in R are limited to 10000 bytes, and were limited to 256 bytes in versions of R before 2.13.0, see ?name). Any arguments can be used: named, positional and "...". Equivalent argument combinations like a=1, 2, c=3 and c=3, 1, 2 for a function function(a=1, b=2, ..., c=3) are recognized, and a new object is generated only for new arguments.
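One way to see why the two argument combinations above are equivalent is base R's match.call(), which matches every supplied argument to its formal. This is a sketch under the assumption that a similar normalization is used; the text does not say how fun.club does it, and normalize is a hypothetical helper.

```r
f <- function(a = 1, b = 2, ..., c = 3) NULL

# Hypothetical helper: rewrite a call so that each argument is matched
# to its formal name, using R's usual argument matching rules.
normalize <- function(fun, call) match.call(definition = fun, call = call)

k1 <- normalize(f, quote(f(a = 1, 2, c = 3)))
k2 <- normalize(f, quote(f(c = 3, 1, 2)))
k3 <- normalize(f, quote(f(1, 2, 4)))  # the third positional value goes to `...`

same_args  <- identical(k1, k2)  # TRUE: both become f(a = 1, b = 2, c = 3)
other_args <- identical(k1, k3)  # FALSE: a genuinely new combination
```

Only the third call would correspond to a new object, since the first two describe the same arguments.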
Advanced: There are two special function arguments, output.env and file.ext, described below. They are related to the way the objects are stored in memory and on disk, respectively. By default, the storage is done fully automatically and is hidden from the user. These two arguments, however, alter the default algorithms.
Advanced: The output.env argument can be useful for storing big objects. For example, consider
fun.club[typical.use] <- function(n) 1:n
Then, e.g., the call typical.use[100000] generates a "big object" which is returned by the function and then copied to its final destination by the library. To avoid the copy, the object can be placed directly into its final place or, more precisely, into its final environment. The latter acts as a directory holding R objects and is referred to by output.env. For example,
fun.club[advanced.use] <- function(n, output.env) { output.env[[ 'advanced.use' ]] <- 1:n }
With output.env[[ 'advanced.use' ]] <- 1:n the user stores the "big object" directly, so no extra copying is needed. output.env must appear as an argument of the function, as in function(n, output.env), but it must not have a default value, and the caller must not set it, e.g. as in advanced.use[100000, output.env = new.env()].
Then, behind the scenes, the library assigns to output.env its correct, newly created environment, so that in the function body the expression output.env[[ 'advanced.use' ]] <- 1:n becomes valid.
In the output.env environment the object is always stored under the name of the function object, i.e. advanced.use in our case.
If output.env appears as a function argument, the library assumes that it is the responsibility of the user to store the object and does not try to do that itself.
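The reason an environment avoids the copy is that environments in R are passed by reference. The following base R sketch mimics the described behaviour without fun.club: here the caller creates the environment by hand, which stands in (as an assumption) for what the library is said to do behind the scenes. The names advanced_use and dest are hypothetical.

```r
# Base R illustration only; with fun.club the environment is supplied by the library.
advanced_use <- function(n, output.env) {
  # Store the result directly in the destination environment,
  # under the name of the function object.
  output.env[["advanced.use"]] <- 1:n
  invisible(NULL)
}

dest <- new.env()                        # stand-in for the library-created environment
advanced_use(100000, output.env = dest)

stored <- exists("advanced.use", envir = dest, inherits = FALSE)  # TRUE
size   <- length(dest[["advanced.use"]])                          # 100000
```

The assignment made inside the function is visible in dest afterwards, because both names refer to the same environment.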
Advanced: The way the files are stored on disk is determined by the extension.selector and savers arguments of the make.fun.club function. Depending on the R object to be saved, the former decides which file name extension should be chosen, while the latter holds the storage function for a given extension. This works fine for saving any R object. Sometimes, however, one might need to store files external to R. E.g. one may want to download remote files to a local disk and then process them in R. The download may be performed in R, but the files with the "raw" data may not correspond to any R object. Such external data cannot be saved by the default method. It is still advantageous, however, to keep the download algorithms and the downloaded files under the control of the fun.club library. In this case, the files are automatically deleted if the algorithms change and, on the other hand, only the necessary files are stored, without duplication.
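The division of labour between a selector and a table of savers can be sketched in base R as follows. The text does not give the signatures that make.fun.club expects for extension.selector and savers, so everything below (select_ext, savers as a named list, save_object, the file stems) is a hypothetical illustration of the idea only.

```r
# Hypothetical selector: choose a file extension from the object to be saved
select_ext <- function(obj) if (is.data.frame(obj)) "csv" else "rds"

# Hypothetical table of storage functions, one per extension
savers <- list(
  rds = function(obj, file) saveRDS(obj, file),
  csv = function(obj, file) write.csv(obj, file, row.names = FALSE)
)

# Pick the extension, build the file name, and dispatch to the matching saver
save_object <- function(obj, stem, dir = tempdir()) {
  ext  <- select_ext(obj)
  file <- file.path(dir, paste0(stem, ".", ext))
  savers[[ext]](obj, file)
  file
}

p1 <- save_object(1:10, "numbers")               # saved with saveRDS
p2 <- save_object(data.frame(x = 1:3), "table")  # saved with write.csv
```

Such a scheme covers any R object, but it has no entry for data that never exists as an R object, which is the case the file.ext argument addresses.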
Since the automatic fun.club algorithms do not know how to save such "raw" data, this task is transferred to the user, who can do it using the file.ext argument. file.ext should be set to the desired file name extensions. Internally, before the function is executed, this argument is expanded to full absolute file names with the corresponding extensions. Holding the file names, file.ext can then be used in the function body (but the user should not modify it). The files will be saved in the same internal directories where fun.club stores other objects.
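The expansion of extensions into full file names might look roughly like the base R sketch below. The actual naming scheme and directories are chosen by fun.club and are not described here, so expand_file_ext, the stem "object_0001" and the use of tempdir() are all hypothetical.

```r
# Hypothetical helper: turn extensions into absolute file names inside `dir`
expand_file_ext <- function(ext, stem, dir) {
  file.path(normalizePath(dir, mustWork = FALSE), paste0(stem, ".", ext))
}

files <- expand_file_ext(c("txt", "txt.gz"), stem = "object_0001", dir = tempdir())

# Inside the function body, the expanded names are used like ordinary paths
writeLines(as.character(1:10), con = files[1])
content <- readLines(files[1])
```

The function body only ever sees the expanded names, which is why it can write to file.ext[1] directly.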
The syntax is explained in the following example:
fun.club[ write.external.files ] <- function(x, file.ext = c("txt", "txt.gz")) {
  writeLines(as.character(x), con = file.ext[1])
  system(paste("gzip -c", file.ext[1], ">", file.ext[2]))
  file.ext
}
Then, write.external.files[1:10] stores the numbers 1:10 in files with the extensions txt and txt.gz inside the fun.club directories. The exact file names are determined by fun.club internal algorithms.
Since the function above returns file.ext, the return value is a vector of the file names. Calling write.external.files[1:10] again with the same arguments always returns these file names without regenerating the files.
Using the file.ext argument, the user informs the library that ...
In the example above file.ext was given as a default argument, but it can also be redefined by the caller, e.g.
write.external.files[1:10, file.ext = c('raw', 'raw.gz')]
Since the argument combination is different here, this will generate a new object.
If several function objects are defined at once using one function, file.ext should be given as a list of character vectors, one per function object:
fun.club[ writer.1, writer.2 ] <- function(x, file.ext = list(c("txt", "txt.gz"), "gz")) {
  writeLines(as.character(x), con = file.ext[[ 1 ]][ 1 ])
  system(paste("gzip -c ", file.ext[[ 1 ]][ 1 ], ">", file.ext[[ 1 ]][ 2 ]))
  writeLines(as.character(2*x), con = file.ext[[ 2 ]][ 1 ])
  file.ext
}
In this case file.ext is expanded to the corresponding list of file names, with one element per function object. If there is only one function object, as in the first example, file.ext may alternatively be given as a list with a single element, e.g. as list(c("txt", "txt.gz")). Then it would be expanded to the list(c("
Vladislav BALAGURA balagura@cern.ch
## create `fun.club`: a factory to generate `fun.objects`, ie. special
## functions equipped with the capabilities to track and to cache all
## generated objects.
##
fc <- make.fun.club(dir = 'my_fun_club_directory')
##
## create the first "function object" `f1`
##
fc[f1] = function(x) x
##
## which can generate other objects as
##
f1[100]
##
## all such generated objects are cached and their dependencies are
## automatically tracked:
##
fc[f1] = function(x) 2*x
##
## f1[100] is automatically deleted and can be regenerated on demand:
##
f1[100]
##
## More complicated function with variable number of arguments in `...`
##
fc[f2] = function(y=1, ...) f1[y] * sum(unlist(list(...)))
f2[10, 1, 2, 3]
##
## The functions without arguments are also allowed. The functions can
## return arbitrary R objects (eg. other functions):
##
fc[f3] = function() { function(n) { rnorm(n) } }
##
## The function can return several objects placed in a `list`: `f4` below
## will return `f1[a+b]`, `f5` - `f2[a,b]` and `f6` - `f3[]`. This is
## useful if eg. the calculation gives two `data.frames` as a result, but
## they should be stored separately. This can be desirable eg. if the
## sizes of two objects are significantly different: there will be no need
## to keep in memory or reread from a file the big object to access the
## small one.
##
fc[f4, f5, f6] = function(a, b) list(f1[a+b], f2[a,b], f3[])
f4[1,2]
##
## Calling `f4` automatically generates `f5` and `f6`.
## `f4` and `f5` can be used as separate functions:
##
fc[f7] = function(a, b) f4[a,b] + f5[a,b]
##
## The request to generate `f7` object triggers the generation of all other
## objects it depends on
##
f7[1,2]
##
## since this `f7[1,2]` depends on `f1` (through `f5-f2`), changing `f1`
## deletes it together with all other dependencies:
##
fc['f1'] = function(x) x^2
##
## regardless of whether the objects were generated or not, syntactically
## they are always referred to in the same way, so the user might operate
## with them as if they were always available:
##
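## (`f6[3,4]` is the function returned by `f3[]`, so it is called with
## `n = 1` to obtain a number that can be added)
##
f7[1,2] + f6[3,4](1)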