set.seed(1234L) 
modules_path <- file.path(
  if (grepl("/docs/", getwd(), fixed = TRUE)) file.path("..", "..") else "..", 
  "inst", "modules")
library(modulr) 
library(networkD3)
library(chorddiag)
library(RColorBrewer)
library(memoise)
library(devtools)
options(knitr.duplicate.label = 'allow')
`%<=%` <- modulr::`%<=%` 
Sys.setlocale("LC_TIME", "en_DK.UTF-8") 
Sys.setenv(TZ = 'UTC') 
knitr::opts_chunk$set(
  collapse = TRUE, 
  comment = "#>", 
  fig.path = "./figures/modulr-",
  fig.width = 6.0, 
  fig.height = 4.0, 
  out.width = "90%", 
  fig.align = "center"
) 
BUILD <- 
  identical(tolower(Sys.getenv("BUILD")), "true") &&
  !identical(tolower(Sys.getenv("TRAVIS")), "true") && 
  identical(tolower(Sys.getenv("NOT_CRAN")), "true")
knitr::opts_chunk$set(purl = BUILD)
gears_path <- file.path(tempdir(), "gears")
unlink(gears_path, recursive = TRUE)
options(modulr.gears_path = gears_path)
reset <- function() {
  modulr::reset()
  root_config$set(modules_path)
}

Anatomy of a module

Modules are defined in a declarative way, using the keywords %requires% and %provides%, and have four main components:

A typical module looks like the following:

"name_of_module" %requires% list(

  # A list of dependencies.
  dependency_1 = "name_of_dependency_1", 
  dependency_2 = "name_of_dependency_1",
  ... = "..."

) %provides% {

  #' A recommended docstring intended to document the internals of the module.

  # A section where add-on packages are loaded and attached.
  library(package_1)
  library(package_2)
  library(...)

  # Some code that uses the objects `dependency_1` and `dependency_2" 
  # returned by the modules "name_of_dependency_1" and "name_of_dependency_2".
  object <- { ... }

  # A resulting object, which can be directly consumed or in turn injected as a 
  # dependency.
  return(object)

} 

When a module is defined, modulr has to make it in order to evaluate the code it provides:

result <- make("name_of_module") 

or with a handy syntactic sugar:

result %<=% "name_of_module" 

or interactively with the hit function:

hit(name_of_module) # or
hit(name_of) # which will prompt the user to choose among all possible match

The result contains the computed object exposed by the module. Under the hood, the dependencies have been sorted and appropriately made, and their resulting objects injected where required.

A first working example

reset() 

Let us start by defining some modules and dependencies:

"foo" %provides% "Hello"

"bar" %provides% "World"

"baz" %provides% "!"

"foobar" %requires% list(
  f = "foo", 
  b = "bar",
  z = "baz"
) %provides% { 
  #' Return a concatenated string. 
  paste0(f, ", ", tolower(b), z) 
} 

Use info to output the docstrings:

info("foobar") 

Use lsmod to list all defined modules and their properties in a data frame:

lsmod(cols = c("name", "type", "dependencies", "uses", "size", "modified")) 

In this example, foobar relies on three dependencies, and foo, bar and baz are both injected once. Use plot_dependencies to see these relations:

plot_dependencies() 
plot_dependencies(render_engine = chord_engine) 

Use make to get the resulting object provided by the module:

make("foobar")

Voilà! All the depencencies have been evaluated, injected and processed to return the expected "Hello, world!". As a matter of fact, the dependencies form a directed acyclic graph which is topologically sorted, just in time to determine a well ordering for their evaluation before injection.

lsmod(cols = c("name", "type", "dependencies", "uses", "size", "modified")) 

Types of modules

Singletons

reset()

All modules are singletons: once evaluated, they always return the same resulting object. This is one of the great advantages of modulr: module evaluation takes place parsimoniously, when changes are detected or explicitely required, à la GNU Make.

"timestamp" %provides% {
  #' Return a string containing a timestamp.
  format(Sys.time(), "%H:%M:%OS6")
}

Successive make calls on the module will not imply its re-evaluation:

make("timestamp")
with_verbosity(0, make("timestamp")) # temporarily change the verbosity of make

Notice the with_verbosity wrapper around the call. To force re-evaluation, just touch the module:

touch("timestamp")
make("timestamp")

Any change of the module's definition (even its docstrings) will be detected:

"timestamp" %provides% {
  #' Return a string containing a timestamp with more information.
  format(Sys.time(), "%Y-%m-%d %H:%M:%OS6")
}

make("timestamp")

Prototypes

reset()

It is granted that all modules are singletons. Nonetheless, a module is allowed to return any object, in particular it can return a function (closure) that itself returns a desired object or produces some side effect. In this case, such a module behaves like a so-called prototype.

"timestamp" %provides% {
  function() format(Sys.time(), "%H:%M:%OS6")
}
make("timestamp")()

with_verbosity(0L, make("timestamp")())

It is important to emphasize that the module is still a singleton: the second make call doesn't re-evaluate. But the function that is returned by the module is itself re-evaluated each time it is called.

Memoised prototypes

reset()

Singletons produce cached objects at make-time and prototypes produce computed objects at run-time. In a complementary manner, memoised modules produce cached objects at run-time. Memoisation and Hadley Wickam's memoise package give an elegant solution to this requirement.

To see the essence of what is happening, we decrease the verbosity of modulr and set up a simple starting scenario: foo requires the somewhat resource-consuming timestamp module, defined as a singleton:

set_verbosity(1L) # messages are shown only when changes occur

"timestamp" %provides% {
  # This is a singleton.
  message("'timestamp' is evaluated after a (short) pause...")
  Sys.sleep(1L)
  format(Sys.time(), "%H:%M:%OS6")
}

"foo" %requires% list(
  timestamp = "timestamp"
) %provides% {
  "foo"
}

system.time(make("foo"))

In this example, timestamp is evaluated even though it is not explicitely used by foo. It just computes a timestamp after a short pause, but it could be virtually very resource-consuming at make-time.

Let us re-define timestamp as a prototype:

"timestamp" %provides% {
  # This is a prototype.
  function() {
    message("'timestamp' is evaluated after a (short) pause...")
    Sys.sleep(1L)
    format(Sys.time(), "%H:%M:%OS6")
  }
}

system.time(make("foo"))

Here, the evaluation consists of defining a function that pauses for a while and returns a timestamp, only when the function is explicitely called. Even if the computation encapsulated by the function is very resource-consuming, no evaluation of the returned function takes place at make-time.

Finally, let us re-define timestamp as a memoised module:

library(memoise)
"timestamp" %provides% {
  # This is a memoised module.
  memoise::memoise(
    function() {
      message("'timestamp' is evaluated after a (short) pause...")
      Sys.sleep(1L)
      format(Sys.time(), "%H:%M:%OS6")
    }
  )
}

system.time(make("foo"))

The timestamp module returns a function which will be evaluated only when explicitely called at run-time. Let us re-define foo in order that it effectively uses timestamp.

"foo" %requires% list(
  timestamp = "timestamp"
) %provides% {
  message("It is ", timestamp())
  "foo"
}

system.time(make("foo"))

Here, a timestamped message is outputed. Let us force the re-evaluation of foo.

touch("foo")
system.time(make("foo"))

The memoised version of timestamp is evaluated only at run-time, not at make-time; moreover, the string containing the actual timestamp is computed only once and then cached for future calls, avoiding re-evaluation.

To force re-evaluation of the memoised function exposed by timestamp, use memoise::forget.

memoise::forget(make("timestamp"))
touch("foo")
system.time(make("foo"))

Lists

It is often useful for a module to expose several (immutable, cf. infra) objects at once by returning a list.

reset()
"timestamps" %provides% {
  now <- function() Sys.time()
  list(
    origin = structure(0L, class = "Date"),
    yesterday = function() now() - 86400L,
    now = now,
    tomorrow = function() now() + 86400L
  )
}

ts %<=% "timestamps"

ts$origin
ts$yesterday()
ts$now()
ts$tomorrow()

Environments

It is often useful for a module to expose several mutable objects at once by returning an environment.

reset()
"configuration" %provides% {
  env <- new.env(parent = emptyenv())
  env$shape <- "circle"
  env$color <- "blue"
  env$size <- 13L
  env
}

config %<=% "configuration"

config$color
config$color <- "red"
config$color

This kind of module can be used to share mutable data between modules, without polluting the Global Environment.

"widget_A" %requires% list(
  config = "configuration"
) %provides% {
  list(
    switch_color = function()
      config$color <- if (config$color == "blue") "red" else "blue"
  )
}

"widget_B" %requires% list(
  config = "configuration"
) %provides% {
  list(
    switch_shape = function()
      config$shape <- if (config$shape == "circle") "square" else "circle"
  )
}

widget_A %<=% "widget_A"
widget_B %<=% "widget_B"

widget_A$switch_color()
config$color

widget_B$switch_shape()
config$shape

The modulr package implements the dedicated syntactic sugar %provides_options% for this frequent purpose.

undefine("configuration")

"configuration" %provides_options% list(
  shape = "circle",
  color = "blue",
  size = 13L
)

config %<=% "configuration"
widget_A %<=% "widget_A"

config$color
widget_A$switch_color()
config$color

It is also possible to use the shared environment associated to every injector:

"widget_B_prime" %provides% {
  list(
    switch_shape = function()
      .SharedEnv$shape <- 
        if (.SharedEnv$shape == "circle") "square" else "circle"
  )
}

widget_B_prime <- make()

.SharedEnv$shape <- "circle"
widget_B_prime$switch_shape()
.SharedEnv$shape

Semantic Versioning (SemVer)

The modulr package offers Semantic Versioning capabilities: every module can live in several versions numbers of the form x.y.z, where x, y, and z are the major, minor, and patch versions, respectively. For instance, foo#1.2.3 designates module foo in version 1.2.3.

Given a version number, increment the:

For instance, foo#1.2.3 becomes foo#1.2.4 after a bug fix and foo#1.3.0 after a functionality bump.

Use:

Here are some examples among foo#1.2.3, foo#1.2.4, foo#1.3.0:

Initial scenario: no versioning

There is a good chance that your initial scenario contains no versioned module.

"great_module" %provides% {
  function() {
    Sys.sleep(1L)
    "great features"
  }
}
"complex_module" %requires% list(
  great = "great_module"
) %provides% {
  function() cat(paste("complex module using", great()))
}
system.time(make("complex_module")())

In this scenario, great_module does what it is supposed to do, but clearly not very efficiently. You then decide to work on a new version that improves its performance.

Setting-up versioning

First, we clone great_module with an initial version number.

"great_module#0.1.0" %clones% "great_module"

We then adapt the requirements where great_module is injected as a dependency: for complex_module, we decide to accept bug fixes, refactorisations, and new functionalities, as long as the API does not change in an incompatible backward manner.

"complex_module" %requires% list(
  great = "great_module#^0.1.0"
) %provides% {
  function() cat(paste("complex module using", great()))
}
system.time(make("complex_module")())

Here is the minor bump of great_module, which is a little bit more efficient.

"great_module#0.2.0" %provides% {
  # Improved internals, same interface
  function() "great optimisd features"
}
system.time(make("complex_module")())

And here is the latest bug fix correcting the typo.

"great_module#0.2.1" %provides% {
  # Bug fix
  function() "great optimised features"
}
make("complex_module")()

Location of modules

Modules can be defined in several locations: in-memory, on-disk in its own file or along another module's file, and remotely on GitHub's Gist or via the HTTP(S) protocol.

In-memory

This is the most direct method to define a module. This is also the most volatile, since the lifespan of the module is limited to the R session.

reset()
"foo" %provides% "bar"
lsmod(cols = c("name", "storage", "along", "filepath", "url"))

On-disk

reset()

This is the way to go when a module is intended to be reused. In such a case, the definition takes place in a dedicated R, R Markdown, or R Sweave file, which path and name are closely related to the module's name.

For instance, the following module definition is stored in the R file swissknife.R, under the sub-directory vendor/tool of the ./modules directory.


When a module is invoked, modulr searches for it in-memory first, then on-disk if necessary. There are several default root places where modulr looks for the module's file: ./modules/, ./module/, ./libs/, ./lib/, and ./. This behaviour can be configured with the help of root_config.

root_config$get_all()

This explains why modulr finds the module vendor/tool/swissknife under the file ./modules/vendor/tool/swissknife.R.

my_swissknife %<=% "vendor/tool/swissknife"

This also works with R Markdown .Rmd and R Sweave .Rnw files.

cat(readLines(file.path(modules_path, "vendor", "tool", "multitool.Rmd")), sep = "\n")
load_module("vendor/tool/multitool") # load only, do not make
lsmod(cols = c("name", "storage", "along", "filepath", "url"))

Along a principal module, it is possible to define other related modules, for instance mock-ups and testing modules (cf. infra).

Remote (modulr gears)

reset()

Using GitHub's Gist is a simple way to share modules with others. To illustrate this, let us consider the following remote module, aka modulr gear: https://gist.github.com/aclemen1/3fcc508cb40ddac6c1e3.

"modulr/vault" %imports% "https://gist.github.com/aclemen1/3fcc508cb40ddac6c1e3"

Notice that only specifiying the gist ID in "modulr/vault" %imports% "3fcc508cb40ddac6c1e3" has the same effect. It is possible to import modules from any URL using the HTTP(S) protocol.

Once imported, a remote module appears to be in-memory defined.

lsmod(cols = c("name", "storage", "along", "filepath", "url"))

To use a remote module as a dependency, just import it where needed (even in a remote module).

"modulr/vault" %imports% "3fcc508cb40ddac6c1e3"

"module/using/a/gear" %requires% list(
  vault = "modulr/vault"
) %provides% {
  vault$decrypt(
    secret = "TWUnCkRAlP70XvmRlnAFrw==",
    key = "EaJWzAZjjphu9CoA+MPUVCL8mmMAGp0j6Nbga29kV/A=")
}

make()

Notice that sharing a module is as easy as sending this one-liner code snippet:

library(modulr); "modulr/vault#^0.1.0" %imports% "3fcc508cb40ddac6c1e3"

Finally, private Gists and GitHub Enterprise users are also covered, thanks to GitHub's Personal Access Tokens (PAT). For instance, with the GitHub Enterprise instance of the University of Lausanne:

# Set 'GITHUB_PAT' in your '.Renviron' file or right here:
# Sys.setenv(GITHUB_PAT = "Your Personal Access Token here")

"modulr/private_GitHubEnterprise_module" %imports% 
  "https://github.unil.ch/api/v3/gists/1afa4770670975d70806c2153aac50a9"
(function() {
  GITHUB_PAT_bak <- Sys.getenv("GITHUB_PAT")
  Sys.setenv(GITHUB_PAT = Sys.getenv("GITHUB_UNIL_PAT"))
  on.exit(Sys.setenv(GITHUB_PAT = GITHUB_PAT_bak))
  "modulr/private_GitHubEnterprise_module" %imports% 
    "https://github.unil.ch/api/v3/gists/1afa4770670975d70806c2153aac50a9"
})()

Let us assume that the following module is worth publishing:

"modulr/release_gist_example" %provides% "Hello World!"

To release this module as a modulr gear on GitHub's Gist, simply use release_gear_as_gist:

release_gear_as_gist()
g <- release_gear_as_gist(browse = FALSE)
g <- list(html_url = "<URL>")

The module is then publicly available here: r g[["html_url"]].

Special variables

.Last.name

When a module is defined, touched, or made, its name is always assigned to .Last.name. The special variable .Last.name is also used as a default parameter for make, touch, and undefine.

reset()
"foo" %provides% "bar"

.Last.name

make()

touch()

undefine()

Module's metadata

reset()

Every module has access to some of its metadata: name, version, file path (when on-disk), etc. The following module illustrates this feature and is self-explanatory.


with_verbosity(0L, make("my/great/module/reflection"))

The special module modulr

The modulr package defines a special module named modulr that can be injected in any module. The purpose of this special module is to give access to useful helper functions related to the module into which it is injected.

info("modulr")

Messages

TODO

Post-evaluation hook

There are situations where a post-evaluation hook is needed. For instance, to define an ephemeral module that can be evaluated only once, or to define a so-called no-scoped module, which looks like a pure singleton, but behaves like a prototype.

"ephemeral" %requires% list(
  modulr = "modulr"
) %provides% {
  modulr$post_evaluation_hook(undefine("ephemeral"))
  "A butterfly"
}

make("ephemeral") # returns a butterfly
try(make("ephemeral"), silent = TRUE) # no more
cat(geterrmessage())
"no_scoped" %requires% list(
  modulr = "modulr"
) %provides% {
  modulr$post_evaluation_hook(touch("no_scoped"))
  Sys.time()
}

make("no_scoped")
Sys.sleep(1L)
make("no_scoped")

Notice that the expression passed to the hook is evaluated in the environment in which the module is used. Therefore, a direct call to .__name__ would not return the name of the intuitively expected module. The following example illustrates how to circumvent this kind of difficulty.

"no_scoped" %requires% list(
  modulr = "modulr"
) %provides% {
  eval(substitute(modulr$post_evaluation_hook(touch(me)), list(me = .__name__)))
  Sys.time()
}

make("no_scoped")
Sys.sleep(1L)
make("no_scoped")

Scripting

Turning a bunch of modules working perfectly well together into a script is a very common situation, that can be handled with the help of the following boilerplate code:

# filepath: ./script.R
"script" %requires% list(
  dep_1 = "dependency_1",
  ...
) %provides% {
  function() {
    # body of the script here
  }
}

if (.__name__ == "__main__") 
  # execute only if sourced/run as a script (à la Python)
  make()()

Coding style



openscienceunil/modulr documentation built on May 3, 2019, 5:49 p.m.