While renv
can help capture the state of your R library at some point in time,
there are still other aspects of the system that can influence the runtime
behavior of your R application. In particular, the same R code can produce
different results depending on:

- The operating system in use,
- The system libraries available (for example, the LAPACK / BLAS implementation in use), and
- The compiler toolchain used to build R and R packages.
And so on. Docker is a tool that helps solve this problem through the use of containers. Very roughly speaking, one can think of a container as a small, self-contained system within which different applications can be run. Using Docker, one can declaratively state how a container should be built (what operating system it should use, and what system software should be installed within), and use that system to run applications. (For more details, please see https://environments.rstudio.com/docker.)
Using Docker and renv
together, one can then ensure that both the underlying
system, alongside the required R packages, are fixed and constant for a
particular application.
The main challenges in using Docker with renv are:

1. Ensuring the renv cache is visible to Docker containers, and
2. Ensuring that renv restores the required R packages when the container is run.

This vignette assumes you are already familiar with Docker; if you are not, the Docker Documentation provides a thorough introduction. We'll discuss two strategies for using renv with Docker:

1. Using renv to install packages when the Docker image is generated, and
2. Using renv to install packages when Docker containers are run.

We'll explore the pros and cons of each strategy.
With Docker, Dockerfiles are used to define new images. Dockerfiles can be used to declaratively specify how a Docker image should be created. A Docker image captures the state of a machine at some point in time -- e.g., an Ubuntu operating system after downloading and installing R 3.5. Docker containers can be created using that image as a base, allowing isolated applications to run using the same pre-defined machine state.
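For instance, a minimal Dockerfile describing such a machine state might look like the following sketch. The Rocker project's pre-built R images are used here as an assumed starting point; the image tag and system libraries shown are illustrative, not prescribed by renv:

```dockerfile
# Illustrative: start from a pre-built R image maintained by the Rocker project
FROM rocker/r-ver:3.5.0

# Declare the system software this machine state should include
RUN apt-get update && \
    apt-get install -y --no-install-recommends libcurl4-openssl-dev libssl-dev && \
    rm -rf /var/lib/apt/lists/*
```

Containers created from the resulting image all start from this same pre-defined machine state.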
First, you'll need to get renv
installed on your Docker image. The easiest
way to accomplish this is with the remotes
package. For example:
```dockerfile
ENV RENV_VERSION 0.5.0-25
RUN R -e 'install.packages("remotes", repos = c(CRAN = "https://cran.rstudio.com"))'
RUN R -e 'remotes::install_github("rstudio/renv@${RENV_VERSION}")'
```
Now, renv
can be used to install packages on the image. If you'd like the
renv.lock
lockfile to be used to install R packages when the Docker image is
built, you can include something of the form:
```dockerfile
WORKDIR /project
COPY renv.lock ./
RUN R -e 'renv::restore()'
```
With this, renv
will download and install packages from CRAN and other
external sources as appropriate when the image is created.
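Putting these pieces together, a complete Dockerfile for this strategy might look like the following sketch. The base image, pinned versions, port, and the final CMD are illustrative assumptions, not part of the renv documentation:

```dockerfile
# Illustrative base image from the Rocker project; pick the R version you need
FROM rocker/r-ver:3.5.0

# Install a pinned version of renv via the remotes package
ENV RENV_VERSION 0.5.0-25
RUN R -e 'install.packages("remotes", repos = c(CRAN = "https://cran.rstudio.com"))'
RUN R -e 'remotes::install_github("rstudio/renv@${RENV_VERSION}")'

# Restore the project library from the lockfile at image build time
WORKDIR /project
COPY renv.lock ./
RUN R -e 'renv::restore()'

# Copy the rest of the project and declare a default command (illustrative)
COPY . .
CMD ["R", "--vanilla", "-e", "shiny::runApp(host = '0.0.0.0', port = 14619)"]
```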
There are two main downsides to this approach:

1. The set of R packages used is pre-baked into the image: different applications or containers built from this image must either re-use that same set of packages, or reinstall the packages they need as required.

2. The renv package cache will not be used, which implies that package installation through renv::restore() may be very slow, as every package will have to be downloaded and installed from scratch.
Both of these issues can be solved if package installation can be deferred to container runtime.
If you'd like to leverage the renv
package cache alongside Docker, then
you'll need to alter how your containers are created so that renv
can
ensure the project library is initialized before your application is run.
One can control the renv
cache directory with the environment variable
RENV_PATHS_CACHE
. For example:
```r
Sys.setenv(RENV_PATHS_CACHE = "~/path/to/cache")
renv:::renv_paths_cache()
```
Note that the platform and R version in use are appended to the requested cache directory. This ensures that a single directory can act as a base of cached packages for multiple different platforms and R versions.
Next, we need to figure out how to tell the Docker containers we create
to use this cache. The most common option here is to mount a directory
in the container that maps to persistent storage on the host system, and
then set the aforementioned RENV_PATHS_CACHE
environment variable to
point to this mount. You can specify this when the container is launched.
For example, if you had a container running a Shiny application:
```sh
# The location of the renv cache on the host machine
RENV_PATHS_CACHE_HOST=/opt/local/renv/cache

# Where the cache should be mounted within the container
RENV_PATHS_CACHE_CONTAINER=/renv/cache

# Run the container with the cache mounted;
# 'my/shiny-app' is a placeholder for your own image name
docker run --rm \
    -e "RENV_PATHS_CACHE=${RENV_PATHS_CACHE_CONTAINER}" \
    -v "${RENV_PATHS_CACHE_HOST}:${RENV_PATHS_CACHE_CONTAINER}" \
    -p 14619:14619 \
    my/shiny-app \
    R --vanilla --slave -e 'renv::activate(); renv::restore(); shiny::runApp(host = "0.0.0.0", port = 14619)'
```
With this, any calls to renv
APIs within the created docker container will
have access to the mounted cache. The first time you run a container, renv
will likely need to populate the cache, and so some time will be spent
downloading and installing the required packages. Subsequent runs should be much
faster, as renv
will be able to reuse the global package cache.
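If you manage containers with Docker Compose, the same cache mount and environment variable can be declared once in a compose file rather than repeated on each `docker run` invocation. This is a sketch; the service name, image name, host path, and port are illustrative assumptions:

```yaml
services:
  app:
    image: my/shiny-app              # illustrative image name
    environment:
      # point renv at the mounted cache inside the container
      RENV_PATHS_CACHE: /renv/cache
    volumes:
      # persistent cache on the host, shared across containers
      - /opt/local/renv/cache:/renv/cache
    ports:
      - "14619:14619"
```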
The primary downside with this approach compared to the image-based approach
is that it requires you to modify how containers are created, and requires
a bit of extra orchestration in how containers are launched. However, once
the renv
cache is active, newly-created containers will launch very quickly,
and a single image can then be used as a base for a myriad of different
containers and applications, each with their own private R library.