knitr::opts_chunk$set( collapse = TRUE, error = FALSE, comment = "#>" ) r_output <- function(x) { cat(c("```r", x, "```"), sep = "\n") }
Often the most difficult part of configuring your cluster jobs is sorting out all the packages that you need and making sure that they are present on the cluster. There are several levels of difficulty here and this document will walk through them in turn.
This is the most straightforward situation - all your packages are on CRAN. You don't need to do anything special typically, just create your context with a list of packages and create the queue:
root <- "pkgs" ctx <- context::context_save(root, packages = c("dplyr", "ggplot2")) obj <- didehpc::queue_didehpc(ctx)
What happened above was when the queue started up it looked to see what packages were available (none were) and then installed everything needed to run your jobs. That includes the two packages listed above but also all their dependencies and context
which didehpc
uses to send the jobs back and forth.
All these packages are installed into a special directory within the context root:
dir(file.path(root, "lib/windows", as.character(getRversion()[1, 1:2])))
Everything in this library will be available to your R jobs when they run.
We keep many often-used packages in a semi-stable repository (see the mrc-ide drat, the ncov drat and the more experimental R-universe system that is being developed to support this sort of workflow in future).
To tell didehpc
to look in one of these repositories when installing, create a conan::conan_sourcs
object and list additional repositories as the repos
argument, and pass this object in as the package_sources
argument to context_save
. Here, we add the mrc-ide drat repository and install the dde
package; this will use the development version which is often ahead of the CRAN version.
src <- conan::conan_sources(NULL, repos = "https://mrc-ide.github.io/drat/") ctx <- context::context_save(root, packages = "dde", package_sources = src)
Create the library as before, and dde
will be installed
obj <- didehpc::queue_didehpc(ctx)
If you want to add your packages to one of these repositories, please talk to Rich. You will need to increase your version number at each change (typically each merge into main/master) for the installation to notice that you have made changes.
We use pkgdepends
as the engine for installing packages from exotic locations. This is a problem that is slightly more complicated than it seems because the resolution of the dependencies are not always unambiguous, particularly with networks of dependent packages.
The basic idea is this. Suppose we want to install the rfiglet
package, which is not on CRAN. We use the "Remotes"-style reference richfitz/rfiglet
as an entry to conan_sources
so that didehpc
knows where to install rfiglet
from:
src <- conan::conan_sources("richfitz/rfiglet") ctx <- context::context_save(root, packages = "rfiglet", package_sources = src)
Note that we still list rfiglet
within the packages
section of context::context_save
as that is what is used to load the package.
If you want to be even more explicit you can use github::richfitz/rfiglet
as the reference, and you can add references such as richfitz/rfiglet@d713c1b8
to point at a particular commit, branch or tag.
obj <- didehpc::queue_didehpc(ctx)
To install a private package, first make a local copy of the package somewhere on your system. Then you need to build a source copy of this package (this will have a file extension of tar.gz
).
For example, suppose that the path ~/Documents/src/defer
contains a copy of your sources that you want to install, you could write:
path <- pkgbuild::build("~/Documents/src/defer", ".")
The second argument (.
) is the directory that the built package will be created in. This must be in your working directory. You might find using something like pkgs
as a destination helps keeps things tidy. (You may want to use the vignettes = FALSE
argument to speed this process up if your package includes slow-to-run vignettes as they will be of no use on the cluster).
file.info(path)
Then construct your package sources passing in the relative path to your package. We can use the path
variable here, or you could write r path
directly, or something like r paste0("local::", basename(path))
. If you have multiple packages you can pass a vector in.
src <- conan::conan_sources(path) ctx <- context::context_save(root, packages = "defer", package_sources = src)
when you construct the context, this package will be installed for you
obj <- didehpc::queue_didehpc(ctx)
You must have local copies of all packages installed (i.e., on the machine that is submitting the jobs). This is because we use some information about the packages to work out what can be run on the cluster. If you see a message like this when creating the queue object:
Loading context d1b3973bef7762b8d4d4ff5cbe090b2c [ context ] d1b3973bef7762b8d4d4ff5cbe090b2c [ library ] rfiglet Error in library(p, character.only = TRUE) : there is no package called ‘rfiglet’
it means that you do not have the package installed locally and you should install it before continuing.
You cannot upgrade packages while you have cluster jobs running. The reason for this is file locking; any cluster job running has a copy of the package loaded and will prevent deletion. Unfortunately the installation will delete quite a lot of the package before it realises that it is locked, which causes all sorts of problems.
Typically if you hit this you will see a "permission denied" error concerning a dll. Once this has happened you should be prepared for any queued jobs to fail.
To avoid, if upgrading packages, use a new context root.
The package installation may seem a bit magic but you can tame it a little.
When constructing your queue object, you can control how provisioning will occur with the provision
argument. The default is to check to see if any packages listed in your context's packages
argument are missing and only then do installation.
If you pass provision = "fake"
it will leave your library alone no matter what. Alternatively pass provision = "upgrade"
to try and upgrade packages, or provision = "later"
to skip this step for now. You can't submit jobs while your package installation looks incomplete.
If you want to add additional things into the library without running the full provisioning (which might upgrade all sorts of things) you can use the install_packages()
method on the object. This ignores the contents of your conan_sources
and you pass directly in the pkgdepends
-style references; see the pkgdepends
documentation for the myriad options here. Examples of usage include:
Install the latest version of a CRAN package
obj$install_packages("data.table")
Install a GitHub package
obj$install_packages("richfitz/stegasaur")
Install some local package from a .tar.gz
file
obj$install_packages("local::mypkg_0.1.2.tar.gz")
You can possibly use this interface (along with provision = "fake"
) to manipulate your package installation fairly flexibly.
It is possible to end up in a situation where pkgdepends
can't resolve your dependencies, or where in resolving dependencies an unwanted version of a package was installed. Please let Rich know with enough detail for him to reproduce the example himself:
didehpc::queue_didehpc(...)
covering things like context::context_save()
and conan::conan_sources()
.tar.gz
files that you are usingAdd the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.