knitr::opts_chunk$set(
  collapse = TRUE,
  error = FALSE,
  comment = "#>"
)
r_output <- function(x) {
  cat(c("```r", x, "```"), sep = "\n")
}
Get yourself running R jobs on the cluster in 10 minutes or so.
Assumptions that we make here:
you are using R
your task can be represented as running a function on some inputs to create an output (a file based output is OK)
you are working on a network share and have this mounted on your computer
you know what packages your code depends on
your package dependencies are all on CRAN, and are all available in Windows binary form.
If any of these do not apply to you, you'll probably need to read the full vignette, which contains a lot more information in any case.
Install the packages using drat
# install.packages("drat") # if you don't have it already
drat:::add("mrc-ide")
install.packages("didehpc")
On Windows, if you are using a domain machine, you should only need to select the cluster you want to use:
options(didehpc.cluster = "fi--didemrchnb")
Otherwise, and on any other platform, you'll need to provide your username as well:
options(didehpc.cluster = "fi--didemrchnb", didehpc.username = "yourusername")
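To confirm that the options were picked up before going any further, base R's getOption() reads them back. A minimal sketch, reusing the placeholder values from above ("yourusername" is not a real account):

```r
# Set the two options described above ("yourusername" is a placeholder).
options(
  didehpc.cluster = "fi--didemrchnb",
  didehpc.username = "yourusername"
)

# Read them back; getOption() returns NULL for an option that is not set.
cluster <- getOption("didehpc.cluster")
username <- getOption("didehpc.username")
print(cluster)
print(username)
```

Options set this way last only for the current R session, so one possibility is to put the options() call in your .Rprofile.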
You can see the default configuration with
didehpc::didehpc_config()
If this is the first time you have run this package, best to try out the login procedure with:
didehpc::web_login()
because this exposes a number of problems early on.
Make a vector of packages that you use in your project:
packages <- c("dplyr", "tidyr")
And a vector of the files that define the functions you need to run things:
sources <- "mysources.R"
If you had a vector here that would be OK too.
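For illustration, mysources.R might contain something like the following. The body of random_walk is an assumption (this guide only uses its name and, later on, the call random_walk(0, 10)), so treat it as a stand-in for your own functions:

```r
# Hypothetical contents of mysources.R.
# random_walk: start at x and take n steps, each a standard normal draw,
# returning the position after every step.
random_walk <- function(x, n) {
  ret <- numeric(n)
  for (i in seq_len(n)) {
    x <- x + rnorm(1)
    ret[[i]] <- x
  }
  ret
}

# Quick local check before sending anything to the cluster.
set.seed(1)
walk <- random_walk(0, 10)
length(walk)
```

Running the function locally first is a cheap way to catch errors in your own code before they turn into failed cluster jobs.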
Then save these together to form a "context".
ctx <- context::context_save("contexts", packages = packages, sources = sources)
If you have no packages or no sources, use NULL or omit them in the call above (NULL is the default anyway).
The first argument here, "contexts", is the name of a directory that we will use to hold a lot of information about your jobs. You don't need (or particularly want) to know what is in here.
Next, build the queue object. This will prompt you for your password, as it will try to log in. It also installs Windows versions of all packages into the contexts directory: both the packages required to get this whole system working and the packages required for your particular jobs.
obj <- didehpc::queue_didehpc(ctx)
Once you get to this point we're ready to start running things on the cluster. Let's fire off a test to make sure that everything works OK:
t <- obj$enqueue(sessionInfo())
We can poll the job for a while, which will print a progress bar. If the job finishes in time, this returns the result of running the function; otherwise it throws an error.
t$wait(120)
You can use t$result() to get the result straight away (throwing an error if it is not ready) or t$wait(Inf) to wait forever.
t$result()
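Because t$result() throws an error when the task is not ready, a script that checks in on a task can wrap the call in tryCatch(). The sketch below uses stand-in lists that mimic only that one behaviour; they are not real didehpc tasks, and result_or_null is a hypothetical helper name:

```r
# Hypothetical helper: return the task result, or NULL if result() errors.
result_or_null <- function(task) {
  tryCatch(task$result(), error = function(e) NULL)
}

# Stand-ins for a task handle (assumptions for illustration only).
fake_pending <- list(result = function() stop("task not complete"))
fake_done <- list(result = function() 42)

pending_value <- result_or_null(fake_pending)
done_value <- result_or_null(fake_done)
is.null(pending_value)
done_value
```

One caveat with this design: a task whose real result is NULL cannot be told apart from one that is not ready, so a distinct sentinel value may suit some workflows better.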
The test above just used the enqueue function with a built-in function. It also works with functions defined in the files passed in as sources; here, the function random_walk.
t <- obj$enqueue(random_walk(0, 10))
res <- t$wait(120)
res
The t object has a number of other methods you can use:
t
Get the result from running a task
t$result()
Get the status of the task
t$status()
(the status might also be "PENDING", "RUNNING" or "ERROR")
Get the original expression:
t$expr()
Find out how long everything took
t$times()
You may see negative numbers for "waiting" as the submitted time is based on your computer and started/finished are based on the cluster.
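The arithmetic behind a negative waiting time can be reproduced with made-up timestamps. All values below are hypothetical; the sketch only assumes, as described above, that "waiting" is the gap between a submitted time taken from your computer's clock and a started time taken from the cluster's clock:

```r
# Hypothetical timestamps: your computer's clock runs a few seconds fast.
submitted <- as.POSIXct("2020-01-01 12:00:05", tz = "UTC")  # local clock
started <- as.POSIXct("2020-01-01 12:00:02", tz = "UTC")    # cluster clock

# Waiting time as started minus submitted, in seconds.
waiting <- as.numeric(difftime(started, submitted, units = "secs"))
waiting
```

A small negative value like this points to clock skew between the two machines and does not mean the task ran before it was submitted.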
And get the log from running the task
t$log()
There is also a bit of DIDE specific logging that happens before this point; if the job fails inexplicably the answer may be in:
obj$dide_log(t)
Want more information? See vignette("didehpc") for the details.