Slurm Workload Manager is a popular HPC cluster job scheduler found in many of the top 500 supercomputers. The `slurmR` R package provides an R wrapper to it that matches the `parallel` package's syntax; that is, just as `parallel` provides `parLapply`, `clusterMap`, `parSapply`, etc., `slurmR` provides `Slurm_lapply`, `Slurm_Map`, `Slurm_sapply`, etc.
While there are other alternatives such as `future.batchtools`, `batchtools`, `clustermq`, and `rslurm`, this R package has the following goals:
- It is dependency-free, which means that it works out-of-the-box.
- It emphasizes a workflow similar to that of the R package `parallel`.
- It provides a general framework for creating personalized wrappers without using template files.
- It is specialized in Slurm, meaning more flexibility (no need to modify template files) and debugging tools (e.g., job resubmission).
- It provides a backend for the `parallel` package, i.e., an out-of-the-box method for creating socket cluster objects for multi-node operations. (See the examples below on how to use it with other R packages.)
Check out the VS section for a comparison of `slurmR` with other R packages.
Wondering who is using Slurm? Check out the list at the end of this document.
From your HPC command line, you can install the development version from GitHub with:
```sh
$ git clone https://github.com/USCbiostats/slurmR.git
$ R CMD INSTALL slurmR/
```
The second line assumes you have R available in your system (usually loaded via `module load R` or some other command). Alternatively, you can install it using `devtools` from within R:
```r
# install.packages("devtools")
devtools::install_github("USCbiostats/slurmR")
```
```r
citation("slurmR")
```
For testing purposes, `slurmR` is available on Docker Hub. The `rcmdcheck` and `interactive` images are built on top of `xenonmiddleware/slurm`.
Once you download the files contained in the `slurmR` repository, you can go to the `docker` folder and use the `Makefile` included there to start a Unix session with slurmR and Slurm included. To test `slurmR` using Docker, check the README.md file located at https://github.com/USCbiostats/slurmR/tree/master/docker.
```r
library(slurmR)

# Suppose that we have 100 vectors of length 50 ~ Unif(0,1)
set.seed(881)
x <- replicate(100, runif(50), simplify = FALSE)
```
We can use the function `Slurm_lapply` to distribute computations:
```r
ans <- Slurm_lapply(x, mean, plan = "none")
Slurm_clean(ans) # Cleaning after you
```
Notice the `plan = "none"` option; this tells `Slurm_lapply` to only create the job object but do nothing with it, i.e., skip submission. To get more information, we can turn the verbose mode on:
```r
opts_slurmR$verbose_on()
ans <- Slurm_lapply(x, mean, plan = "none")
Slurm_clean(ans) # Cleaning after you
```
The following example was extracted from the package's manual.
```r
# Submitting a simple job
job <- Slurm_EvalQ(slurmR::WhoAmI(), njobs = 20, plan = "submit")

# Checking the status of the job (we can simply print)
job
status(job) # or use the state function
sacct(job)  # or get more info with the sacct wrapper

# Suppose some of the jobs are taking too long to complete (say 1, 2, and
# 15 through 20); we can stop them and resubmit the job as follows:
scancel(job)

# Resubmitting only
sbatch(job, array = "1,2,15-20") # A new jobid will be assigned

# Once it's done, we can collect all the results at once
res <- Slurm_collect(job)

# And clean up if we don't need to use it again
Slurm_clean(res)
```
Take a look at the vignette here.
The function `makeSlurmCluster` creates a PSOCK cluster within a Slurm HPC network, meaning that users can go beyond a single-node cluster object and take advantage of Slurm to create a multi-node cluster object. This feature allows using `slurmR` with other R packages that support working with `SOCKcluster` class objects. Here are some examples.
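Because the returned object is an ordinary `SOCKcluster`, it can also be passed directly to functions from the `parallel` package itself. A minimal sketch (assuming you are running on a Slurm cluster; the node count is illustrative):

```r
library(parallel)
library(slurmR)

# Request a two-node PSOCK cluster through Slurm
cl <- makeSlurmCluster(2)

# Any parallel-package function that takes a cluster object works
parSapply(cl, 1:10, sqrt)

stopCluster(cl)
```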
With the `future` package:
```r
library(future)
library(slurmR)

cl <- makeSlurmCluster(50)

# It only takes using a cluster plan!
plan(cluster, cl)

# ...your fancy futuristic code...

# Slurm Clusters are stopped in the same way any cluster object is
stopCluster(cl)
```
With the `doParallel` package:
```r
library(doParallel)
library(slurmR)

cl <- makeSlurmCluster(50)
registerDoParallel(cl)

m <- matrix(rnorm(9), 3, 3)

# Normalize each row of m in parallel
foreach(i = 1:nrow(m), .combine = rbind) %dopar% (m[i, ] / sum(m[i, ]))

stopCluster(cl)
```
The `slurmR` package has a couple of convenience functions designed to save the user time. First, the function `sourceSlurm()` allows skipping the explicit creation of a bash script file to be used together with `sbatch` by putting all the required configuration on the first lines of an R script, for example:
```r
cat("```\n")
cat(readLines(system.file("example.R", package = "slurmR")), sep = "\n")
cat("```\n")
```
This is an R script whose first line coincides with that of a bash script for Slurm, `#!/bin/bash`. The following lines start with `#SBATCH`, explicitly specifying options for `sbatch`, and the remaining lines are just R code.
The previous R script is included in the package (type `system.file("example.R", package = "slurmR")`). Imagine that the R script is named `example.R`; then you can use the `sourceSlurm` function to submit it to Slurm as follows:
```r
slurmR::sourceSlurm("example.R")
```
This will create the corresponding bash file required to be used with `sbatch`, and submit it to Slurm.
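For illustration, a script of this kind might look like the following. This is a hypothetical sketch (the `#SBATCH` options shown are illustrative, not the contents of the bundled `example.R`):

```r
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --time=00:10:00
#SBATCH --mem=2G

# From here on, plain R code
message("Running on node: ", Sys.info()["nodename"])
print(mean(rnorm(1000)))
```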
Another nice tool is `slurmr_cmd()`. This function creates a simple bash script that we can use as a command-line tool to submit this type of R script. Moreover, it can add the command to your session's aliases as follows:
```r
library(slurmR)
slurmr_cmd("~", add_alias = TRUE)
```
Once that's done, you can submit R scripts with "Slurm-like headers" (as shown previously) as follows:
```sh
$ slurmr example.R
```
Since version 0.4-3, `slurmR` includes the option `preamble`. This provides a way for the user to specify commands/modules that need to be executed before running the R script. Here is an example using `module load`:
```r
# Turning the verbose mode off
opts_slurmR$verbose_off()

# Setting the preamble can be done globally
opts_slurmR$set_preamble("module load gcc/6.0")

# Or on the fly
ans <- Slurm_lapply(1:10, mean, plan = "none", preamble = "module load pandoc")

# Printing out the bashfile
cat(readLines(ans$bashfile), sep = "\n")

Slurm_clean(ans) # Cleaning after you
```
There are several ways to enhance R for HPC. Depending on your goals/restrictions/preferences, you can use any of the following from this manually curated list:
```r
dat <- read.csv("comparing-projects.csv", check.names = FALSE)
dat$Dependencies <- sprintf(
  "[](https://CRAN.R-project.org/package=%1$s)", dat$Package
)
dat$Activity <- sprintf("[](https://github.com/%1$s)", dat$github)
dat$Package  <- sprintf(
  "[**%s**](https://cran.r-project.org/package=%1$s)", dat$Package
)

# Packages that only work with Slurm
only_w_slurm <- dat$Package[dat$`System [blank]` == "specific"]
only_w_slurm <- paste(only_w_slurm, collapse = ", ")

dat$github <- NULL
dat$`System [blank]` <- NULL
dat$`Focus on [blank]` <- NULL

knitr::kable(dat)
```
(1) After errors, part of or the entire job can be resubmitted.
(2) Functionality similar to the apply family in base R, e.g., lapply, sapply, mapply, or similar.
(3) Creating a cluster object using either an MPI or a socket connection.
The packages `r only_w_slurm` work only on Slurm. The drake package is focused on workflows.
We welcome contributions to `slurmR`. Whether it is reporting a bug, starting a discussion by asking a question, or proposing/requesting a new feature, please start by creating a new issue here so that we can talk about it.
Please note that this project is released with a Contributor Code of Conduct (see the CODE_OF_CONDUCT.md file included in this project). By participating in this project, you agree to abide by its terms.
Here is a manually curated list of institutions using Slurm:
| Institution | Country | Link |
|-------------|---------|------|
| University of Utah's CHPC | US | link |
| USC Center for Advanced Research Computing | US | link |
| Princeton Research Computing | US | link |
| Harvard FAS | US | link |
| Harvard HMS research computing | US | link |
| UC San Diego WM Keck Lab for Integrated Biology | US | link |
| Stanford Sherlock | US | link |
| Stanford SCG Informatics Cluster | US | link |
| UC Berkeley Open Computing Facility | US | link |
| University of Utah CHPC | US | link |
| The University of Kansas Center for Research Computing | US | link |
| University of Cambridge | UK | link |
| Indiana University | US | link |
| Caltech HPC Center | US | link |
| Institute for Advanced Study | US | link |
| UT Southwestern Medical Center BioHPC | US | link |
| Vanderbilt University ACCRE | US | link |
| University of Virginia Research Computing | US | link |
| Center for Advanced Computing | CA | link |
| SciNet | CA | link |
| NLHPC | CL | link |
| Kultrun | CL | link |
| Matbio | CL | link |
| TIG MIT | US | link |
| MIT Supercloud | US | supercloud.mit.edu/ |
| Oxford's ARC | UK | link |
This project is supported by the National Cancer Institute, Grant #1P01CA196596.
Computation for the work described in this paper was supported by the University of Southern California's Center for High-Performance Computing (hpcc.usc.edu).