knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = FALSE)
Embarrassingly parallel calculations are common in R code and, if not actively managed, can lead to long compute times. The sluhpc package parallelizes such calculations much like parallel::mclapply(), which splits repetitious calculations into subtasks and runs them in parallel, but instead of confining the subtasks to a single machine, sluhpc distributes the work across nodes of the Saint Louis University (SLU) High Performance Cluster (HPC).
The purpose of the sluhpc package is to simplify the steps necessary to distribute parallel calculations across SLU HPC nodes. The main function, slurm_apply(), automatically divides a given computation over multiple nodes and writes the necessary file structure and scripts to submit a job to the HPC Slurm Workload Manager. The package also contains equally important helper functions to interact with the HPC, such as establishing a Secure Shell (SSH) connection, uploading/downloading files via secure copy (SCP), and combining output from disparate nodes.
By default, credentials to connect to the HPC are read from the environment variables APEX.SLU.EDU_USER and APEX.SLU.EDU_PASS using base::Sys.getenv(). These variables are commonly set via a .Renviron file. This approach has the benefit of keeping R code non-interactive and devoid of credentials, but has the disadvantage that the credentials are stored in plain text. If a more secure credentialing method is desired, the default arguments should be overridden.
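To make the default lookup concrete, the snippet below shows the form of a .Renviron entry together with the base::Sys.getenv() calls that retrieve it; the values shown are placeholders, not real credentials.
# In ~/.Renviron (plain text, so keep the file's permissions restrictive):
# APEX.SLU.EDU_USER=your_slu_username
# APEX.SLU.EDU_PASS=your_slu_password

# After restarting R, the variables are visible to the session:
Sys.getenv("APEX.SLU.EDU_USER")
Sys.getenv("APEX.SLU.EDU_PASS")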
To illustrate a typical sluhpc workflow, we borrow the example provided in the rslurm vignette.
First, we define a function that accepts a pair of mean and standard deviation parameters, generates a million normal deviates, and returns a corresponding pair of maximum likelihood estimates for the parameters.
my_function <- function(parameter_mu, parameter_sd) {
  sample <- rnorm(10^6, parameter_mu, parameter_sd)
  c(sample_mu = mean(sample), sample_sd = sd(sample))
}
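Before involving the cluster, it can be worth a quick local sanity check on a single, arbitrary parameter pair; the estimates should land close to the inputs.
# Quick local check: estimates should be close to the supplied parameters.
my_function(parameter_mu = 5, parameter_sd = 0.5)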
Next we create a parameter data frame where each row is a parameter set and each column matches an argument of the function.
my_parameters <- data.frame(
  parameter_mu = 1:10,
  parameter_sd = seq(0.1, 1, length.out = 10)
)
head(my_parameters, 3)
We now pass that function and the parameters data frame to slurm_apply(), where we must also specify a job name. We can optionally define the number of cluster nodes to use via the nodes argument, as well as the number of CPUs per node via the cpus_per_node argument (both of which default to 2). The cpus_per_node argument is similar to the mc.cores argument of parallel::mclapply(); it sets an upper limit on the number of child processes to run simultaneously within a given node. Additional arguments are passed on to rslurm::slurm_apply() via the ... argument.
library(sluhpc)
slurm_job <- slurm_apply(my_function, my_parameters, "my_apply")
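For instance, to spread the work over more of the cluster, the nodes and cpus_per_node arguments can be set explicitly; the values below are illustrative only.
# Illustrative only: request 4 nodes with 8 CPUs each.
slurm_job <- slurm_apply(
  my_function,
  my_parameters,
  "my_apply",
  nodes = 4,
  cpus_per_node = 8
)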
The slurm_apply() function creates a new directory in the current working directory, named by concatenating "rslurm" and the value passed to the jobname argument. This new directory contains the file structure and scripts necessary to submit a job to the Slurm Workload Manager. The function also returns an object of class slurm_job that stores some information about the job, including the job name, job ID, and number of nodes to be used.
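To see exactly what the returned object contains, inspecting it with base R's str() is enough (the fields presumably mirror rslurm's slurm_job structure):
# Inspect the job metadata (job name, job ID, number of nodes).
str(slurm_job)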
Next we establish an SSH connection to the HPC, use SCP to upload the newly created job directory, and submit the job to Slurm.
session <- apex_connect()
slurm_upload(session, slurm_job)
slurm_submit(session, slurm_job)
We can cancel a job if it is taking too long or if we notice a mistake in our setup.
slurm_cancel(session, slurm_job)
The slurm_download() function will block until the job has completed running on the cluster and then download the results via SCP. We can then bind the results from each node together into a data frame object.
slurm_download(session, slurm_job)
results <- slurm_output_dfr(slurm_job)
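As a quick sanity check (illustrative; the column names come from the vector returned by my_function, and each row of results is assumed to correspond to one row of my_parameters), we can place the estimates alongside the true parameter values:
# Compare the maximum likelihood estimates with the true parameter values.
head(cbind(my_parameters, results))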
At this point, we might wish to remove the job files from the cluster and/or delete the local copy.
slurm_remove_apex(session, slurm_job)
slurm_remove_local(slurm_job)
Finally, we should disconnect our SSH session to the HPC.
apex_disconnect(session)