slurm_apply: Parallel execution of a function on the Slurm cluster

View source: R/slurm_apply.R

Description

Use slurm_apply to evaluate a function over multiple sets of parameters in parallel, spread across multiple nodes of a Slurm cluster.

Usage

slurm_apply(f, params, jobname = NA, nodes = 2, cpus_per_node = 2,
  add_objects = NULL, pkgs = rev(.packages()), libPaths = NULL,
  slurm_options = list(), submit = TRUE)

Arguments

f

A function that accepts one or more single values as parameters and may return any type of R object.

params

A data frame of parameter values to apply f to. Each column corresponds to a parameter of f (Note: names must match) and each row corresponds to a separate function call.

jobname

The name of the Slurm job; if NA, it is assigned a random name of the form "slr####".

nodes

The (maximum) number of cluster nodes to spread the calculation over. slurm_apply automatically divides params into chunks of approximately equal size to send to each node. Fewer nodes are allocated if the parameter set is too small to use all CPUs on the requested nodes.

cpus_per_node

The number of CPUs per node on the cluster; determines how many processes are run in parallel per node.

add_objects

A character vector containing the names of R objects to be saved in a .RData file and loaded on each cluster node prior to calling f.

pkgs

A character vector containing the names of packages that must be loaded on each cluster node. By default, it includes all packages loaded by the user when slurm_apply is called.

libPaths

A character vector describing the location of additional R library trees to search through, or NULL. The default value of NULL corresponds to libraries returned by .libPaths() on a cluster node. Non-existent library trees are silently ignored.

slurm_options

A named list of options recognized by sbatch; see Details below for more information.

submit

Whether or not to submit the job to the cluster with sbatch; see Details below for more information.
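
As an illustration of the f and params arguments described above, the following minimal sketch builds a parameter data frame whose column names match the arguments of a function, then evaluates one call per row locally and sequentially. The function sim_mean and its parameter values are hypothetical, and the loop only mimics the row-by-row semantics; it is not how slurm_apply runs the calls on the cluster.

```r
# Hypothetical function: one argument per column of params.
sim_mean <- function(mu, sd) mean(rnorm(1000, mean = mu, sd = sd))

# Each row is one set of parameters, i.e. one call to sim_mean.
params <- data.frame(mu = c(0, 5, 10), sd = c(1, 2, 3))

# Column names must match the argument names of the function.
stopifnot(all(names(params) %in% names(formals(sim_mean))))

# Local, sequential equivalent of "one function call per row".
set.seed(1)
results <- lapply(seq_len(nrow(params)), function(i) {
  do.call(sim_mean, as.list(params[i, , drop = FALSE]))
})
```

Running such a check locally on a few rows before submitting can catch mismatched column names early.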

Details

This function creates a temporary folder ("_rslurm_[jobname]") in the current directory, holding .RData and .RDS data files, the R script to run and the Bash submission script generated for the Slurm job.
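
The .RData file in that folder carries the objects named in add_objects. The mechanism can be sketched in base R as saving objects by name and loading them again before the function is called. The object scale_factor, the file name and the use of tempdir() are assumptions made for this sketch, not the names slurm_apply uses internally.

```r
# Hypothetical global object that the function depends on.
scale_factor <- 10
f <- function(x) x * scale_factor

# Names (not values) of the objects to ship to the nodes.
add_objects <- c("scale_factor")

# Save the named objects to an .RData file (file name is illustrative).
rdata_file <- file.path(tempdir(), "add_objects.RData")
save(list = add_objects, file = rdata_file)

# On a node, the file would be loaded before f is called;
# a fresh environment stands in for the node's R session here.
node_env <- new.env()
load(rdata_file, envir = node_env)
```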

The set of input parameters is divided into chunks of approximately equal size, one chunk per node, and f is evaluated in parallel within each node using functions from the parallel R package. The names of any other R objects (besides params) that f needs to access should be included in add_objects.
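
The chunking and per-node parallelism described above might look roughly like the sketch below. It is an approximation under stated assumptions, not the package's actual code: the function f, the node count and the use of mclapply are chosen for illustration, and the chunks are processed one after another on the local machine rather than on separate nodes.

```r
library(parallel)

f <- function(x, y) x + y                  # hypothetical function
params <- data.frame(x = 1:10, y = 10:1)   # 10 parameter sets
nodes <- 3                                 # hypothetical node count

# Split the rows into contiguous chunks of approximately equal size.
chunk_size <- ceiling(nrow(params) / nodes)
chunk_id <- ceiling(seq_len(nrow(params)) / chunk_size)
chunks <- split(params, chunk_id)

# Forking is unavailable on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# What a single node might do with its chunk: one call to f per row.
run_chunk <- function(chunk) {
  mclapply(seq_len(nrow(chunk)), function(i) {
    do.call(f, as.list(chunk[i, , drop = FALSE]))
  }, mc.cores = cores)
}

results <- unlist(lapply(chunks, run_chunk), use.names = FALSE)
```

With 10 rows and 3 nodes, this split gives chunks of 4, 4 and 2 rows.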

Use slurm_options to set any option recognized by sbatch, e.g. slurm_options = list(time = "1:00:00", share = TRUE). See http://slurm.schedmd.com/sbatch.html for details on possible options. Note that full names must be used (e.g. "time" rather than "t") and that flags (such as "share") must be specified as TRUE. The "array", "job-name", "nodes" and "output" options are already determined by slurm_apply and should not be manually set.
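
To make the naming rules concrete, the sketch below validates a slurm_options list against the reserved option names listed above and shows how such a list could be rendered as #SBATCH directives. The helper to_sbatch_lines is hypothetical and written for this illustration only; the submission script that slurm_apply generates may be formatted differently.

```r
slurm_options <- list(time = "1:00:00", share = TRUE)

# Options that slurm_apply sets itself and that should not be overridden.
reserved <- c("array", "job-name", "nodes", "output")
stopifnot(!any(names(slurm_options) %in% reserved))

# Hypothetical helper: flags (TRUE) become "--name",
# other options become "--name=value".
to_sbatch_lines <- function(opts) {
  vapply(names(opts), function(nm) {
    val <- opts[[nm]]
    if (isTRUE(val)) {
      paste0("#SBATCH --", nm)
    } else {
      paste0("#SBATCH --", nm, "=", val)
    }
  }, character(1), USE.NAMES = FALSE)
}

sbatch_lines <- to_sbatch_lines(slurm_options)
```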

When processing the computation job, the Slurm cluster will output two types of files in the temporary folder: those containing the return values of the function for each subset of parameters ("results_[node_id].RDS") and those containing any console or error output produced by R on each node ("slurm_[node_id].out").
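
Given the file naming patterns above, the expected paths can be built and inspected by hand, for example to peek at partial results while a job is still running. In this sketch the job name and node count are hypothetical, and zero-based node ids are an assumption that should be checked against the files actually present in the folder.

```r
jobname <- "myjob"   # hypothetical job name
n_nodes <- 2         # hypothetical number of nodes used

tmpdir <- paste0("_rslurm_", jobname)
node_ids <- seq_len(n_nodes) - 1   # assumption: ids start at 0

result_files <- file.path(tmpdir, paste0("results_", node_ids, ".RDS"))
log_files <- file.path(tmpdir, paste0("slurm_", node_ids, ".out"))

# Read only the result files that already exist.
done <- file.exists(result_files)
partial_results <- lapply(result_files[done], readRDS)
```

For completed jobs, get_slurm_out is the supported way to collect the results.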

If submit = TRUE, the job is sent to the cluster and a confirmation message (or error) is output to the console. If submit = FALSE, a message indicates the location of the saved data and script files; the job can be submitted manually by running the shell command sbatch submit.sh from that directory.
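
A possible workflow with submit = FALSE is sketched below: the job files are written, inspected, and then submitted by hand. The function add_pair, the parameter values and the job name "demo" are hypothetical, and accessing the job name as sjob$jobname is an assumption based on the Value section. The call is skipped when the rslurm package is not installed, and it creates the job folder in the current working directory.

```r
# Hypothetical function and parameters, for illustration only.
add_pair <- function(a, b) a + b
params <- data.frame(a = 1:4, b = 5:8)

if (requireNamespace("rslurm", quietly = TRUE)) {
  # Write the data files and scripts without calling sbatch.
  sjob <- rslurm::slurm_apply(add_pair, params, jobname = "demo",
                              nodes = 2, cpus_per_node = 2,
                              submit = FALSE)

  # Inspect the generated files before submitting.
  job_dir <- paste0("_rslurm_", sjob$jobname)
  print(list.files(job_dir))

  # To submit manually, run in a shell on the cluster:
  #   cd _rslurm_demo && sbatch submit.sh
}
```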

After sending the job to the Slurm cluster, slurm_apply returns a slurm_job object which can be used to cancel the job, get the job status or output, and delete the temporary files associated with it. See the description of the related functions for more details.

Value

A slurm_job object containing the jobname and the number of nodes actually used.

See Also

slurm_call to evaluate a single function call.

cancel_slurm, cleanup_files, get_slurm_out and print_job_status, which use the output of this function.

Examples

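## Not run: 
# Hypothetical inputs for illustration; any function and matching data frame will do.
func <- function(mu, sd) mean(rnorm(100, mean = mu, sd = sd))
pars <- data.frame(mu = 1:10, sd = seq(0.1, 1, length.out = 10))
sjob <- slurm_apply(func, pars)
print_job_status(sjob) # Prints console/error output once job is completed.
func_result <- get_slurm_out(sjob, "table") # Loads output data into R.
cleanup_files(sjob)

## End(Not run)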

rslurm documentation built on Nov. 17, 2017, 7:48 a.m.