JobArray: Class for efficient multi-job SLURM submission and management

Description

R6 Class that enables easy submission and manipulation of SLURM job arrays.

Usage

# x <- JobArray$new(commandList, jobName = NULL, outDir = NULL, partition = NULL, time = NULL, mem = NULL, proc = NULL, totalProc = NULL, nodes = NULL, email = NULL)
# x$submit()
# x$wait(stopIfFailed = FALSE, verbose = TRUE)
# x$length()
# x$cancel()
# x$getState(simplify = FALSE)
# x$getJobNames()
# x$clean(script = TRUE, out = TRUE, err = TRUE)

Format

R6 class

Details

Job arrays are quickly generated and submitted, allowing for efficient creation and execution of many shell jobs in a SLURM cluster and thus facilitating parallelization when needed. This class eliminates the cumbersome task of manually creating SLURM arrays: in its simplest form, two lines of code are sufficient for a job array submission. Additionally, it can monitor the jobs after submission and wait for all of them to finish.

All jobs in a job array share the same execution requirements. Each element in 'commandList' will be submitted as an individual job in the array. Elements of 'commandList' should be vectors of shell commands.
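As a minimal sketch of the expected shape of 'commandList', the following builds one character vector of shell commands per job. The sample names and the gzip command are hypothetical and only serve as an illustration; the commands of a job are presumably executed in the order given.

```r
# Hypothetical inputs: one job per sample
samples <- c("sampleA", "sampleB", "sampleC")

# Each list element is a character vector of shell commands for one job
commands <- lapply(samples, function(s) {
  c(
    paste("echo processing", s),
    paste0("gzip -c ", s, ".txt > ", s, ".txt.gz")
  )
})

length(commands)  # number of jobs the array would contain
commands[[1]]     # shell commands for the first job
```

A list built this way can be passed as the first argument of JobArray$new().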

Submission is achieved by creating and executing an sbatch script. For more details on SLURM refer to https://slurm.schedmd.com/. For job arrays refer to https://slurm.schedmd.com/job_array.html. Method concatenation (chaining) is possible for most methods, since they return the object itself.
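To make the idea of a job array sbatch script more concrete, here is a rough sketch of how a list of command vectors could be mapped onto a single SLURM array script. The helper make_array_script() is hypothetical and is not rSubmitter's implementation; the script the package actually writes may look different. Only standard SLURM features are used (--array, the %a file name pattern, and the SLURM_ARRAY_TASK_ID variable).

```r
# Hypothetical helper, NOT part of rSubmitter: illustration only
make_array_script <- function(commandList, jobName = "demo",
                              time = "02:00", mem = "1G") {
  header <- c(
    "#!/bin/bash",
    paste0("#SBATCH --job-name=", jobName),
    paste0("#SBATCH --array=1-", length(commandList)),
    paste0("#SBATCH --time=", time),
    paste0("#SBATCH --mem=", mem),
    paste0("#SBATCH --output=", jobName, "_%a.out"),
    paste0("#SBATCH --error=", jobName, "_%a.err")
  )
  # One case branch per array task, selected by SLURM_ARRAY_TASK_ID
  branches <- unlist(lapply(seq_along(commandList), function(i) {
    c(paste0("  ", i, ")"), paste0("    ", commandList[[i]]), "    ;;")
  }))
  c(header, "", "case \"$SLURM_ARRAY_TASK_ID\" in", branches, "esac")
}

script <- make_array_script(list(c("echo hola", "sleep 5"), "echo adios"))
cat(script, sep = "\n")
```

Each array task runs only the branch that matches its own index, which is how a single script can carry different commands for every job.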

JobArray class - inherits from JobInfo class

Value

R6Class with methods and fields for SLURM job array manipulation

Method description

  1. Initialize
    x <- JobArray$new(commandList, jobName = NULL, outDir = NULL, partition = NULL, time = NULL, mem = NULL, proc = NULL, totalProc = NULL, nodes = NULL, email = NULL)
    Parameters:

    • commandList : list of character vectors - Each element of the list should be a vector of shell commands. Each element of the list corresponds to a different job in the array.

    • jobName : character - Name of job; if NULL, one will be generated of the form rSubmitter_job_[random_alphanumeric]. Equivalent to --job-name of SLURM sbatch. Most output files use it as a prefix

    • outDir : character - writeable path for the sbatch script as well as the SLURM STDERR and STDOUT files. If NULL, the current working directory will be used

    • partition : character - Partition to use. Equivalent to --partition of SLURM sbatch

    • time : character - Time requested for job execution, one accepted format is "HH:MM:SS". Equivalent to --time of SLURM sbatch

    • mem : character - Memory requested for job execution, accepted formats include "xG" or "xMB". Equivalent to --mem of SLURM sbatch

    • proc : integer - Number of processors requested per task. Equivalent to --cpus-per-task of SLURM sbatch

    • totalProc : integer - Number of tasks requested for job. Equivalent to --ntasks of SLURM sbatch

    • nodes : integer - Number of nodes requested for job. Equivalent to --nodes of SLURM sbatch

    • email : character - email address to notify when the job is done. Equivalent to --mail-user of SLURM sbatch


    Return:
    object of class JobArray

  2. Submit job(s)
    x$submit()
    Writes a job array sbatch script to outDir and submits it through a system call to sbatch. The script and the sbatch STDERR and STDOUT files will be written to outDir. If sbatch returns a non-zero status, submission is retried up to 12 times with a defined interval between attempts (TIME_WAIT_JOB_STATUS option in ~/.rSubmitter). Each element of the array will have its own STDERR and STDOUT files, named jobName_[1-Inf].[err|out]. Important options pulled from the config file located at ~/.rSubmitter: maximum number of jobs allowed in the queue (MAX_JOBS_ALLOWED:n); maximum length of a job array (MAX_JOB_ARRAY_LENGTH:n).
    Return:
    self - for method concatenation
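The config options mentioned above are written as KEY:value entries. The sketch below writes a throwaway file in that layout and reads it back, which can be handy for checking which limits are in effect. The one-entry-per-line layout is an assumption, the values are made up and are not package defaults, and read_key_value() is a hypothetical helper rather than part of rSubmitter.

```r
# Throwaway file with made-up values; assumes one KEY:value entry per line
config_path <- tempfile()
writeLines(c(
  "MAX_JOBS_ALLOWED:500",
  "MAX_JOB_ARRAY_LENGTH:1000",
  "TIME_WAIT_JOB_STATUS:30"
), config_path)

# Hypothetical reader: split each line at the first colon
read_key_value <- function(path) {
  lines <- readLines(path)
  lines <- lines[grepl(":", lines, fixed = TRUE)]
  keys <- sub(":.*$", "", lines)
  values <- sub("^[^:]*:", "", lines)
  setNames(trimws(values), trimws(keys))
}

config <- read_key_value(config_path)
as.numeric(config[["TIME_WAIT_JOB_STATUS"]])  # seconds between state checks
```

Pointing read_key_value() at ~/.rSubmitter instead of the temporary file would show the values of an existing config, provided it follows the same layout.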

  3. Wait for job(s) to finish
    x$wait(stopIfFailed = FALSE, verbose = TRUE)
    The time between consecutive job state checks is defined by the entry TIME_WAIT_JOB_STATUS:seconds in the config file located at ~/.rSubmitter.
    Parameters:

    • stopIfFailed : logical - if TRUE, stops as soon as one job has failed (only useful for JobArray) and cancels the remaining pending and running jobs. If FALSE and one or more jobs failed, raises a warning for each failed job

    • verbose : logical - if TRUE prints the job state(s) at every check


    Return:
    self - for method concatenation
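The waiting behaviour described above can be pictured as a polling loop. The following is a simplified sketch under stated assumptions, not the package's code: fake_states() is a hypothetical stand-in for a real state query, wait_for_jobs() is a hypothetical helper, and the cancellation of remaining jobs on failure is left out.

```r
# Hypothetical stand-in for a state query: one SLURM state per job.
# Each call advances a fake clock so that the loop below terminates.
tick <- 0
fake_states <- function() {
  tick <<- tick + 1
  if (tick < 3) {
    c("RUNNING", "PENDING", "COMPLETED")
  } else {
    c("COMPLETED", "COMPLETED", "FAILED")
  }
}

# Hypothetical polling loop: check, optionally stop on failure, sleep, repeat
wait_for_jobs <- function(get_states, interval = 0.1,
                          stopIfFailed = FALSE, verbose = TRUE) {
  repeat {
    states <- get_states()
    if (verbose) print(table(states))
    if (stopIfFailed && any(states == "FAILED")) stop("a job failed")
    if (!any(states %in% c("PENDING", "RUNNING"))) break
    Sys.sleep(interval)
  }
  # With stopIfFailed = FALSE, emit one warning per failed job
  for (i in which(states == "FAILED")) warning("job ", i, " failed")
  invisible(states)
}

final <- suppressWarnings(wait_for_jobs(fake_states, verbose = FALSE))
```

Here interval plays the role of TIME_WAIT_JOB_STATUS, and the loop ends once no job is pending or running.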

  4. Get length of array
    x$length()
    Return:
    numeric - number of individual jobs in array

  5. Cancel job(s)
    x$cancel()
    Return:
    self - for method concatenation

  6. Get job(s) state
    x$getState(simplify = FALSE)
    Parameters:

    • simplify : logical - if TRUE returns a frequency data.frame of job states, otherwise returns individual jobs and their associated job names, job ids, and states


    Return:
    data.frame - With SLURM states
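To illustrate the difference between the two return shapes, the sketch below collapses a per-job table into a frequency table using base R. The data.frame and its column names (jobName, jobId, jobState) are invented for the example; the columns actually returned by getState() may be named differently.

```r
# Hypothetical per-job table; column names are assumed for illustration
states <- data.frame(
  jobName = paste0("dummy_", 1:5),
  jobId = paste0("1234_", 1:5),
  jobState = c("COMPLETED", "RUNNING", "RUNNING", "PENDING", "FAILED"),
  stringsAsFactors = FALSE
)

# Collapse to one row per state with its count
freq <- as.data.frame(table(jobState = states$jobState),
                      stringsAsFactors = FALSE)
freq
```

The simplified form is convenient for a quick progress check, while the per-job form is needed to find out which specific jobs failed.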

  7. Get job name(s)
    x$getJobNames()
    Return:
    character vector - With individual job names.

  8. Remove SLURM-associated files
    x$clean(script = TRUE, out = TRUE, err = TRUE)
    Parameters:

    • script : logical - if TRUE deletes sbatch submission script(s) associated to this object

    • out : logical - if TRUE deletes STDOUT file(s) from SLURM associated to this object

    • err : logical - if TRUE deletes STDERR file(s) from SLURM associated to this object


    Return:
    self - for method concatenation
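Since submit(), wait() and clean() are documented as returning self, a whole job array lifecycle can be written as one chained call. The sketch below is guarded so that it only submits where the rSubmitter package and the sbatch command are both available; the job name and resource values are arbitrary choices for illustration.

```r
# Runs only where the rSubmitter package and the sbatch command are available
can_submit <- requireNamespace("rSubmitter", quietly = TRUE) &&
  nzchar(Sys.which("sbatch"))

if (can_submit) {
  library(rSubmitter)
  commands <- lapply(1:3, function(i) c(paste("echo job", i), "sleep 10"))
  jobArray <- JobArray$new(commands, jobName = "chained", outDir = "~",
                           mem = "1G", time = "02:00", proc = 1)
  # Each method returns the object itself, so the calls can be chained
  jobArray$submit()$wait(verbose = FALSE)$clean()
}
```

Note that clean() removes the script and the STDOUT and STDERR files, so it is best left out of the chain when the job output still needs to be inspected.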

Examples

## Not run: 
# Create and submit 10 dummy jobs
commands <- list()
for(i in 1:10) commands[[i]] <- c("echo adios","sleep 40")
jobArray <- JobArray$new(commands, jobName = "dummy", outDir = "~", mem = "1G", time = "02:00", proc = 1)
jobArray$submit()
jobArray$getState()
jobArray$wait()

# Create and submit 10 dummy jobs; one fails, so the remaining jobs are cancelled
commands <- list()
for(i in 1:9) commands[[i]] <- c("echo adios","sleep 40")
commands[[10]] <- "notAcommand"
jobArray <- JobArray$new(commands, jobName = "dummy", outDir = "~", mem = "1G", time = "02:00", proc = 1)
jobArray$submit()
jobArray$getState()
jobArray$wait(stopIfFailed = TRUE)


## End(Not run)

pablo-gar/rSubmitter documentation built on Jan. 26, 2020, 2:08 a.m.