Job: Class for individual SLURM job submission and management


Description

R6 Class that enables easy submission and manipulation of individual shell jobs to a SLURM cluster.

Usage

# x <- Job$new(commandVector, jobName = NULL, outDir = NULL, partition = NULL, time = NULL, mem = NULL, proc = NULL, totalProc = NULL, nodes = NULL, email = NULL)
# x$submit()
# x$wait(stopIfFailed = F, verbose = T)
# x$cancel()
# x$getState(simplify = F)
# x$clean()

Format

R6 class

Details

Submission is achieved by creating and executing an sbatch script. For more details on SLURM, refer to https://slurm.schedmd.com/. Most methods return the object itself, so calls can be concatenated.
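As a rough illustration of how the constructor arguments could translate into an sbatch script, here is a minimal sketch. The helper build_sbatch_script is hypothetical and is not part of rSubmitter; it only assumes the option equivalences listed in the parameter descriptions (--job-name, --partition, --time, --mem, --cpus-per-task). The script the package actually writes may differ.

```r
# Hypothetical helper, NOT rSubmitter's implementation: assembles the text of
# an sbatch script from a command vector and a few optional resource requests.
build_sbatch_script <- function(commandVector, jobName = NULL, partition = NULL,
                                time = NULL, mem = NULL, proc = NULL) {
  # Pair each argument with the sbatch long option it is documented to match
  opts <- list("job-name" = jobName, "partition" = partition, "time" = time,
               "mem" = mem, "cpus-per-task" = proc)
  opts <- Filter(Negate(is.null), opts)  # drop options left as NULL
  directives <- if (length(opts) > 0) {
    paste0("#SBATCH --", names(opts), "=", unlist(opts))
  } else {
    character(0)
  }
  # Shebang first, then the directives, then one shell command per line
  c("#!/bin/bash", directives, commandVector)
}

script <- build_sbatch_script(c("echo hola world!", "sleep 60"),
                              jobName = "dummy", mem = "8G", proc = 8)
cat(script, sep = "\n")
```

Arguments left as NULL produce no directive, which mirrors the documented behaviour that only commandVector is required.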

Job class - inherits from JobInfo class

Value

R6Class with methods and fields for SLURM job manipulation

Method description

  1. Initialize
    x <- Job$new(commandVector, jobName = NULL, outDir = NULL, partition = NULL, time = NULL, mem = NULL, proc = NULL, totalProc = NULL, nodes = NULL, email = NULL)
    Parameters:

    • commandVector : character vector - Each element should be an independent shell command to be included in the current job. This is the only required argument

    • jobName : character - Name of job, if NULL one will be generated of the form rSubmitter_job_[random_alphanumeric]. Equivalent to --job-name of SLURM sbatch. Most output files use it as a suffix

    • outDir : character - writeable path for the sbatch script as well as the SLURM STDERR and STDOUT files. If NULL the current working directory will be used

    • partition : character - Partition to use. Equivalent to --partition of SLURM sbatch

    • time : character - Time requested for job execution, one accepted format is "HH:MM:SS". Equivalent to --time of SLURM sbatch

    • mem : character - Memory requested for job execution, accepted formats include "xG" and "xMB". Equivalent to --mem of SLURM sbatch

    • proc : integer - Number of processors requested per task. Equivalent to --cpus-per-task of SLURM sbatch

    • totalProc : integer - Number of tasks requested for job. Equivalent to --ntasks of SLURM sbatch

    • nodes : integer - Number of nodes requested for job. Equivalent to --nodes of SLURM sbatch

    • email : character - email address to send info when job is done. Equivalent to --mail-user= of SLURM sbatch


    Return:
    object of class Job

  2. Submit job(s)
    x$submit()
    Writes the sbatch script to outDir and submits it through a system call to sbatch. The script and the sbatch STDERR and STDOUT files are written to outDir. If sbatch returns a non-zero status, submission is retried up to 12 times, waiting a defined interval between attempts (TIME_WAIT_JOB_STATUS option at ~/.rSubmitter)
    Return:
    self - for method concatenation
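The retry behaviour of submit() can be pictured with the sketch below. It is not rSubmitter's code: submit_with_retry and make_fake_sbatch are hypothetical names, the submission function stands in for a system call to sbatch, and counting 12 as the total number of attempts is an assumption about how the limit is applied.

```r
# Hypothetical sketch, NOT rSubmitter's code: call a submission function until
# it reports exit status 0 or the attempt limit is reached.
submit_with_retry <- function(submitFun, maxTries = 12, waitSeconds = 5) {
  for (attempt in seq_len(maxTries)) {
    status <- submitFun()               # expected to return an exit status
    if (status == 0) return(attempt)    # success: report attempts used
    if (attempt < maxTries) Sys.sleep(waitSeconds)  # pause before retrying
  }
  stop("Submission failed after ", maxTries, " attempts")
}

# Stand-in for a call such as system("sbatch script.sh"): returns a non-zero
# status for the first `failures` calls, then 0.
make_fake_sbatch <- function(failures) {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls <= failures) 1L else 0L
  }
}

attempts <- submit_with_retry(make_fake_sbatch(2), waitSeconds = 0)
```

In the package itself the pause between attempts is read from ~/.rSubmitter rather than passed as an argument.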

  3. Wait for job(s) to finish
    x$wait(stopIfFailed = F, verbose = T)
    Time between each job state check is defined in the entry TIME_WAIT_JOB_STATUS:seconds in the config file located at ~/.rSubmitter
    Parameters:

    • stopIfFailed : logical - if TRUE, stops as soon as one job has failed (only useful for JobArray) and cancels the remaining pending and running jobs. If FALSE and one or more jobs failed, raises a warning for each failed job

    • verbose : logical - if TRUE prints the job state(s) at every check


    Return:
    self - for method concatenation
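Conceptually, wait() is a polling loop: check the state, pause, repeat until the job reaches a final state. The sketch below shows that idea for a single job under stated assumptions. wait_for_state and make_fake_state are hypothetical names, the set of final states is a small illustrative subset of SLURM job states, and the real method reads its interval from TIME_WAIT_JOB_STATUS instead of taking it as an argument.

```r
# Hypothetical sketch, NOT rSubmitter's code: poll a state-reporting function
# until the job reaches a final state.
wait_for_state <- function(getStateFun, intervalSeconds = 5, verbose = TRUE) {
  # Illustrative subset of SLURM states treated as final here
  terminal <- c("COMPLETED", "FAILED", "CANCELLED", "TIMEOUT")
  repeat {
    state <- getStateFun()
    if (verbose) message("Job state: ", state)
    if (state %in% terminal) break
    Sys.sleep(intervalSeconds)  # pause between state checks
  }
  # Mirrors the documented stopIfFailed = FALSE behaviour: warn, do not stop
  if (state != "COMPLETED") warning("Job ended in state ", state)
  invisible(state)
}

# Stand-in for a real state query: walks through a fixed sequence of states
make_fake_state <- function(stateSeq) {
  i <- 0
  function() {
    i <<- i + 1
    stateSeq[min(i, length(stateSeq))]
  }
}

final <- wait_for_state(make_fake_state(c("PENDING", "RUNNING", "COMPLETED")),
                        intervalSeconds = 0, verbose = FALSE)
```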

  4. Cancel job(s)
    x$cancel()
    Return:
    self - for method concatenation

  5. Get job(s) state
    x$getState(simplify = F)
    Parameters:

    • simplify : logical - if TRUE returns a frequency data.frame of job states, otherwise returns individual jobs and their associated job names, job ids, and states


    Return:
    data.frame - With SLURM states
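To make the two output shapes concrete, the sketch below builds a per-job table by hand and collapses it into a frequency table with base R. The column names (jobName, jobId, jobState) and the values are made up for illustration; the columns actually returned by getState() may be named differently.

```r
# Hypothetical per-job table, shaped like the simplify = FALSE output described
# above (one row per job: name, id, state). Values are invented.
states <- data.frame(
  jobName  = c("a", "b", "c", "d"),
  jobId    = c("101", "102", "103", "104"),
  jobState = c("RUNNING", "PENDING", "RUNNING", "COMPLETED"),
  stringsAsFactors = FALSE
)

# One way to obtain a frequency data.frame of states, as simplify = TRUE is
# documented to return: count occurrences of each state with table()
freq <- as.data.frame(table(jobState = states$jobState),
                      stringsAsFactors = FALSE)
print(freq)
```

The simplified form is mostly useful for job arrays, where many jobs share a handful of states.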

  6. Remove SLURM-associated files
    x$clean(script = TRUE, out = TRUE, err = TRUE)
    Parameters:

    • script : logical - if TRUE deletes sbatch submission script(s) associated to this object

    • out : logical - if TRUE deletes STDOUT file(s) from SLURM associated to this object

    • err : logical - if TRUE deletes STDERR file(s) from SLURM associated to this object


    Return:
    self - for method concatenation

Examples

## Not run: 
# Create and submit dummy job with random job name
job <- Job$new("echo hola world!")
job$submit()

# Create and submit dummy job with specific job name
job <- Job$new("echo hola world!", jobName = "dummy")
job$submit()

# Create, submit and wait for a job to finish
job <- Job$new(c("echo hola world!", "sleep 60"))
job$submit()
job$wait()

# Method concatenation
job <- Job$new("echo hola world!", jobName = "dummy")
job$submit()$wait()$clean()

# Create, submit and cancel a Job
job <- Job$new(c("echo hola world!", "sleep 60"))
job$submit()
job$cancel()

# Create and submit a memory-heavy job
job <- Job$new("echo this is too much memory!", mem = "16G")
job$submit()

# Create and submit requesting multiple processors
job <- Job$new(c("echo this is multi-processing", "nproc"), proc = 8)
job$submit()

# Many options defined
job <- Job$new(c("echo these are all the options", "nproc"), jobName = "dummy", outDir = "~", partition = "normal", time = "4:00:00", mem = "8G", proc = 8, totalProc = 1, nodes = 1, email = "my@email.com")
job$submit()

# Removes script, err and output files
job$clean()

## End(Not run)

pablo-gar/rSubmitter documentation built on Jan. 26, 2020, 2:08 a.m.