Intro

SLURM job arrays are an efficient way to submit multiple jobs that share the same requirements. They reduce the load on the scheduler by allowing a single submission for multiple jobs.

The JobArray class in rSubmitter implements SLURM job arrays in a similar way to how the Job class implements individual jobs, so all the functionality of the Job class is available in the JobArray class as well.

This tutorial covers only the few ways in which the JobArray class differs from the Job class. It is highly recommended to go through the Job tutorial first before continuing.

Creating a job array

Similar to Job, a JobArray object is created using the JobArray$new() constructor method. Unlike Job$new(), however, its first and only required argument is commandList: a list of command vectors, where each vector holds the commands of one individual job in the array.

For example, a job array with 3 independent jobs would look like this:

library("rSubmitter")
bash_list_cmd <- list(c("echo Hello World", "sleep 30"), # First job
                      c("date", "sleep 20"), # Second job
                      c("ls ~", "du ~", "sleep 40") # Third job
                     )
my_job_array <- JobArray$new(bash_list_cmd)

After this job array is submitted (using its $submit() method), three independent jobs are executed in parallel. The first one prints Hello World to its STDOUT and idles for 30 seconds; the second one prints the current date and time to a different STDOUT and idles for 20 seconds; and the last one prints the contents and disk usage of your home folder to yet another STDOUT and idles for 40 seconds.
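
When the jobs in an array differ only by a parameter, the command list can be generated instead of typed out. The following is a minimal sketch in plain R with no rSubmitter calls; the sample names and the process_sample.sh script are hypothetical placeholders, not part of the package.

```r
# Hypothetical inputs: one job per sample (names are placeholders)
samples <- c("sampleA", "sampleB", "sampleC")

# Build one character vector of shell commands per job.
# process_sample.sh is an assumed script name used only for illustration.
bash_list_cmd <- lapply(samples, function(s) {
  c(paste("echo Processing", s),
    paste("./process_sample.sh", s))
})

# The result has the shape described for commandList:
# a list with one character vector of commands per job
length(bash_list_cmd)   # number of jobs in the array
bash_list_cmd[[2]]      # commands of the second job
```

A list built this way can be passed as the first argument to JobArray$new(), exactly like the hand-written list above.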

Job array specifications

All of the job array specifications are declared exactly the same as in a Job by passing them as arguments to JobArray$new(). These include:

STDERR, STDOUT and SLURM batch files

The standard error, standard output, and sbatch script are written to files in the specified outDir folder. Only one sbatch script is created, but each individual job in the array has its own STDOUT and STDERR files. These files are named in the form jobName_[n].[err|out], where n is the number of the corresponding job in the array.

You can change the folder where these files will be generated through the outDir argument of the $new() constructor method.
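
With one pair of files per job, it can be handy to collect all the log files of an array at once. The sketch below uses only base R; list_array_logs() is a hypothetical helper, not an rSubmitter function, and it assumes only that file names start with the job name and end in _n.out or _n.err. The demonstration runs on empty placeholder files in a temporary folder.

```r
# Hypothetical helper (not part of rSubmitter): list the STDOUT/STDERR files
# of a job array in outDir, assuming names start with "<jobName>_" and
# end in "_<n>.out" or "_<n>.err"
list_array_logs <- function(outDir, jobName) {
  files <- list.files(outDir)
  is_log <- startsWith(files, paste0(jobName, "_")) &
    grepl("_[0-9]+\\.(out|err)$", files)
  file.path(outDir, sort(files[is_log]))
}

# Demonstration on a temporary folder with empty placeholder files
demo_dir <- file.path(tempdir(), "rsubmitter_demo_logs")
dir.create(demo_dir, showWarnings = FALSE)
created <- file.create(file.path(demo_dir, c("myJob_1.out", "myJob_1.err",
                                             "myJob_2.out", "myJob_2.err",
                                             "otherJob_1.out")))

# Only the four files belonging to "myJob" are returned
array_logs <- list_array_logs(demo_dir, "myJob")
basename(array_logs)
```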

Submitting a job array

Once you have created a JobArray object, you can submit it to SLURM using its $submit() method. This creates an sbatch script and submits it to the queue.

my_job_array$submit()

Monitoring a job array

For a JobArray, monitoring works almost identically to a Job.

The first way is to manually check the status of the jobs in the array by requesting a data.frame with their states:

my_job_array$submit()
last_state <- my_job_array$getState()

last_state
#        jobId                   jobName jobState
# 1 24347700_3 rSubmitter_job_7305737665  RUNNING
# 2 24347700_1 rSubmitter_job_7305737665  RUNNING
# 3 24347700_2 rSubmitter_job_7305737665  RUNNING
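
Because the states come back as an ordinary data.frame, they can be summarised with base R. The sketch below works on a mock data.frame copied from the output above rather than on a live $getState() call, so it runs without SLURM; the set of states treated as active is an assumption and covers only PENDING and RUNNING.

```r
# Mock of the data.frame shown above, so the example runs without a cluster
last_state <- data.frame(
  jobId = c("24347700_3", "24347700_1", "24347700_2"),
  jobName = "rSubmitter_job_7305737665",
  jobState = c("RUNNING", "RUNNING", "RUNNING"),
  stringsAsFactors = FALSE
)

# Count the jobs in each state
state_counts <- table(last_state$jobState)
state_counts

# TRUE while any job is PENDING or RUNNING
# (other SLURM states are not considered in this sketch)
still_active <- any(last_state$jobState %in% c("PENDING", "RUNNING"))
still_active
```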

The $wait() method is also available for job arrays. One key difference lies in the stopIfFailed argument: if it is set to TRUE, $wait() throws an error and cancels all jobs in the array when one or more jobs fail; if it is set to FALSE, only a warning is issued and $wait() keeps waiting for the rest of the jobs to complete. Moreover, if one or more jobs fail, it prints the path to the STDERR and STDOUT files of the individual job(s) that failed.

my_job_array$submit()
my_job_array$wait(stopIfFailed = FALSE)

#  2018-08-27 16:55:37 --- Cluster Status |  PENDING = 3 |
#  2018-08-27 16:55:47 --- Cluster Status |  RUNNING = 3 |
#  2018-08-27 16:57:20 --- Cluster Status |  COMPLETED = 2 |  TIMEOUT = 1 |
# Warning message:
# In my_job_array$submit()$wait() : 
# One or more jobs failed. Failed jobs SLURM files:
# /home/users/paedugar/rSubmitter_job_7305737665_24347830_3.[err|out]
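
To find out which commands belong to a failed job, the number at the end of its job ID can be mapped back to the command list. The sketch below is plain R on mock data that mirrors the final status line above (2 COMPLETED, 1 TIMEOUT); it assumes that the number after the underscore in jobId is the position of that job in commandList, which is not confirmed here for every cluster setup.

```r
# Command list from the earlier example
bash_list_cmd <- list(c("echo Hello World", "sleep 30"),
                      c("date", "sleep 20"),
                      c("ls ~", "du ~", "sleep 40"))

# Mock state table (not a live $getState() result); the IDs reuse the
# <arrayId>_<n> format shown in the output above
final_state <- data.frame(
  jobId = c("24347830_1", "24347830_2", "24347830_3"),
  jobState = c("COMPLETED", "COMPLETED", "TIMEOUT"),
  stringsAsFactors = FALSE
)

# Keep the jobs that did not complete
failed <- final_state[final_state$jobState != "COMPLETED", ]

# Assumption: the number after the underscore is the job's position in commandList
failed_index <- as.integer(sub("^.*_", "", failed$jobId))

# Commands of the job(s) that did not complete
bash_list_cmd[failed_index]
```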

Cancelling a job array

Cancelling works exactly the same as in Job:

my_job_array$submit()
my_job_array$cancel()

# 2018-08-27 17:03:58 Cancelling 3 job(s)
# 2018-08-27 17:03:58 Finished sending cancel signal

Cleaning job-associated files

Cleaning works exactly the same as in Job:

my_job_array$submit()
my_job_array$cancel()
my_job_array$clean(script = TRUE, out = TRUE, err = TRUE)

$clean() will throw an error if the job array has been submitted and has not yet completed.



pablo-gar/rSubmitter documentation built on Jan. 26, 2020, 2:08 a.m.