submitJobs: Submit jobs or chunks of jobs to batch system via cluster...

View source: R/submitJobs.R

submitJobs    R Documentation

Submit jobs or chunks of jobs to batch system via cluster function.

Description

Jobs (or chunks of jobs) are submitted one after another through the internal submit cluster function, and the outcome of each call determines what happens next. If the call completes successfully, the retries counter is reset to 0 and the next job or chunk is submitted. If it returns a fatal error, the submit process is stopped completely and an exception is thrown. If it returns a temporary error, the submit process waits for a number of seconds determined by calling the user-defined wait function with the current retries counter, increases the counter by 1, and submits the same job again. Once max.retries is reached, the function stops retrying and terminates.

Temporary submit warnings and errors are logged to the file “submit.log” inside the registry's file directory. To follow the log while jobs are being submitted, run tail -f [file.dir]/submit.log in another terminal.
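
The retry behaviour described above can be sketched in plain R. This is a minimal illustration only, not the BatchJobs source: submit_with_retries, submit_one, the status strings "ok", "temporary" and "fatal", and the sleep argument are all hypothetical names introduced for the example. The exact point at which max.retries is checked is likewise an assumption.

```r
# Hypothetical stand-in for the submit loop; NOT the BatchJobs implementation.
# submit_one(job) is assumed to return "ok", "temporary" or "fatal".
submit_with_retries = function(jobs, submit_one,
                               wait = function(retries) 10 * 2^retries,
                               max.retries = 10L, sleep = Sys.sleep) {
  submitted = integer(0)
  for (job in jobs) {
    retries = 0L  # the counter starts at 0 for every job
    repeat {
      status = submit_one(job)
      if (status == "ok") {
        submitted = c(submitted, job)
        break  # move on to the next job
      }
      if (status == "fatal")
        stop("Fatal error while submitting job ", job)
      # Temporary error: give up once max.retries is reached (assumed ordering),
      # otherwise wait, increase the counter and try the same job again.
      if (retries >= max.retries)
        return(submitted)
      sleep(wait(retries))
      retries = retries + 1L
    }
  }
  submitted
}

# Usage: job 2 fails temporarily twice before it is accepted.
# The sleep argument records the waits instead of actually sleeping.
attempts = 0L
flaky = function(job) {
  attempts <<- attempts + 1L
  if (job == 2L && attempts < 4L) "temporary" else "ok"
}
waits = numeric(0)
res = submit_with_retries(1:3, flaky, sleep = function(s) waits <<- c(waits, s))
```

Passing sleep as an argument is a design choice made only so that the sketch can be tried without waiting. With the default Sys.sleep the function would pause for the number of seconds returned by wait.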

Usage

submitJobs(
  reg,
  ids,
  resources = list(),
  wait,
  max.retries = 10L,
  chunks.as.arrayjobs = FALSE,
  job.delay = FALSE,
  progressbar = TRUE
)

Arguments

reg

[Registry]
Registry.

ids

[integer]
Vector of job ids or list of vectors of chunked job ids. Only the corresponding jobs are submitted. The jobs in a chunk are executed sequentially as a single job for the scheduler. Default is all jobs which were not yet submitted to the batch system.

resources

[list]
Required resources for all batch jobs. The elements of this list (e.g. something like “walltime” or “nodes”) are defined by your template job file. Defaults can be specified in your config file. Default is an empty list.

wait

[function(retries)]
Function that defines how many seconds to wait after a temporary error. It is called with the current retries counter. Default is exponential back-off with 10*2^retries.

max.retries

[integer(1)]
Number of times to submit one job again in case of a temporary error (like filled queues). Before each retry, wait is called to determine how many seconds to pause. Default is 10 times.

chunks.as.arrayjobs

[logical(1)]
If ids are passed as a list of chunked job ids, execute jobs in a chunk as array jobs. Note that your scheduler and your template must be adjusted to use this option. Default is FALSE.

job.delay

[function(n, i) or logical(1)]
Function that defines how many seconds a job should be delayed before it starts. This is an expert option and only needs to be changed when you want to submit an extremely large number of jobs. The jobs are then delayed slightly so that the submit messages are written as early as possible, which avoids writer starvation. n is the number of jobs and i is the index of the i-th job. If job.delay is set to TRUE, the default function is used: no delay for 100 jobs or fewer, and otherwise runif(1, 0.1*n, 0.2*n). If set to FALSE (the default), delaying jobs is disabled.

progressbar

[logical(1)]
Set to FALSE to disable the progress bar. To disable all progress bars, see makeProgressBar.
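
As an illustration of the wait and job.delay arguments, the following sketch defines two user-supplied functions. The names capped.wait and my.delay are hypothetical, and the 600-second cap is an arbitrary choice for the example. my.delay only mirrors the default described above and is not taken from the package source.

```r
# Capped exponential back-off: same shape as the documented default
# (10 * 2^retries), but never longer than 600 seconds (arbitrary cap).
capped.wait = function(retries) min(10 * 2^retries, 600)

# Delay function following the documented default for job.delay = TRUE:
# no delay for 100 jobs or fewer, otherwise a random delay between
# 0.1 * n and 0.2 * n seconds. The index i is accepted but not used here.
my.delay = function(n, i) if (n <= 100) 0 else runif(1, 0.1 * n, 0.2 * n)

# Seconds waited for retries 0 to 7.
backoff = sapply(0:7, capped.wait)

# With a registry `reg` as in the Examples section, these could be passed as:
# submitJobs(reg, wait = capped.wait, max.retries = 5L, job.delay = my.delay)
```

Here backoff doubles from 10 seconds until it reaches the cap, so long outages do not lead to hour-long pauses between attempts.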

Value

[integer]. Vector of submitted job ids.

Examples

reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123)
f = function(x) x^2
batchMap(reg, f, 1:10)
submitJobs(reg)
waitForJobs(reg)

# Submit the 10 jobs again, now randomized into 2 chunks:
chunked = chunk(getJobIds(reg), n.chunks = 2, shuffle = TRUE)
submitJobs(reg, chunked)
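
The ids argument accepts a list of integer vectors, one vector per chunk. The sketch below builds such a list by hand in base R to show the expected shape. It is an illustration only and makes no claim about how chunk() is implemented. The names chunked.by.hand and shuffled, and the fixed seed, are assumptions made for the example.

```r
# Ten job ids, shuffled and split into 2 chunks of 5 ids each.
ids = 1:10
set.seed(1)
shuffled = sample(ids)
chunked.by.hand = unname(split(shuffled, rep(1:2, each = 5)))

# Each list element is one chunk, i.e. one job from the scheduler's view.
str(chunked.by.hand)

# With a registry `reg` as above, this list could be passed as:
# submitJobs(reg, chunked.by.hand)
```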

BatchJobs documentation built on March 21, 2022, 5:05 p.m.