waitForJobs: Wait for Termination of Jobs

Description

This function blocks until all jobs have terminated or the timeout is reached.

Usage

waitForJobs(
  ids = NULL,
  sleep = NULL,
  timeout = 604800,
  expire.after = NULL,
  stop.on.error = FALSE,
  stop.on.expire = FALSE,
  reg = getDefaultRegistry()
)

Arguments

ids

[data.frame or integer]
A data.frame (or data.table) with a column named “job.id”. Alternatively, you may also pass a vector of integerish job ids. If not set, defaults to the return value of findSubmitted. Invalid ids are ignored.

sleep

[function(i) | numeric(1)]
Controls how long to sleep between queries. You can pass an absolute numeric value in seconds or a function(i) which returns the number of seconds to sleep in the i-th iteration. If not provided (NULL), the value (number or function) is read from the configuration file (stored in reg$sleep); if it is not set there either, it defaults to a function with exponential backoff between 5 and 120 seconds.
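
As a rough illustration of the function(i) form, the sketch below defines a sleep function that doubles the wait on every iteration, starting at 5 seconds and capped at 120 seconds. The name backoff and the doubling rule are assumptions made for this example; they are not necessarily the formula the package uses for its default.

```r
# Hypothetical sleep function: doubles the wait each iteration, starting at
# 5 seconds and capped at 120 seconds. Not necessarily the package's formula.
backoff <- function(i) {
  pmin(120, 5 * 2^(i - 1))
}

# Seconds to sleep in iterations 1 to 7
backoff(1:7)

# Assumed usage, given a registry with submitted jobs:
# waitForJobs(sleep = backoff)  # custom backoff
# waitForJobs(sleep = 10)       # fixed 10 seconds between queries
```

A function is useful when short jobs should be picked up quickly while long-running jobs should not cause frequent queries to the scheduler.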

timeout

[numeric(1)]
After waiting timeout seconds, show a message and return FALSE. This argument may be required on some systems where, e.g., expired jobs or jobs on hold are problematic to detect. If you don't want a timeout, set this to Inf. Default is 604800 (one week).

expire.after

[integer(1)]
Jobs count as “expired” if they are not found on the system but have not communicated back their results (or error message). This frequently happens on managed systems if the scheduler kills a job because it has hit the walltime or requested more memory than reserved. On the other hand, network file systems often require several seconds for new files to be found, which can lead to false positives in the detection heuristic. waitForJobs therefore treats such jobs as expired only after they have not been detected on the system for expire.after iterations. If not provided (NULL), the value is read from the configuration file (stored in reg$expire.after); if it is not set there either, it defaults to 3.
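
The counting idea behind expire.after can be sketched in plain R. The helper below is hypothetical and is not the package's internal code; in particular, it assumes that the counter resets whenever the job is detected again, which the text above does not state.

```r
# Hypothetical helper (not part of batchtools): given one logical value per
# polling iteration (TRUE = job was found on the system), return the first
# iteration at which the job would be treated as expired, or NA if never.
# Assumption: the miss counter resets whenever the job is found again.
first_expired_iteration <- function(found, expire.after = 3L) {
  missing_streak <- 0L
  for (i in seq_along(found)) {
    if (found[i]) {
      missing_streak <- 0L
    } else {
      missing_streak <- missing_streak + 1L
    }
    if (missing_streak >= expire.after) {
      return(i)
    }
  }
  NA_integer_
}

# A job that is briefly invisible (e.g. slow network file system) and then
# reappears is not flagged until it has been missing 3 times in a row.
first_expired_iteration(c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE))
```

A larger expire.after tolerates slower file systems at the cost of detecting genuinely killed jobs later.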

stop.on.error

[logical(1)]
Immediately cancel if a job terminates with an error? Default is FALSE.

stop.on.expire

[logical(1)]
Immediately cancel if jobs are detected to be expired? Default is FALSE. Expired jobs will then be ignored for the remainder of waitForJobs().

reg

[Registry]
Registry. If not explicitly passed, uses the default registry (see setDefaultRegistry).

Value

[logical(1)]. Returns TRUE if all jobs terminated successfully and FALSE if either the timeout is reached or at least one job terminated with an exception or expired.
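
A minimal usage sketch of acting on the return value follows. It assumes the batchtools package is installed and uses a temporary registry with three toy jobs; the job function, the 60 second timeout, and the 1 second sleep are arbitrary choices for illustration.

```r
if (requireNamespace("batchtools", quietly = TRUE)) {
  library(batchtools)

  # Temporary registry (file.dir = NA) that is not set as the default
  reg <- makeRegistry(file.dir = NA, make.default = FALSE)

  # Three toy jobs; batchMap() returns a table with a "job.id" column
  ids <- batchMap(function(x) x^2, x = 1:3, reg = reg)
  submitJobs(ids, reg = reg)

  # Wait at most 60 seconds, querying once per second
  ok <- waitForJobs(ids, sleep = 1, timeout = 60, reg = reg)

  if (ok) {
    # All jobs terminated successfully: collect the results
    results <- reduceResultsList(ids, reg = reg)
  } else {
    # FALSE: timeout reached, or at least one job errored or expired
    print(findErrors(ids, reg = reg))
    print(findExpired(ids, reg = reg))
  }
}
```

Because FALSE covers the timeout, errors, and expired jobs alike, inspecting findErrors and findExpired afterwards is one way to tell these cases apart.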


mllg/batchtools documentation built on April 21, 2023, 11:49 a.m.