executeMultiProcess: Simultaneous execution of system commands.
In rickhelmus/patRoon: Workflows for Mass-Spectrometry Based Non-Target Analysis

executeMultiProcess

R Documentation

Description

Execute a queue of system commands in parallel.

Usage

executeMultiProcess(
  commandQueue,
  finishHandler,
  timeoutHandler = function(...) TRUE,
  errorHandler = defMultiProcErrorHandler,
  prepareHandler = NULL,
  cacheName = NULL,
  setHash = NULL,
  procTimeout = NULL,
  printOutput = FALSE,
  printError = FALSE,
  logSubDir = NULL,
  showProgress = TRUE,
  waitTimeout = 50,
  batchSize = 1,
  delayBetweenProc = 0,
  method = NULL
)

Arguments

`commandQueue`	A list of commands. Each command should contain `command` (scalar string) and `args` (`character` vector). Additional user-defined fields are allowed and are useful for attaching command information that can be used in the finish, timeout and error handlers.
`finishHandler`	A function that is called when a command has finished. This function is typically used to process any results generated by the command. The function is called right after spawning a new process, hence processing results can occur while the next command is running in the background. The function signature should be `function(cmd)` where `cmd` is the queue data (from `commandQueue`) of the command that has finished.
`timeoutHandler`	A function that is called whenever a timeout for a command occurs. Should return `TRUE` if execution of the command should be retried. The function signature should be `function(cmd, retries)` where `cmd` is the queue data for that command and `retries` the number of times the command has been retried.
`errorHandler`	Similar to `timeoutHandler`, but called whenever a command has failed. The signature should be `function(cmd, exitStatus, retries)`. The `exitStatus` argument is the exit code of the command (may be `NA` in the rare case that it is unknown). Other arguments are as for `timeoutHandler`. The return value should be as for `timeoutHandler`, or a `character` with an error message, which will be thrown with `stop`.
`prepareHandler`	A function that is called prior to execution of the command. The function signature should be `function(cmd)` where `cmd` is the queue data (from `commandQueue`) of the command to be started. The return value must be (an updated) `cmd`.
`cacheName`, `setHash`	Used for caching results. Set to `NULL` to disable caching.
`procTimeout`	The maximum time a process may consume before a timeout occurs (in seconds). Set to `NULL` to disable timeouts. Ignored if `patRoon.MP.method="future"`.
`printOutput`, `printError`	Set to `TRUE` to print stdout/stderr output to the console. Ignored if `patRoon.MP.method="future"`.
`logSubDir`	The sub-directory used for log files. The final log file path is constructed from `patRoon.MP.logPath`, `logSubDir` and `logFile` set in the `commandQueue`.
`showProgress`	Set to `TRUE` to display a progress bar. Ignored if `patRoon.MP.method="future"`.
`waitTimeout`	Number of milliseconds to wait before checking if a new process should be spawned. Ignored if `patRoon.MP.method="future"`.
`batchSize`	Number of commands that should be executed in sequence per process. See Details. Ignored if `patRoon.MP.method="future"`.
`delayBetweenProc`	Minimum number of milliseconds to wait before spawning a new process. Might be needed to work around errors. Ignored if `patRoon.MP.method="future"`.
`method`	Overrides `patRoon.MP.method` if not `NULL`.
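
The arguments above can be illustrated with a minimal sketch of a command queue and two handlers. The `echo` commands, the `name` field and the handler bodies are assumptions made for illustration only; they are not taken from the package. The call to `executeMultiProcess` is left commented out because it needs an installed patRoon and the external commands to be available.

```r
# A queue of two commands. Each entry holds `command` and `args`;
# `name` is a user-defined extra field (hypothetical) for use in handlers.
commandQueue <- list(
    list(command = "echo", args = c("hello"), name = "first"),
    list(command = "echo", args = c("world"), name = "second")
)

# Called when a command has finished; receives that command's queue entry.
finishHandler <- function(cmd) {
    message("Finished: ", cmd$name)
    invisible(NULL)
}

# Called before a command is started; must return the (updated) entry.
prepareHandler <- function(cmd) {
    cmd$args <- c(cmd$args, "(prepared)")
    cmd
}

# Not run here (assumes the function is accessible from an installed patRoon):
# executeMultiProcess(commandQueue, finishHandler,
#                     prepareHandler = prepareHandler,
#                     procTimeout = 60, showProgress = FALSE)
```

Keeping per-command metadata such as `name` in the queue entry means a handler needs no state other than its `cmd` argument.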

Details

This function executes a given queue of system commands in parallel to speed up computation. Commands are executed in the background using the processx package. Up to a configurable maximum number of processes is created to execute multiple commands in parallel.

Multiple commands may be executed in sequence from a single parent process (as part of a batch script on Windows, or combined with the shell AND operator otherwise). Note that multiple processes are still spawned in this scenario. Each of these processes manages a chunk of the command queue (with the size defined by the `batchSize` argument). This approach is typically suitable for fast-running commands: the overhead of spawning a new process from R for each command would in this case be significant enough to lose most of the speedup otherwise gained from parallel execution. Note that the actual batch size may be adjusted to ensure that a maximum number of processes are running simultaneously.
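
The batching idea described above can be sketched in plain R. This is not the package's implementation: `splitInBatches` and `combineBatch` are hypothetical helpers that only show how a queue could be chunked and how the commands of one chunk could be joined with the shell AND operator (`&&`).

```r
# Hypothetical helper: split a queue into consecutive chunks of at most
# batchSize commands.
splitInBatches <- function(queue, batchSize) {
    split(queue, ceiling(seq_along(queue) / batchSize))
}

# Hypothetical helper: build one shell command line for a chunk, so that
# each command only runs if the previous one succeeded.
combineBatch <- function(batch) {
    cmds <- vapply(batch, function(cmd) {
        paste(c(cmd$command, shQuote(cmd$args, type = "sh")), collapse = " ")
    }, character(1))
    paste(cmds, collapse = " && ")
}

# Five fast commands, executed two at a time per process
queue <- lapply(letters[1:5], function(x) list(command = "echo", args = x))
batches <- splitInBatches(queue, batchSize = 2)

lengths(batches)            # chunk sizes: 2, 2 and 1
combineBatch(batches[[1]])  # "echo 'a' && echo 'b'"
```

With five commands and a batch size of two, only three parent processes would be needed instead of five.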

Other functionalities of this function include timeout and error handling.
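
As an illustration of the timeout and error handling, the sketch below defines handlers with the signatures given under Arguments. The retry limit `maxRetries` and the error message text are assumptions for this example, not package defaults.

```r
# Hypothetical retry limit shared by both handlers
maxRetries <- 2

# Return TRUE to retry a command that timed out, FALSE otherwise.
timeoutHandler <- function(cmd, retries) {
    retries < maxRetries
}

# Return TRUE to retry a failed command, or a character string that is
# then thrown as an error. exitStatus may be NA if it is unknown.
errorHandler <- function(cmd, exitStatus, retries) {
    if (is.na(exitStatus) || retries >= maxRetries) {
        return(sprintf("Command '%s' failed (exit status: %s)",
                       cmd$command, exitStatus))
    }
    TRUE
}

cmd <- list(command = "mytool", args = character())  # hypothetical command
timeoutHandler(cmd, retries = 0)                  # TRUE: retry
errorHandler(cmd, exitStatus = 1, retries = 2)    # error message string
```

Returning a message rather than calling `stop` directly in the handler leaves it to the caller to raise the error.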

rickhelmus/patRoon documentation built on July 4, 2025, 9:26 p.m.