BatchtoolsParam-class: Enable parallelization on batch systems

BatchtoolsParam-classR Documentation

Enable parallelization on batch systems

Description

This class is used to parameterize scheduler options on managed high-performance computing clusters using batchtools.

BatchtoolsParam(): Construct a BatchtoolsParam-class object.

batchtoolsWorkers(): Return the default number of workers for each backend.

batchtoolsTemplate(): Return the default template for each backend.

batchtoolsCluster(): Return the default cluster.

batchtoolsRegistryargs(): Create a list of arguments to be used in batchtools' makeRegistry; see registryargs argument.

Usage

BatchtoolsParam(
    workers = batchtoolsWorkers(cluster),
    cluster = batchtoolsCluster(),
    registryargs = batchtoolsRegistryargs(),
    saveregistry = FALSE,
    resources = list(),
    template = batchtoolsTemplate(cluster),
    stop.on.error = TRUE, progressbar = FALSE, RNGseed = NA_integer_,
    timeout = WORKER_TIMEOUT, exportglobals=TRUE,
    log = FALSE, logdir = NA_character_, resultdir=NA_character_,
    jobname = "BPJOB"
)
batchtoolsWorkers(cluster = batchtoolsCluster())
batchtoolsCluster(cluster)
batchtoolsTemplate(cluster)
batchtoolsRegistryargs(...)

Arguments

workers

integer(1)

Number of workers to divide tasks (e.g., elements in the first argument of bplapply) between. On 'multicore' and 'socket' backends, this defaults to multicoreWorkers() and snowWorkers(). On managed (e.g., slurm, SGE) clusters workers has no default, meaning that the number of workers needs to be provided by the user.

cluster

character(1)

Cluster type being used as the backend by BatchtoolsParam. The available options are "socket", "multicore", "interactive", "sge", "slurm", "lsf", "torque" and "openlava". The cluster type if available on the machine registers as the backend. Cluster types which need a template are "sge", "slurm", "lsf", "openlava", and "torque". If the template is not given then a default is selected from the batchtools package.

registryargs

list()

Arguments given to the registry created by BatchtoolsParam to configure the registry and where it's being stored. The registryargs can be specified by the function batchtoolsRegistryargs() which takes the arguments file.dir, work.dir, packages, namespaces, source, load, make.default. It's useful to configure these option, especially the file.dir to a location which is accessible to all the nodes on your job scheduler i.e master and workers. file.dir uses a default setting to make a registry in your working directory.

saveregistry

logical(1)

Option given to store the entire registry for the job(s). This functionality should only be used when debugging. The storage of the entire registry can be time and space expensive on disk. The registry will be saved in the directory specified by file.dir in registryargs; the default locatoin is the current working directory. The saved registry directories will have suffix "-1", "-2" and so on, for each time the BatchtoolsParam is used.

resources

named list()

Arguments passed to the resources argument of batchtools::submitJobs during evaluation of bplapply and similar functions. These name-value pairs are used for substitution into the template file.

template

character(1)

Path to a template for the backend in BatchtoolsParam. It is possible to check which template is being used by the object using the getter bpbackend(BatchtoolsParam()). The template needs to be written specific to each backend. Please check the list of available templates in the batchtools package.

stop.on.error

logical(1)

Stop all jobs as soon as one jobs fails (stop.on.error == TRUE) or wait for all jobs to terminate. Default is TRUE.

progressbar

logical(1)

Suppress the progress bar used in BatchtoolsParam and be less verbose. Default is FALSE.

RNGseed

integer(1)

Set an initial seed for the RNG. Default is NULL where a random seed is chosen upon initialization.

timeout

list()

Time (in seconds) allowed for worker to complete a task. If the computation exceeds timeout an error is thrown with message 'reached elapsed time limit'.

exportglobals

logical(1)

Export base::options() from manager to workers? Default TRUE.

log

logical(1)

Option given to save the logs which are produced by the jobs. If log=TRUE then the logdir option must be specified.

logdir

character(1)

Path to location where logs are stored. The argument log=TRUE is required before using the logdir option.

resultdir

logical(1)

Path where results are stored.

jobname

character(1)

Job name that is prepended to the output log and result files. Default is "BPJOB".

...

name-value pairs

Names and values correspond to arguments from batchtools makeRegistry.

BatchtoolsParam constructor

Return an object with specified values. The object may be saved to disk or reused within a session.

Methods

The following generics are implemented and perform as documented on the corresponding help page: bpworkers, bpnworkers, bpstart, bpstop, bpisup, bpbackend.

bplapply handles arguments X of classes derived from S4Vectors::List specially, coercing to list.

Author(s)

Nitesh Turaga, mailto:nitesh.turaga@roswellpark.org

See Also

getClass("BiocParallelParam") for additional parameter classes.

register for registering parameter classes for use in parallel evaluation.

The batchtools package.

Examples

## Pi approximation
piApprox = function(n) {
    nums = matrix(runif(2 * n), ncol = 2)
    d = sqrt(nums[, 1]^2 + nums[, 2]^2)
    4 * mean(d <= 1)
}

piApprox(1000)

## Calculate piApprox 10 times
param <- BatchtoolsParam()
result <- bplapply(rep(10e5, 10), piApprox, BPPARAM=param)

## Not run: 
## see vignette for additional explanation
library(BiocParallel)
param = BatchtoolsParam(workers=5,
                        cluster="sge",
                        template="script/test-sge-template.tmpl")
## Run parallel job
result = bplapply(rep(10e5, 100), piApprox, BPPARAM=param)

## bpmapply
param = BatchtoolsParam()
result = bpmapply(fun, x = 1:3, y = 1:3, MoreArgs = list(z = 1),
                   SIMPLIFY = TRUE, BPPARAM = param)

## bpvec
param = BatchtoolsParam(workers=2)
result = bpvec(1:10, seq_along, BPPARAM=param)

## bpvectorize
param = BatchtoolsParam(workers=2)
## this returns a function
bpseq_along = bpvectorize(seq_along, BPPARAM=param)
result = bpseq_along(1:10)

## bpiterate
ITER <- function(n=5) {
        i <- 0L
        function() {
            i <<- i + 1L
            if (i > n)
                return(NULL)
        rep(i, n)
        }
    }

param <- BatchtoolsParam()
res <- bpiterate(ITER=ITER(), FUN=function(x,y) sum(x) + y, y=10, BPPARAM=param)

## save logs
logdir <- tempfile()
dir.create(logdir)
param <- BatchtoolsParam(log=TRUE, logdir=logdir)
res <- bplapply(rep(10e5, 10), piApprox, BPPARAM=param)

## save registry (should be used only for debugging)
file.dir <- tempfile()
registryargs <- batchtoolsRegistryargs(file.dir = file.dir)
param <- BatchtoolsParam(saveregistry = TRUE, registryargs = registryargs)
res <- bplapply(rep(10e5, 10), piApprox, BPPARAM=param)
dir(dirname(file.dir), basename(file.dir))

## End(Not run)

Bioconductor/BiocParallel documentation built on Oct. 31, 2024, 6:58 a.m.