parDosa: Parallel wrapper function to call from within a function

View source: R/parDosa.R

parDosa    R Documentation

Parallel wrapper function to call from within a function

Description

parDosa is a wrapper around many functionalities of the parallel package. It is designed to work closely with MCMC fitting functions and can easily be called from inside another function.

Usage

parDosa(cl, seq, fun, cldata,
    lib = NULL, dir = NULL, evalq=NULL,
    size = 1, balancing = c("none", "load", "size", "both"),
    rng.type = c("none", "RNGstream"),
    cleanup = TRUE, unload = FALSE, iseed=NULL, ...)

Arguments

cl

A cluster object created by makeCluster, or an integer. It can also be NULL, see Details.

seq

A vector to split.

fun

A function or character string naming a function.

cldata

A list containing data. This list is then exported to the cluster by clusterExport. It is stored in a hidden environment. Data in cldata can be used by fun.

lib

Character, name of package(s). Optionally packages can be loaded onto the cluster. More than one package can be specified as character vector. Packages already loaded are skipped.

dir

Working directory to use. If NULL (default), the working directory is not set on the workers. Can be a vector to set different directories on different workers.

evalq

Character, expressions to evaluate, e.g. for changing global options (passed to clusterEvalQ). More than one expression can be specified as a character vector.

balancing

Character, type of balancing to perform (see Details).

size

Vector of problem sizes (or relative performance information) corresponding to elements of seq (recycled if needed). The default 1 indicates equality of problem sizes.

rng.type

Character. "none" will not set any seeds on the workers. "RNGstream" selects the "L'Ecuyer-CMRG" RNG and then distributes streams to the members of a cluster, optionally setting the seed of the streams by set.seed(iseed) (otherwise they are set from the current seed of the master process, after selecting the L'Ecuyer generator). See clusterSetRNGStream. The logical value !(rng.type == "none") is used for forking (e.g. when cl is integer).

cleanup

Logical, whether cldata should be removed from the workers after applying fun. If TRUE, the effects of the dir argument are also cleaned up.

unload

Logical, whether the packages named in lib should be unloaded after applying fun.

iseed

Integer or NULL, passed to clusterSetRNGStream, which supplies it to set.seed before the streams are distributed to the workers. If NULL, reproducible seeds are not set.

...

Other arguments of fun that are simple values and not objects. (Arguments passed as objects should be specified in cldata, otherwise they are not exported to the cluster by this function.)

Details

The function uses 'snow' type clusters when cl is a cluster object. The function uses 'multicore' type forking (shared memory) when cl is an integer. The value from getOption("mc.cores") is used if the argument is NULL.

The function sets the random seeds, loads packages lib onto the cluster, sets the working directory as dir, exports cldata and evaluates fun on seq.
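The sequence of steps above can be approximated with plain parallel calls. The following is only a minimal sketch under stated assumptions, not the implementation of parDosa: the objects items, cldata and fun are hypothetical stand-ins for the arguments seq, cldata and fun, the data are exported to the workers' global environment rather than to a hidden environment, and a two-worker socket cluster is assumed to be available.

```r
library(parallel)

# Hypothetical inputs, named after the arguments described above
items  <- 1:6                       # plays the role of 'seq'
cldata <- list(offset = 10)         # data needed by 'fun'
fun    <- function(i) offset + i    # 'offset' is looked up on the worker

cl <- makeCluster(2)                        # 'snow' type socket cluster
clusterSetRNGStream(cl, iseed = 1234)       # like rng.type = "RNGstream"
invisible(clusterEvalQ(cl, library(stats))) # like 'lib': load a package

# Export each element of cldata to the workers (assumption: global
# environment on the workers, unlike the hidden environment of parDosa)
clusterExport(cl, names(cldata), envir = list2env(cldata))

res <- parLapply(cl, items, fun)            # evaluate fun on the items

# Like cleanup = TRUE: remove the exported data from the workers
invisible(clusterCall(cl, function(nm) rm(list = nm, envir = globalenv()),
                      names(cldata)))
stopCluster(cl)
```

Here res is a list with one element per item, which matches the "list with results" form described under Value.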

No balancing (balancing = "none") means that the problem is split into roughly equal subsets, without regard to size (see clusterSplit). This splitting is deterministic (reproducible).
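To illustrate what "roughly equal subsets" looks like, the sketch below uses splitIndices from the parallel package, which produces the kind of consecutive, near-equal index chunks that clusterSplit relies on. The vector items and the worker count n_workers are hypothetical, and no cluster is started.

```r
library(parallel)

items     <- letters[1:10]   # hypothetical vector to split
n_workers <- 3               # hypothetical number of workers

# Consecutive index chunks of near-equal length, one per worker
idx    <- splitIndices(length(items), n_workers)
chunks <- lapply(idx, function(i) items[i])

# Chunk lengths differ by at most one element
chunk_lengths <- lengths(chunks)

# The split depends only on the lengths, so repeating it gives the same result
same_again <- identical(idx, splitIndices(length(items), n_workers))
```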

Load balancing (balancing = "load") means that the problem is not split into subsets a priori; instead, each subsequent item is sent to whichever worker is idle (see clusterApplyLB for load balancing). This splitting is non-deterministic (might not be reproducible).

Size balancing (balancing = "size") means that the problem is split into subsets with respect to size (see clusterSplitSB and parLapplySB). In size balancing, the problem is re-ordered from largest to smallest, and then subsets are determined by minimizing the total approximate processing time. This splitting is deterministic (reproducible).
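The idea of size balancing can be sketched with a simple greedy heuristic: sort the items from largest to smallest and give each one to the worker with the smallest accumulated load. This is an illustration only and is not claimed to be the algorithm used by clusterSplitSB; the helper size_split and its argument n_workers are hypothetical names.

```r
# Hypothetical helper, not part of dclone: greedy size-based split
size_split <- function(items, size = 1, n_workers = 2) {
  size   <- rep(size, length.out = length(items))  # recycle sizes if needed
  ord    <- order(size, decreasing = TRUE)         # largest problems first
  load   <- numeric(n_workers)                     # accumulated size per worker
  groups <- vector("list", n_workers)
  for (i in ord) {
    w <- which.min(load)                           # least loaded worker so far
    groups[[w]] <- c(groups[[w]], items[i])
    load[w] <- load[w] + size[i]
  }
  groups
}

sizes <- c(5, 1, 4, 2, 3)                          # hypothetical problem sizes
parts <- size_split(1:5, size = sizes, n_workers = 2)

# Total size assigned to each worker
loads <- sapply(parts, function(p) sum(sizes[p]))
```

Because the assignment depends only on the sizes and the number of workers, repeating the call gives the same subsets, which is the sense in which this kind of splitting is deterministic.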

Size and load balancing (balancing = "both") means that the problem is re-ordered from largest to smallest, and then non-deterministic load balancing is used (see parLapplySLB). If size is correct, this is identical to size balancing. This splitting is non-deterministic (might not be reproducible).

Value

Usually a list with results returned by the cluster.

Author(s)

Peter Solymos

See Also

Size balancing: parLapplySB, parLapplySLB, mclapplySB

Optimizing the number of workers: clusterSize, plotClusterSize.

parDosa is used internally by parallel dclone functions: jags.parfit, dc.parfit, parJagsModel, parUpdate, parCodaSamples.

parDosa manipulates specific environments described on the help page DcloneEnv.


datacloning/dclone documentation built on Sept. 29, 2024, 3:21 p.m.