dopar: Interface for parallel computations

dopar R Documentation

Interface for parallel computations

Description

dopar and combinepar are interfaces primarily designed to apply some function fn in parallel on columns of a matrix, although other uses are possible. Depending on the nb_cores argument, parallel or serial computation is performed. A socket cluster is used by default for parallel computations, but a fork cluster can be requested on Linux and similar operating systems with the argument cluster_args=list(type="FORK").

dopar is designed to provide a progress bar by default in all evaluation contexts. A drawback is that different procedures are called depending on, e.g., the type of cluster, each with different possible controls. In particular, foreach is called in some cases but not others, so non-trivial values of its .combine control are not always enforced. The alternative interface combinepar always uses foreach, and still tries to provide a progress bar by default, but may fail to do so in some cases (see Details).

Usage

dopar(newresp, fn, nb_cores = NULL, fit_env, 
      control = list(.final=function(v) if( ! is.list(v[[1]])) {do.call(cbind,v)} else v), 
      cluster_args = NULL, debug. = FALSE, iseed = NULL, 
      showpbar = eval(spaMM.getOption("barstyle")), 
      pretest_cores =NULL, ...)
combinepar(newresp, fn, nb_cores = NULL, cluster=NULL, fit_env, 
      control = list(.final=function(v) if( ! is.list(v[[1]])) {do.call(cbind,v)} else v), 
      cluster_args = NULL, debug. = FALSE, iseed = NULL, 
      showpbar = eval(spaMM.getOption("barstyle")), 
      pretest_cores =NULL, ...)

Arguments

newresp

A matrix to whose columns fn will be applied (e.g., as used internally in spaMM, the return value of a simulate.HLfit() call); or an integer, which is then converted to the trivial matrix matrix(seq(newresp),ncol=newresp,nrow=1).
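As a minimal sketch of the integer case, the conversion stated above can be reproduced in base R. The names as_matrix and columns are introduced here for illustration only and are not part of spaMM.

```r
# Sketch of the documented conversion of an integer 'newresp'
newresp <- 4L
as_matrix <- matrix(seq(newresp), ncol = newresp, nrow = 1)
print(as_matrix)

# Each column holds a single index, so fn would receive y = 1, then y = 2, etc.
# ('columns' is a hypothetical helper object, used only for this illustration)
columns <- lapply(seq_len(ncol(as_matrix)), function(i) as_matrix[, i])
```

Passing an integer is thus a convenient way to run fn a given number of times, with y acting as a simple counter.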

fn

Function whose first argument is named y. The function will be applied for y taken to be each column of newresp.

nb_cores

Integer. Number of cores to use for parallel computations. If >1 (and no cluster is provided by the cluster argument), a cluster of nb_cores nodes is created, used, and stopped on completion of the computation. Otherwise, no parallel computation is performed.

cluster

(for combinepar only): a cluster object (as returned by parallel::makeCluster or parallel::makeForkCluster). If this is used, the nb_cores and cluster_args arguments are ignored. The cluster is not stopped on completion of the computation.

fit_env

(for socket clusters only:) An environment, or a list, containing variables to be exported to the nodes of the cluster (by parallel::clusterExport); e.g., list(bar=bar) to pass object bar to each node. Setting control=list(.errorhandling = "pass") (see the control argument below) is useful for identifying missing variables.
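The following sketch illustrates, with the parallel package only, what exporting a list such as list(bar=bar) to the nodes of a socket cluster amounts to. How dopar handles fit_env internally is not shown in this page, so the use of list2env() here is an assumption, and bar is a hypothetical object.

```r
library(parallel)

bar <- 10                      # hypothetical object needed by the function
fit_env <- list(bar = bar)     # same form as the list(bar=bar) example above

cl <- makeCluster(2)           # socket cluster with two nodes
# clusterExport() takes variable names and an environment in which to find them;
# converting the list with list2env() is an assumption made for this sketch
clusterExport(cl, varlist = names(fit_env), envir = list2env(fit_env))

# 'bar' is now defined on each node, so the function can refer to it
res <- parSapply(cl, 1:3, function(y) y + bar)
stopCluster(cl)
```

Without the export step, a function referring to bar would typically fail on the nodes, which is the kind of error that .errorhandling = "pass" helps to reveal.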

control

A list following the foreach control syntax, even if foreach is not used. There are limitations when dopar (but not combinepar) is used, in all but the first case below:

  1. for socket clusters, with doSNOW attached, foreach is called with default arguments including i = 1:ncol(newresp), .inorder = TRUE, .errorhandling = "remove", .packages = "spaMM", and further arguments taken from the present function's control argument, which may also be used to override the defaults. For example, .errorhandling = "pass" is useful to get error messages from the nodes, and therefore strongly recommended when first experimenting with this function.

  2. for socket clusters, with doSNOW not attached, dopar calls pbapply instead of foreach but control$.packages is still handled. The result is still in the format returned in the first case, i.e. by foreach, taking the control argument into account. pbapply arguments may be passed through the ... argument.

  3. if a fork cluster is used, dopar calls mclapply instead of foreach. control$mc.silent can be used to control the mc.silent argument of mclapply.

  4. (if nb_cores=1 dopar calls mclapply).
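The default control value shown in the Usage section sets only .final. Its effect can be checked in isolation with base R, as in the sketch below. The input lists are made up for illustration, and final_default is a name introduced here.

```r
# Default '.final' function, copied from the Usage section
final_default <- function(v) if( ! is.list(v[[1]])) {do.call(cbind,v)} else v

# Hypothetical per-column results: one numeric vector per column of newresp
vec_results <- list(c(1, 2), c(3, 4), c(5, 6))
combined <- final_default(vec_results)    # results bound as columns of a matrix

# Hypothetical per-column results that are themselves lists: returned unchanged
list_results <- list(list(a = 1), list(a = 2))
unchanged <- final_default(list_results)
```

With this default, a fn that returns vectors yields a matrix with one column per column of newresp, while a fn that returns lists yields a list of lists.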

cluster_args

A list of arguments passed to parallel::makeCluster. E.g., outfile="log.txt" may be useful to collect output from the nodes, and type="FORK" to force a fork cluster on Linux and similar operating systems.

debug.

(for socket clusters only:) For debugging purposes. Effect, if any, is to be defined by the fn as provided by the user.

iseed

(all parallel contexts:) Integer, or NULL. If an integer, it is used as the iseed argument of clusterSetRNGStream to initialize the "L'Ecuyer-CMRG" random-number generator (see Details). If iseed is NULL, the default generator is selected on each node, where its seed is not controlled.
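The role of iseed can be illustrated outside spaMM with parallel::clusterSetRNGStream. The sketch below uses parSapply, not dopar or doSNOW, so it shows only the seeding mechanism and not the behaviour of any particular backend. The function draw is hypothetical.

```r
library(parallel)

cl <- makeCluster(2)                    # socket cluster with two nodes
draw <- function(i) runif(1)            # toy function using random numbers

clusterSetRNGStream(cl, iseed = 123)    # one "L'Ecuyer-CMRG" stream per node
first <- parSapply(cl, 1:4, draw)

clusterSetRNGStream(cl, iseed = 123)    # same iseed: the streams are reset
second <- parSapply(cl, 1:4, draw)

stopCluster(cl)
identical(first, second)                # compares the two sets of draws
```

Reproducibility in this sketch relies on parSapply assigning tasks to nodes in a fixed way. Backends that schedule tasks dynamically do not offer this guarantee, which is consistent with the caveat about doSNOW in Details.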

showpbar

(for socket clusters only:) Controls display of progress bar. See barstyle option for details.

pretest_cores

(for socket clusters only:) A function to run on the cores before running fn. It may be used to check that all arguments of the fn can be evaluated in the cores' environments (the internal function .pretest_fn_on_cores provides an example).

...

Further arguments to be passed (unevaluated) to fn, if not caught on the way by pbapply. This means that different results may in principle be obtained depending on the mode of parallelisation, which is the kind of design issue that combinepar aims to resolve by always calling foreach.

Details

Control of random numbers through the "L'Ecuyer-CMRG" generator and the iseed argument is not sufficient for consistent results when the doSNOW parallel backend is used, so if you really need such control in a fn using random numbers, do not use doSNOW. Yet, it is fine to use doSNOW for bootstrap procedures in spaMM, because the fitting functions do not use random numbers: only sample simulation uses them, and it is not performed in parallel.

combinepar calls foreach::%dopar% which assumes that a cluster has been declared using a suitable backend such as doSNOW, doFuture or doParallel. If only the latter is available, no progress bar is displayed. A method to render a bar when doParallel is used can be found on the Web, but that bar is not a valid progress bar as it is displayed only after all the processes have been run.

Value

The result of calling foreach, pbapply or mclapply, depending on the control argument and the interface used. A side-effect of either interface is to show a progress bar whose character indicates the type of parallelisation performed: an "F" or the default "=" character for fork clusters, a "P" for parallel computation via foreach and doSNOW, a "p" for parallel computation via foreach and doFuture or via pbapply, and an "s" for serial computation via foreach and doParallel or via pbapply.

See Also

dofuture is yet another interface with (essentially) the same functionalities as dopar. See the documentation of the wrap_parallel option for its differences from dopar.

Examples

## See source code of spaMM_boot()

## Not run: 
# Useless function, but requiring some argument beyond the first
foo <- function(y, somearg, ...) {
  if ( is.null(somearg) || TRUE ) length(y)
}

# Whether FORK can be used depends on OS and whether Rstudio is used:
dopar(matrix(1,ncol=4,nrow=3), foo, fit_env=list(), somearg=NULL, 
  nb_cores=2, cluster_args=list(type="FORK"))

## End(Not run)



spaMM documentation built on June 22, 2024, 9:48 a.m.
