test_parallel_computation: Testing parallel computing

View source: R/tools.R

test_parallel_computation    R Documentation

Testing parallel computing

Description

This function is aimed at testing whether parallel computation runs efficiently, and you can use it as a skeleton to benchmark alternative implementations. It simply runs one loop per iteration, the loops taking, in turn, (1:iter)/cost time to run (i.e. the i-th loop takes i/cost). A large object may be supplied to the argument list to monitor how memory is being handled. For example, you may want to pass the output of the function fit_life_histories() to the argument list.
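The workload described above can be sketched as follows. This is a hypothetical re-implementation for illustration only, not the source of test_parallel_computation(). It assumes that the time unit is seconds, busy-waits so that each task keeps a core occupied, returns length(list) so that every task touches the (possibly large) object, and times a sequential run against parallel::mclapply() (the latter only on Unix-alikes). The names busy_task_sketch() and time_run_sketch() are made up.

```r
## Hypothetical sketch, not the twinR source: assumes times are in seconds.
busy_task_sketch <- function(i, cost = 1, list = NULL) {
  start <- Sys.time()
  # Busy-wait (rather than Sys.sleep()) so that the task keeps a CPU core busy
  while (as.numeric(difftime(Sys.time(), start, units = "secs")) < i / cost) NULL
  length(list)  # touch the object to see how memory is handled
}

# Elapsed wall-clock time of one run of iter tasks with a given lapply()-like function;
# extra arguments (e.g. mc.cores) are forwarded to lapply_fn
time_run_sketch <- function(lapply_fn, iter = 4L, cost = 20, list = NULL, ...) {
  unname(system.time(
    lapply_fn(seq_len(iter), busy_task_sketch, cost = cost, list = list, ...)
  )["elapsed"])
}

big_list <- as.list(seq_len(1e5))
t_seq <- time_run_sketch(lapply, list = big_list)
t_par <- if (.Platform$OS.type == "unix") {
  time_run_sketch(parallel::mclapply, list = big_list, mc.cores = 2L)
} else NA_real_
c(sequential = t_seq, parallel = t_par)
```

With iter = 4 and cost = 20, the sequential run busy-waits for at least (1 + 2 + 3 + 4)/20 = 0.5 s in total, which gives a reference against which the parallel timing can be compared.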

Usage

test_parallel_computation(
  iter = 20L,
  nb_cores = 1L,
  list = NULL,
  cost = 1,
  lapply_pkg = "pbmcapply"
)

Arguments

iter

the number of iterations to perform

nb_cores

the number of CPU core(s) to use

list

a list whose length will be measured (optional)

cost

the cost used to scale the time threshold of each loop (default = 1; increase it to speed up the test, decrease it to lengthen it)

lapply_pkg

the R package used to implement an lapply()-like function (default = "pbmcapply"; the other possibilities are "parallel" and "base")

Details

We tried many implementations (using the R packages parallel, furrr, future.apply and foreach, combined with the backends doSNOW, doParallel or doFuture, and using either multi-threading or multi-processing). At least on our Linux system, the implementation used here, which simply relies on mclapply(), outperformed all these alternatives. Using this function introduces some restrictions: you must run it on a Unix-based system, and it is best to run it directly in a terminal (as opposed to within R-GUI or RStudio). Yet, it does combine key features suiting our purpose:

  • no time seems to be wasted on heavy-handed communication between tasks

  • memory handling is the best we found: objects that can be shared between tasks are indeed shared, and when a job is done, the memory is immediately released

  • it does not require any additional package.
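The trade-off described above (forked workers that share memory, but no support on Windows) can be illustrated with a small guard. This is a hypothetical helper: safe_mclapply_sketch() is not part of twinR or of parallel. It only relies on parallel::mclapply() forking workers on Unix-alikes (and refusing mc.cores > 1 on Windows) and on R reporting the platform through .Platform$OS.type.

```r
## Hypothetical guard, not part of twinR: forking with parallel::mclapply() only
## works on Unix-alikes, so fall back to sequential lapply() elsewhere.
safe_mclapply_sketch <- function(X, FUN, nb_cores = 1L) {
  if (.Platform$OS.type == "unix" && nb_cores > 1L) {
    # Forked workers start as copies of the parent process (copy-on-write),
    # so large objects already in memory are shared, not serialized to workers.
    parallel::mclapply(X, FUN, mc.cores = nb_cores)
  } else {
    lapply(X, FUN)
  }
}

big <- as.list(seq_len(1e5))  # large-ish object visible to every task
out <- safe_mclapply_sketch(1:4, function(i) length(big) + i, nb_cores = 2L)
unlist(out)
```

The tasks read big without it being passed as an argument, which is the kind of sharing the list argument of test_parallel_computation() is meant to monitor.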

Yet, since it is a little difficult to display progress properly using mclapply(), we used a small wrapper around it, pbmclapply(), provided by the package pbmcapply. You can alternate between these two implementations by setting the argument lapply_pkg to either "parallel" or "pbmcapply". To run the function sequentially, you can also set the argument lapply_pkg to "base". This latter option runs fine under Windows, but does not perform parallel computing.
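Switching between the three values of lapply_pkg can be sketched as a small dispatcher. This is a hypothetical helper (choose_lapply_sketch() and its common (X, FUN, nb_cores) interface are assumptions, not the twinR source); it only uses parallel::mclapply() and pbmcapply::pbmclapply(), both of which accept an mc.cores argument.

```r
## Hypothetical helper, not the twinR source: map lapply_pkg to an lapply()-like
## function with a common (X, FUN, nb_cores) interface.
choose_lapply_sketch <- function(lapply_pkg = c("pbmcapply", "parallel", "base")) {
  lapply_pkg <- match.arg(lapply_pkg)
  switch(lapply_pkg,
    # sequential: runs anywhere, including Windows; nb_cores is ignored
    base = function(X, FUN, nb_cores = 1L) lapply(X, FUN),
    # forked workers, no progress bar
    parallel = function(X, FUN, nb_cores = 1L) parallel::mclapply(X, FUN, mc.cores = nb_cores),
    # same as above plus a progress bar; requires the pbmcapply package
    pbmcapply = function(X, FUN, nb_cores = 1L) pbmcapply::pbmclapply(X, FUN, mc.cores = nb_cores)
  )
}

run_seq <- choose_lapply_sketch("base")
run_seq(1:3, function(i) i^2, nb_cores = 2L)
```

Returning functions that share one signature keeps the calling code identical whichever backend is chosen, so changing lapply_pkg is a one-argument change.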

Examples

## sequential version, for reference:
test_parallel_computation(iter = 4L, nb_cores = 2L, lapply_pkg = "base")

## parallel version, using the R package parallel:
test_parallel_computation(iter = 4L, nb_cores = 2L, lapply_pkg = "parallel")

## parallel version, using the R package pbmcapply (if installed),
## which adds a progress bar:
if (requireNamespace("pbmcapply", quietly = TRUE)){
  test_parallel_computation(iter = 4L, nb_cores = 2L)
}


courtiol/twinR documentation built on July 11, 2024, 12:04 a.m.