The easypar R package helps when you have to run several independent computations (for instance, bootstrap estimates or multiple local optimisations) and you want them to run in parallel on multi-core architectures. easypar interfaces to doParallel to make this task easier to code and to debug.

The idea is to exploit a code template that switches easily between parallel and sequential runs of a function. The code skeleton looks like this:

if (parallel) {
   R = foreach(i = 1:N) %dopar% { f(i) }
} else {
   R = list()
   for (i in 1:N) { R[[i]] = f(i) }
}

where f is the actual computation, N is the number of tasks, and R collects the results.
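
As a minimal runnable sketch of this template, assuming the foreach and doParallel packages are installed: the wrapper name run_template, its cores argument and the toy function square are hypothetical names used for illustration, and are not part of easypar.

```r
library(foreach)
library(doParallel)

# Hypothetical wrapper around the skeleton above (not easypar's own code).
run_template = function(f, N, parallel = TRUE, cores = 2) {
  if (parallel) {
    # Start a small cluster and make sure it is stopped when we leave.
    cl = parallel::makeCluster(cores)
    on.exit(parallel::stopCluster(cl), add = TRUE)
    doParallel::registerDoParallel(cl)

    R = foreach(i = seq_len(N)) %dopar% { f(i) }
  } else {
    # Sequential branch: same computation, easy to debug.
    R = vector("list", N)
    for (i in seq_len(N)) R[[i]] = f(i)
  }
  R
}

# Toy computation, used only for illustration.
square = function(i) i^2

res_seq = run_template(square, N = 4, parallel = FALSE)
res_par = run_template(square, N = 4, parallel = TRUE)
```

Both calls should return the same list of results; only the way the tasks are scheduled changes.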

I want to use parallel = FALSE when I have to debug f and, eventually, parallel = TRUE to speed up computations. Parallel executions are hard to debug: inside %dopar%, tasks run in different memory spaces, and thus their outputs (e.g., from print) are asynchronous.

This piece of code is at the core of easypar, whose usage is shown below with some examples.

Examples

Consider a dummy function f that sleeps for some random time and then prints the output.

f = function(x) 
{
  clock = 2 * runif(1)

  print(paste("Before sleep", x, " - siesta for ", clock))

  Sys.sleep(clock)

  print(paste("After sleep", x))

  return(x)
}

f runs as

f(3)

Input(s). We want to run f on 4 inputs (random uniform numbers). We store them in a list where each position holds the full set of parameters that we want to pass to one call to f (a list of lists), named according to the actual parameter names.

inputs = lapply(runif(4), list)
print(inputs)
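
For a function with more than one parameter, the list-of-lists layout can be sketched as follows. The function g and the object inputs_named are hypothetical, and the use of do.call only illustrates how a named parameter set maps onto a function's arguments; it is an assumption, not a statement about how easypar forwards the parameters internally.

```r
# Hypothetical two-parameter function, used only for illustration.
g = function(a, b) a - b

# One inner list per call; entries are named after g's parameters,
# so their order inside each inner list does not matter.
inputs_named = list(
  list(a = 5, b = 1),
  list(b = 2, a = 6)
)

# Assumption: each parameter set is forwarded in the manner of do.call().
results = lapply(inputs_named, function(p) do.call(g, p))
print(unlist(results))
```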

easypar provides a single function that takes as input f, its list of inputs, and some parameters for the type of execution requested. The simplest call runs f in parallel, without showing any output, and just returns the return values in a list:

library(easypar)
easypar::run(FUN = f, PARAMS = inputs, parallel = TRUE, outfile = NULL)

We can control the maximum fraction (0 to 1) of cores to use (the available cores are checked via doPar). Other combinations are also possible.

# Parallel run that caches the results in an RDS file
easypar::run(FUN = f, PARAMS = inputs, parallel = TRUE, outfile = NULL, cache = "My_task.rds")

# Check the cached results
cache = readRDS("My_task.rds")
print(cache)

# Parallel run with outfile = '' instead of NULL
easypar::run(FUN = f, PARAMS = inputs, parallel = TRUE, outfile = '')

# Sequential run
easypar::run(FUN = f, PARAMS = inputs, parallel = FALSE, outfile = '')
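
The fraction of cores mentioned above has to be turned into a number of workers at some point. One way this could be done is sketched below with the base parallel package; the helper cores_from_ratio is hypothetical, and the rounding rule is an assumption rather than easypar's actual behaviour.

```r
# Hypothetical helper: turn a fraction of cores (0 to 1) into a worker count.
cores_from_ratio = function(ratio) {
  stopifnot(is.numeric(ratio), length(ratio) == 1, ratio >= 0, ratio <= 1)

  # detectCores() may return NA on some platforms; fall back to one core.
  available = parallel::detectCores()
  if (is.na(available)) available = 1L

  # Always keep at least one worker.
  max(1L, as.integer(floor(ratio * available)))
}

n_half = cores_from_ratio(0.5)
n_none = cores_from_ratio(0)
```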

Runtime control of easypar

We can disable parallel execution easily.

A global option forces the execution to go serial, whatever the default behaviour in the source code is (parallel = TRUE is then ignored).

When f is plugged in a tool and called as

easypar::run(FUN = f, PARAMS = inputs)

which has default parallel = TRUE, setting the global option easypar.parallel to FALSE makes easypar run f sequentially, even when parallel = TRUE is passed explicitly:

options(easypar.parallel = FALSE)
easypar::run(FUN = f, PARAMS = inputs, parallel = TRUE)
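
The override described here can be mimicked with base R's options() and getOption(). The helper resolve_parallel below is hypothetical and only illustrates the idea of a global option taking precedence over a function argument; it is not easypar's implementation.

```r
# Hypothetical helper: the global option, when set to FALSE, wins over
# the 'parallel' argument; otherwise the argument is used as given.
resolve_parallel = function(parallel = TRUE) {
  forced = getOption("easypar.parallel", default = TRUE)
  parallel && !identical(forced, FALSE)
}

old = options(easypar.parallel = FALSE)
forced_serial = resolve_parallel(parallel = TRUE)

options(easypar.parallel = TRUE)
stays_parallel = resolve_parallel(parallel = TRUE)

# Restore the option to its previous state.
options(old)
```

Here forced_serial is FALSE, because the option overrides the argument, while stays_parallel is TRUE.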

Error handling

# Hopefully f will crash at least once, but not in all calls
f = function(x) 
{
  if(runif(1) > .5) stop("Boom!!")

  "Ok"
}

# Restore parallel and run
options(easypar.parallel = TRUE)
runs = easypar::run(FUN = f, PARAMS = inputs, parallel = TRUE, outfile = NULL)

# Inspect the runs, count the errors, and filter them out
numErrors(runs)
runs

filterErrors(runs)
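
The same inspect-and-filter pattern can be sketched in base R with tryCatch. The names safe_call, h, runs_sketch, is_error and ok are hypothetical, and this is not how numErrors or filterErrors are implemented; the sketch only shows the idea of keeping failed tasks as error objects and separating them afterwards. To keep the outcome deterministic, h fails on even inputs instead of at random.

```r
# Hypothetical wrapper: return the error object instead of stopping.
safe_call = function(f, x) tryCatch(f(x), error = function(e) e)

# Deterministic variant of the failing function: even inputs fail.
h = function(x) {
  if (x %% 2 == 0) stop("Boom!!")
  "Ok"
}

runs_sketch = lapply(1:4, function(x) safe_call(h, x))

# Flag the tasks that ended in an error, then count and drop them.
is_error = vapply(runs_sketch, inherits, logical(1), what = "error")
n_errors = sum(is_error)
ok = runs_sketch[!is_error]
```

With inputs 1 to 4, two tasks fail and two return "Ok".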


caravagnalab/easypar documentation built on June 10, 2022, 6:05 a.m.