In caravagnalab/easypar: Easy parallaleization of R functions.

The easypar R package allows you to:

run R functions in a parallel fashion in a trivial way.
easily switch between parallel and serial executions of the same calls (at runtime).
save results of paralle computation as far as they are produced (i.e., cache).

easypar can help if you have to run several different independent computations (for instance bootstrap estimates or multiple local-optimisations) and you want these to be parallel on multi-core architectures. easypar interfaces to doParallel in order to make this task easier to code, and to debug.

The idea is to exploit a code template and switch easily between parallel and sequential runs of a function. The code skeleton looks like this

if(parallel)
{
   R = foreach(i = 1:N) %dopar% { ....fun....  }
}
else
{
   for(i in 1:N) { ....fun... }
}

where f is the actual computation.

I want to use parallel = FALSE when I have to debug f, and eventually, I want to use parallel = TRUE to speed up computations. Parallel execution are hard to debug: inside %dopar%, tasks run in different memory spaces, and thus outputs (i.e., print etc) are asynchronous.

This piece of code is at the base of easypar, whose functioning is shown with some examples.

Examples

Consider a dummy function f that sleeps for some random time and then print the output.

f = function(x) 
{
  clock = 2 * runif(1)

  print(paste("Before sleep", x, " - siesta for ", clock))

  Sys.sleep(clock)

  print(paste("After sleep", x))

  return(x)
}

f runs as

f(3)

Input(s). We want to run f on 4 inputs (random univariate numbers). We store them in a list where each position is a full set of parameters that we want to pass to each calls to f (list of lists), named according to the actual parameter names.

inputs = lapply(runif(4), list)
print(inputs)

easypar provides a single function that takes as input f, its list of inputs and some execution parameters for the type of execution requested. The simplest call runs f in parallel, without seeing any output and just receiving the return values in a list as follows

library(easypar)
easypar::run(FUN = f, PARAMS = inputs, parallel = TRUE, outfile = NULL)

We can control the amount (0 to 1) of cores to use at maximum (which are checked via doPar). Other combinations are also possible.

make each thread dump to a shared rds file its result, implementing a cache which is usefull if one want to real-time analyze output results (with another process).

easypar::run(FUN = f, PARAMS = inputs, parallel = TRUE, outfile = NULL, cache = "My_task.rds")

# Check
cache = readRDS("My_task.rds")
print(cache)

get outputs to screen (asynchronous per thread) with outfile

easypar::run(FUN = f, PARAMS = inputs, parallel = TRUE, outfile = '')

sequentially every tasks in a for-loop fashion

easypar::run(FUN = f, PARAMS = inputs, parallel = FALSE, outfile = '')

Runtime control of `easypar`

We can disable parallel executions easily.

We have a global option to force the execution to go serial, whatever its source code default behaviour is (parallel = TRUE will not work).

When f is plugged in a tool and called as

easypar::run(FUN = f, PARAMS = inputs)

which has default parallel = TRUE, and you set the global option easypar.parallel, easypar will run f sequentially.

options(easypar.parallel = FALSE)
easypar::run(FUN = f, PARAMS = inputs, parallel = TRUE)

Errors handling

# Hopefully r will crash at least once but not all calls
f = function(x) 
{
  if(runif(1) > .5) stop("Boom!!")

  "Ok"
}

# Restore parallel and run
options(easypar.parallel = TRUE)
runs = easypar::run(FUN = f, PARAMS = inputs, parallel = TRUE, outfile = NULL)

# inspect and filter function
numErrors(runs)
runs

filterErrors(runs)