##`
## https://github.com/vspinu/polymode/issues/147#issuecomment-399745611
R.v.maj <- as.numeric(R.version$major)
R.v.min.1 <- as.numeric(strsplit(R.version$minor, "\\.")[[1]][1])
if(R.v.maj < 2 || (R.v.maj == 2 && R.v.min.1 < 15))
  stop("requires R >= 2.15", call.=FALSE)

suppressPackageStartupMessages(library(knitr))
opts_chunk$set(echo=TRUE, warning=TRUE, message=TRUE, cache=FALSE, fig.align="center")
opts_knit$set(progress=TRUE, verbose=TRUE)

Overview

This R chunk is used to assess how much time it takes to execute the R code in this document until the end:

t0 <- proc.time()

Since several releases, R natively comes with parallel support:

library(parallel)

Some context and information can be found in the vignette of this package:

#browseVignettes(package="parallel")

This package provides a way to determine how many cores are available:

(nb.cores <- detectCores())

In practice, it is useful to keep at least one core to keep doing other things on the computer while the computations run in parallel:

nb.cores <- nb.cores - 1

Make a function

It is frequent to have a set of instructions that one has to launch several times ($n$):

n <- 4 # small so that this tutorial doesn't take too long to run

When each launch is independent of each other, it is embarassingly easy to parallelize. But before anything else, it is advised to make a function with all instructions. Here is an example for the sake of this tutorial:

unitRun <- function(){
  Sys.sleep(3) # so that this example takes enough time to run
  runif(1)     # so that this examples involves random numbers
}

Sequential code

"for" loop

It is usual to start by writing a "for" loop:

RNGkind(kind="default")
set.seed(1234) # to be reproducible
out.for.1 <- rep(NA, n)
for(i in 1:n){
  out.for.1[i] <- unitRun()
}
out.for.1

Let's do it again to check the reproducibility:

RNGkind(kind="default")
set.seed(1234)
out.for.2 <- rep(NA, n)
for(i in 1:n){
  out.for.2[i] <- unitRun()
}
out.for.2
all.equal(out.for.2, out.for.1)

lapply

After having written a "for" loop, it is advised to re-write the code with a call to "lapply":

RNGkind(kind="default")
set.seed(1234)
system.time(
  out.lapply.1 <- lapply(X=1:n, FUN=function(i){
    unitRun()
}))
out.lapply.1 <- do.call(c, out.lapply.1) # convert the output list into a vector
out.lapply.1
all.equal(out.lapply.1, out.for.1)

Parallel code

mclapply

Once the code uses "lapply", it is trivial to re-write it to use "mclapply", but this function works only on Linux:

RNGkind(kind="L'Ecuyer-CMRG") # good-quality RNG with multiple independent streams
set.seed(1234)
system.time(
  out.mclapply.1 <- mclapply(X=1:n, FUN=function(i){
    unitRun()
}, mc.cores=nb.cores))
out.mclapply.1 <- do.call(c, out.mclapply.1)
out.mclapply.1

Let's do it again to check the reproducibility (only on clusters with the same number of cores):

RNGkind(kind="L'Ecuyer-CMRG")
set.seed(1234)
system.time(
  out.mclapply.2 <- mclapply(X=1:n, FUN=function(i){
    unitRun()
}, mc.cores=nb.cores))
out.mclapply.2 <- do.call(c, out.mclapply.2)
out.mclapply.2
all.equal(out.mclapply.2, out.mclapply.1)

parLapply

On Windows, one first has to create a "cluster" object:

cl <- makeCluster(spec=nb.cores, type="PSOCK")

Then, one can use "parLapply":

RNGkind("L'Ecuyer-CMRG")
clusterSetRNGStream(cl=cl, iseed=1234)
clusterExport(cl=cl, varlist="unitRun")
# if unitRun() requires packages, use clusterEvalQ(cl, library(<pkg>))
system.time(
  out.parLapply.1 <- parLapply(cl=cl, X=1:n, fun=function(i){
    unitRun()
}))
out.parLapply.1 <- do.call(c, out.parLapply.1)
out.parLapply.1

Let's do it again to check the reproducibility (only on clusters with the same number of cores):

RNGkind(kind="L'Ecuyer-CMRG")
clusterSetRNGStream(cl=cl, iseed=1234)
system.time(
  out.parLapply.2 <- parLapply(cl=cl, X=1:n, fun=function(i){
    unitRun()
}))
out.parLapply.2 <- do.call(c, out.parLapply.2)
out.parLapply.2
all.equal(out.parLapply.2, out.parLapply.1)

When it's not needed anymore, don't forget to stop the cluster:

stopCluster(cl)

Miscellaneous

See this course: https://www.datacamp.com/courses/parallel-programming-in-r.

Appendix

t1 <- proc.time(); t1 - t0
print(sessionInfo(), locale=FALSE)


timflutre/rutilstimflutre documentation built on Feb. 7, 2024, 8:17 a.m.