Install from CRAN
install.packages("mcprogress")
Install from GitHub
devtools::install_github("myles-lewis/mcprogress")
This package adds a progress bar to mclapply(), using echo to output to the console in RStudio or Linux environments. Simply replace your original call to mclapply() with pmclapply().
library(mcprogress)

# toy example
res <- pmclapply(letters[1:20], function(i) {
  Sys.sleep(0.2 + runif(1) * 0.1)
  setNames(rnorm(5), paste0(i, 1:5))
}, mc.cores = 2, title = "Working")
Working |================================ | 60% eta 3.1 secs
pmclapply() can be used in exactly the same way as mclapply(). It works best when the length of X is considerably larger than the number of cores. Because processes are spawned in a block and the code for each process usually completes at roughly the same time, processes advance in blocks whose size is set by mc.cores. To track progress, pmclapply tracks only every nth process, where n = mc.cores. For example, with 4 cores, pmclapply reports progress when the 4th, 8th, 12th, 16th, etc. process has completed.
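The block-wise reporting described above can be sketched in base R. This is only an illustration of the arithmetic: reporting_indices() is a hypothetical helper, not a function in mcprogress, and it assumes that exactly every mc.cores-th element of X triggers an update.

```r
# Hypothetical helper (not part of mcprogress): indices of X whose completion
# would move the bar, assuming every `cores`-th element is the tracked one.
reporting_indices <- function(n, cores) {
  idx <- seq_len(n)
  idx[idx %% cores == 0]
}

reporting_indices(20, cores = 4)  # 4, 8, 12, 16, 20
reporting_indices(8, cores = 8)   # 8
```

Under this assumption the number of visible updates is roughly length(X) divided by mc.cores, which is why the bar is smoother when X is long relative to the number of cores.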
[Figure: mcp1.png]
The ETA is approximate. To minimise overhead, it is updated only when progress changes (i.e. each time a block of processes completes); it is not refreshed by an interrupt in between.
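One plausible way to arrive at such an estimate is linear extrapolation from the time elapsed so far. The sketch below assumes that approach; eta_sketch() is a hypothetical helper and may differ from how mcprogress actually computes its ETA.

```r
# Hypothetical helper: linear extrapolation of the remaining time.
# `start` is when the job began, `frac` the fraction completed (0 < frac <= 1).
eta_sketch <- function(start, frac, now = Sys.time()) {
  elapsed <- as.numeric(difftime(now, start, units = "secs"))
  elapsed * (1 - frac) / frac
}

start <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
eta_sketch(start, frac = 0.6, now = start + 6)  # 6 s elapsed at 60%: about 4 s left
```

Because an estimate of this kind only changes when frac changes, it stays fixed between block completions, which matches the behaviour described above.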
However, in some scenarios the length of X is comparable to the number of cores and each process may take a long time. For example, machine learning applied to each of 8 cross-validation folds on an 8-core machine will open 8 processes from the outset, and these will often complete at roughly the same time. In this case pmclapply is much less informative: it only shows completion at the end of the single round of processes, so the bar goes from 0% straight to 100%.
For this scenario, we recommend mcProgressBar(), which allows more fine-grained reporting of subprogress from within a block of parallel processes. The diagram below illustrates a computation in which 10 processes are completed across 8 cores, with subprogress divided into 5 intervals.
[Figure: mcp2.png]
Technically, only 1 process can be tracked. If cores is set to 4 and subval is supplied, then the 1st, 5th, 9th, 13th, etc. process is tracked. Subprogress of this process is computed as a fraction of the number of blocks of processes required.
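The arithmetic behind this can be sketched as follows. Both helpers are hypothetical and written only for illustration; the formula is an assumption about how block number and subval might combine, not a copy of the package internals.

```r
# Hypothetical helpers illustrating the arithmetic; not mcprogress internals.

# Indices of the tracked process: the first element of each block
tracked_with_subval <- function(n, cores) {
  idx <- seq_len(n)
  idx[(idx - 1) %% cores == 0]
}

# Overall progress of element i, given its subprogress within its block
overall_progress <- function(i, len, cores, subval) {
  n_blocks <- ceiling(len / cores)  # rounds of processes needed
  block <- ceiling(i / cores)       # round that element i belongs to
  (block - 1 + subval) / n_blocks
}

tracked_with_subval(16, cores = 4)                            # 1, 5, 9, 13
overall_progress(i = 1, len = 10, cores = 8, subval = 3 / 5)  # 0.3
overall_progress(i = 9, len = 10, cores = 8, subval = 1)      # 1
```

With 10 processes on 8 cores there are 2 blocks, so each of the 5 subprogress intervals of a tracked process would advance the bar by one tenth under this assumption.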
In the next example, we build a custom function showing how to use mcProgressBar(). It includes a call to mclapply wrapped around a nested function which reports subprogress.
library(parallel)

my_fun <- function(x, cores) {
  start <- Sys.time()
  mcProgressBar(0, title = "my_fun")  # initialise progress bar
  res <- mclapply(seq_along(x), function(i) {
    # inner loop of calculation
    y <- 1:4
    inner <- lapply(seq_along(y), function(j) {
      Sys.sleep(0.2 + runif(1) * 0.1)
      mcProgressBar(val = i, len = length(x), cores, subval = j / length(y),
                    title = "my_fun", start = start)
      rnorm(4)
    })
    inner
  }, mc.cores = cores)
  closeProgress(start, title = "my_fun")  # finalise the progress bar
  res
}

output <- my_fun(letters[1:4], cores = 2)
Alternatively, even if the function called inside mclapply does not contain a for loop or equivalent, progress can still be reported manually after chunks of computation.
## Example of long function
longfun <- function(x, cores) {
  start <- Sys.time()
  mcProgressBar(0, title = "longfun")  # initialise progress bar
  res <- mclapply(seq_along(x), function(i) {
    # long sequential calculation in parallel with 3 major steps applied to x[i]
    Sys.sleep(0.5)
    mcProgressBar(val = i, len = length(x), cores, subval = 0.33,
                  title = "longfun", start = start)  # 33% complete
    Sys.sleep(0.5)
    mcProgressBar(val = i, len = length(x), cores, subval = 0.66,
                  title = "longfun", start = start)  # 66% complete
    Sys.sleep(0.5)
    mcProgressBar(val = i, len = length(x), cores, subval = 1,
                  title = "longfun", start = start)  # 100% complete
    return(rnorm(4))
  }, mc.cores = cores)
  closeProgress(start, title = "longfun")  # finalise the progress bar
  res
}

output <- longfun(letters[1:4], cores = 2)
The mcProgressBar function can be used with the foreach package and the doMC multicore backend to show a progress bar.
# Example from doMC vignette
library(doMC)
library(foreach)
registerDoMC(4)

x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 10000

{
  start <- Sys.time()
  r <- foreach(i = seq_len(trials), .combine = cbind) %dopar% {
    ind <- sample(100, 100, replace = TRUE)
    result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    mcProgressBar(i, trials, cores = getDoParWorkers(), start = start)
    coefficients(result1)
  }
  closeProgress(start)
}

# Equivalent using pmclapply
r <- pmclapply(seq_len(trials), function(i) {
  ind <- sample(100, 100, replace = TRUE)
  result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
  coefficients(result1)
}, mc.cores = 4)
The package also includes functions to safely print messages (including error messages) from within parallelised code. These can be very useful for debugging parallel R code.
res <- mclapply(1:5, function(i) {
  Sys.sleep(runif(1) / 10)
  message_parallel("Process ", i, " done")
  rnorm(1)
})
## Process 1 done
## Process 3 done
## Process 2 done
## Process 5 done
## Process 4 done
If errors occur during parallel processing, mclapply generates a nondescript warning "all scheduled cores encountered errors in user code". One option is to set mc.cores = 1. This will often reveal the error message, but can be slow if the computation is long and the error occurs only halfway through.
out <- mclapply(1:5, function(i) {
  rnorm(-1)
}, mc.cores = 2)  # change mc.cores = 1 to reveal actual error message
## Warning in mclapply(1:5, function(i) {: all scheduled cores encountered errors
## in user code
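Independently of this package, the result returned by mclapply can also be inspected after the fact: failed jobs come back as objects of class "try-error". The sketch below uses only base R and the parallel package and assumes a fork-capable platform (Linux or macOS). Note that with the default prescheduling, an error in one job marks every job assigned to the same core as failed, so the reported indices can be coarser than the true failures.

```r
library(parallel)

# Every call fails, as in the failing call above; the warning is suppressed
# because the errors are inspected directly. Requires forking (not Windows).
out <- suppressWarnings(
  mclapply(1:5, function(i) rnorm(-1), mc.cores = 2)
)

# Flag the elements that came back as errors
failed <- vapply(out, inherits, logical(1), what = "try-error")
which(failed)

# The original condition travels with each try-error object
conditionMessage(attr(out[[1]], "condition"))
```

This only shows the error once the whole call has returned and gives no context about the values in play at the time.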
The function catchError() wraps an expression in try(), so that the code is executed and, if an error is produced, the error message is printed to the console where it is more visible. If no error is generated, the usual value of the expression is returned. This allows you to write your code as usual. It is most easily used with the pipe |>. Additional arguments can be supplied to track values, so that the programmer can more easily find out where the error occurs.
out <- mclapply(1:5, function(i) {
  j <- 4 + i
  rnorm(-1) |> catchError(i, j)
}, mc.cores = 2)
## Error in rnorm(-1) : invalid arguments
## i=1, j=5
## Error in rnorm(-1) : invalid arguments
## i=2, j=6
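To make the pattern concrete, here is a minimal base R sketch of a try()-based wrapper that also reports tracked values. catch_error_sketch() is a hypothetical stand-in written for illustration, not the mcprogress source. It prints with message(), which is fine for sequential testing but may not be visible from forked processes in every console. What it returns on error (the try-error object) is also an assumption.

```r
# Hypothetical stand-in for illustration only; not the mcprogress source.
catch_error_sketch <- function(expr, ...) {
  # Record the names of the tracking arguments, e.g. "i" and "j"
  arg_names <- vapply(as.list(substitute(list(...)))[-1], deparse1, character(1))
  out <- try(expr, silent = TRUE)  # expr is a promise, so it is evaluated inside try()
  if (inherits(out, "try-error")) {
    msg <- trimws(as.character(out))  # the "Error in ... : ..." text
    if (length(arg_names) > 0) {
      vals <- vapply(list(...), function(v) paste(format(v), collapse = " "),
                     character(1))
      msg <- paste0(msg, "\n", paste0(arg_names, "=", vals, collapse = ", "))
    }
    message(msg)
  }
  out
}

i <- 1
j <- 4 + i
res <- rnorm(-1) |> catch_error_sketch(i, j)  # prints the error and "i=1, j=5"
inherits(res, "try-error")                    # TRUE
catch_error_sketch(sqrt(16))                  # 4, returned unchanged
```

The wrapper works with the pipe because R evaluates function arguments lazily: the piped expression is not run until try() forces it, so the error is caught rather than raised at the call site.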
The function mcstop() allows programmers to generate visible error messages from within parallel code.
res <- mclapply(1:5, function(i) {
  Sys.sleep(runif(1) / 10)
  if (i == 5) mcstop("My error message")
  rnorm(1)
})
## My error message