The goal of this article is to provide a minimal example of how to use the `parabar` and `foreach` packages together. The `foreach` package is a popular package that provides syntactic sugar for executing tasks sequentially (i.e., via the `%do%` operator) or in parallel (i.e., via the `%dopar%` operator). In this article, I will provide a brief introduction to the `foreach` package and show how it can be used to run tasks in parallel with the `parabar` package. If you are not yet familiar with the `parabar` package, make sure to check out its documentation for information on how to get started.
In a nutshell, the `foreach` package provides a way to iterate over a collection of elements. To iterate over the collection sequentially, one can use the `%do%` operator as follows:
```r
# Load the library.
library(foreach)

# For each element.
foreach(i = 1:5) %do% {
    # Do something.
    i * 2
}
#> [[1]]
#> [1] 2
#>
#> [[2]]
#> [1] 4
#>
#> [[3]]
#> [1] 6
#>
#> [[4]]
#> [1] 8
#>
#> [[5]]
#> [1] 10
```
In this example, the line `library(foreach)` loads the `foreach` package, making all of its functions and operators available in the main session. More interestingly, the call `foreach(i = 1:5)` takes the named argument `i = 1:5` provided as input and returns an iterator object of class `foreach`. Then, the `%do%` operator evaluates the expression on the right-hand side of the operator (i.e., `{ i * 2 }`) once for each element of the iterator object, with `i` bound to the current element.
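To make these two steps explicit, the following sketch (assuming only that the `foreach` package is installed) stores the object returned by `foreach(i = 1:5)` in a variable before passing it to `%do%`. The names `iteration` and `results` are purely illustrative.

```r
library(foreach)

# Calling `foreach()` on its own does not run anything; it only describes
# what to iterate over and returns an object of class `foreach`.
iteration <- foreach(i = 1:5)
class(iteration)
#> [1] "foreach"

# The `%do%` operator then evaluates the expression once per element,
# binding `i` to the current element and collecting the values in a list.
results <- iteration %do% {
    i * 2
}
unlist(results)
#> [1]  2  4  6  8 10
```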
Note. The `foreach::foreach` function may take additional arguments that control the behavior of the iteration process, the accumulation of the results, and the task execution. For example, by default, the `foreach::foreach` function returns the accumulated results as a list. However, `foreach::foreach` can also take a `.combine` argument that specifies how the results of each iteration should be combined into a single object. Specifying, for instance, `.combine = c` for the example above instructs `foreach::foreach` that we expect the results back as a vector instead of a list:
```r
# For each element.
foreach(i = 1:5, .combine = c) %do% {
    # Do something.
    i * 2
}
#> [1]  2  4  6  8 10
```
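The `.combine` argument is not limited to `c`. As a minimal sketch (assuming only `foreach`), passing `rbind` stacks the vector returned by each iteration as one row of a matrix. The column names `value` and `square` are illustrative.

```r
library(foreach)

# Each iteration returns a named vector of length two; `rbind` stacks
# these vectors into a matrix with one row per iteration.
squares <- foreach(i = 1:3, .combine = rbind) %do% {
    c(value = i, square = i^2)
}

# Three iterations, two values each.
dim(squares)
#> [1] 3 2

# The column names come from the names of the returned vectors.
colnames(squares)
#> [1] "value"  "square"
```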
Moreover, using the `.final` argument, we can provide a function that acts on the accumulated results right before they are returned to the user. This is useful when we want to perform some final operation on the results. For example, suppose we want to sum the results of the iterations. We can do this as follows:
```r
# For each element.
foreach(i = 1:5, .combine = c, .final = sum) %do% {
    # Do something.
    i * 2
}
#> [1] 30
```
As you may have noticed, the arguments that pertain to the behavior of the `foreach::foreach` function are prefixed with a dot. There are more arguments available. For a complete list, see the documentation for `foreach::foreach` and the vignette "Using the foreach package".
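As one example of these additional arguments, `.errorhandling` controls what happens when an iteration throws an error. The sketch below assumes only `foreach`; the third iteration fails on purpose, and with `.errorhandling = "remove"` its result is dropped from the combined output instead of aborting the whole loop.

```r
library(foreach)

# The third iteration fails deliberately; with `.errorhandling = "remove"`
# its result is discarded and the remaining results are still combined.
results <- foreach(i = 1:5, .combine = c, .errorhandling = "remove") %do% {
    if (i == 3) stop("Something went wrong.")
    i * 2
}
results
#> [1]  2  4  8 10
```

The default, `.errorhandling = "stop"`, would instead propagate the error and return no results at all.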
If we want to run a task in parallel, we need to provide a backend that supports parallelizing the task. Since the `foreach` package is not a parallelization package per se, it does not provide a parallel backend by default (if no backend has been registered, `%dopar%` falls back to sequential execution and emits a warning). Instead, it provides a flexible mechanism for registering any parallelization backend, as long as that backend implements the `%dopar%` operator.
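The registration mechanism itself can be illustrated without any parallelism. As a minimal sketch, the `foreach` package ships `registerDoSEQ()`, which registers a sequential backend for `%dopar%`; the `getDoPar*()` query functions then report that backend.

```r
library(foreach)

# Register the sequential backend that ships with `foreach`.
registerDoSEQ()

# Query the registered backend.
getDoParName()
#> [1] "doSEQ"
getDoParWorkers()
#> [1] 1

# `%dopar%` now dispatches to the sequential backend.
foreach(i = 1:3, .combine = c) %dopar% {
    i * 2
}
#> [1] 2 4 6
```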
The workflow for running a task in parallel with the `foreach` package involves:

1. Loading the `foreach` package.
2. Registering a parallelization backend that supports the `%dopar%` operator.
3. Running the task via the `%dopar%` operator.

While the `parabar` package provides synchronous and asynchronous parallelization backends, it does not work out of the box with the `foreach` package. This is where the `doParabar` package comes into play. The `doParabar` package encapsulates the necessary logic to adapt `parabar` backends to work seamlessly with the `foreach` package.
At a high level, the `doParabar` package consists of two main functions:

- `doPar`: provides an implementation for the `%dopar%` operator (i.e., think of it as an adapter that connects the `foreach` and `parabar` packages). This function implements the various arguments of the `foreach::foreach` function and determines how the tasks are parallelized using a `parabar` backend.
- `registerDoParabar`: registers the `doPar` implementation with the `foreach` package. This function sets up the necessary hooks in the `foreach` package to use the `doPar` implementation for the `%dopar%` operator. In other words, it tells `foreach` that, as long as a `parabar` backend is registered, it should use the `doPar` implementation in `doParabar` for the `%dopar%` operator.

Note. Two particularly relevant `foreach::foreach` arguments in the context of parallelizing R code are `.export` and `.packages`. The `.export` argument specifies the variables that need to be exported to the backend, while the `.packages` argument specifies the packages that need to be loaded on the backend.
doParabar
Unlike other `foreach` adapter packages out there (e.g., `doParallel`), the `doParabar` package does not automatically load other packages. Instead, I recommend explicitly loading the necessary packages in your scripts. In a similar vein, R package developers should add the necessary packages to the `Imports` field in the `DESCRIPTION` file of their package. Therefore, the first step in using `parabar` with `foreach` is to load the necessary packages:
```r
# Load the packages.
library(doParabar)
library(parabar)
library(foreach)
```
Next, we proceed by using `parabar` to create an asynchronous parallelization backend that supports progress tracking as follows:
```r
# Create an asynchronous `parabar` backend.
backend <- start_backend(
    cores = 2,
    cluster_type = "psock",
    backend_type = "async"
)
```
At this point, we have a parallelization backend that we can register with the `foreach` package. We do this via the `registerDoParabar` function:
```r
# Register the backend with the `foreach` package.
registerDoParabar(backend)
```
To verify that the backend has been registered successfully, we can use some of the functions provided by the `foreach` package to query information about the backend:
```r
# Get the parallel backend name.
getDoParName()
#> [1] "doParabar (AsyncBackend)"

# Check that the parallel backend has been registered.
getDoParRegistered()
#> [1] TRUE

# Get the version of the registered backend.
getDoParVersion()
#> [1] "1.0.0"

# Get the number of cores used by the backend.
getDoParWorkers()
#> [1] 2
```
Now, we can use the `%dopar%` operator to run tasks in parallel. For example:
```r
# Define some variables that are not available on the backend.
x <- 10
y <- 100
z <- "Not to be exported."

# Use the registered backend to run a task in parallel via `foreach`.
results <- foreach(
    i = 1:300,
    .export = c("x", "y"),
    .combine = c
) %dopar% {
    # Sleep a bit to simulate a long-running task.
    Sys.sleep(0.01)

    # Compute and return.
    i + x + y
}
#> completed 0 out of 300 tasks [ 0%] [ 0s]
#> ...
#> completed 60 out of 300 tasks [ 20%] [ 1s]
#> ...
#> completed 300 out of 300 tasks [100%] [ 2s]
```
```r
# Show a few results.
head(results, n = 10)
#>  [1] 111 112 113 114 115 116 117 118 119 120

tail(results, n = 10)
#>  [1] 401 402 403 404 405 406 407 408 409 410
```
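Because `%do%` and `%dopar%` share the same `foreach` semantics, one way to sanity-check parallel results is to rerun the same loop body sequentially with `%do%`. The sketch below assumes only `foreach` and drops the `Sys.sleep()` call; each element should equal `i + 110`, which matches the values shown above.

```r
library(foreach)

# The same variables as in the parallel example.
x <- 10
y <- 100

# Run the same loop body sequentially; `%do%` evaluates the expression in
# the current session, so `x` and `y` are visible without `.export`.
reference <- foreach(i = 1:300, .combine = c) %do% {
    i + x + y
}

# Every element should equal `i + 110`.
all(reference == (1:300) + 110)
#> [1] TRUE
head(reference, n = 3)
#> [1] 111 112 113
```

In the same session as the parallel example, `identical(results, reference)` should then also return `TRUE`, assuming the backend preserves the order of the tasks.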
Note. The `doParabar` package does not automatically export objects (or packages, for that matter) to the backend. While this breaks with the "tradition" of other `foreach` adapter packages, it is a deliberate design choice made to encourage users to keep their scripts tidy and be mindful of what they export to the backend (i.e., see the `.export`, `.noexport`, and `.packages` arguments of the `foreach` function).
We can verify that objects are not automatically exported to the backend by checking the value of the `z` variable on the backend. We expect this call to throw an error, since `z` was never exported to the backend:
```r
# Verify that the variable `z` was not exported.
try(evaluate(backend, z))
#> Error : ! in callr subprocess.
#> Caused by error in `checkForRemoteErrors(lapply(cl, recvResult))`:
#> ! 2 nodes produced errors; first error: object 'z' not found
```
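Conversely, listing `z` in `.export` should make it available inside the loop body. The self-contained sketch below assumes the `doParabar`, `parabar`, and `foreach` packages are installed; it starts a separate two-core backend stored in `demo_backend` (an illustrative name chosen so it does not overwrite the `backend` object used above), registers it, runs the loop, and stops it again.

```r
library(doParabar)
library(parabar)
library(foreach)

# Start and register a separate asynchronous backend.
demo_backend <- start_backend(
    cores = 2,
    cluster_type = "psock",
    backend_type = "async"
)
registerDoParabar(demo_backend)

# A variable that exists only in the main session.
z <- "Not to be exported."

# Explicitly exporting `z` makes it visible to the loop body on the workers.
exported_values <- foreach(i = 1:3, .export = "z", .combine = c) %dopar% {
    paste(z, i)
}

# Stop the demo backend once done.
stop_backend(demo_backend)

exported_values
#> [1] "Not to be exported. 1" "Not to be exported. 2" "Not to be exported. 3"
```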
Finally, when we are done, we can stop the backend as we normally would:
```r
# Stop the backend.
stop_backend(backend)
```
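Since a backend keeps its worker processes alive until it is stopped, it can be convenient to tie its lifetime to a single function call. The sketch below defines a hypothetical helper, `with_parabar_foreach()`, which is not part of `parabar` or `doParabar`. It starts and registers a backend, runs a `foreach` loop, and uses `on.exit()` to stop the backend even if the loop fails. Re-registering the sequential backend via `registerDoSEQ()` afterwards is my own assumption about good practice: it keeps `%dopar%` from pointing at a backend that has already been stopped.

```r
library(doParabar)
library(parabar)
library(foreach)

# Hypothetical helper (not part of `parabar` or `doParabar`): run `loop`
# on a freshly started `parabar` backend that is always stopped afterwards.
with_parabar_foreach <- function(loop, cores = 2) {
    # Start an asynchronous backend, as in the article.
    backend <- start_backend(
        cores = cores,
        cluster_type = "psock",
        backend_type = "async"
    )

    # Register it so that `%dopar%` dispatches to `doParabar`.
    registerDoParabar(backend)

    # Always stop the backend and fall back to the sequential `foreach`
    # backend, even if `loop()` throws an error (an assumed good practice).
    on.exit({
        stop_backend(backend)
        registerDoSEQ()
    }, add = TRUE)

    # Run the user-supplied loop.
    loop()
}

# The loop is wrapped in a function so that it runs only after registration.
doubled <- with_parabar_foreach(function() {
    foreach(i = 1:10, .combine = c) %dopar% {
        i * 2
    }
})
```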
In this article, I provided a short introduction to running tasks in parallel on `parabar` backends using `foreach` semantics. This integration is possible via the `doParabar` package, which provides an implementation for the `%dopar%` operator (i.e., the `doPar` function) and a function to register the implementation with the `foreach` package (i.e., the `registerDoParabar` function). The source code for the `doParabar` package can be consulted on GitHub at github.com/mihaiconstantin/doParabar. I kindly welcome any feedback or contributions to improving `parabar` or `doParabar`.