The `Rcpp2doParallel` R package provides an example of packaging a C++ function and calling it in parallel from R using the `doParallel` and `foreach` packages. That said, any of the `do*` backends (`doFuture`, `doMC`, `doMPI`, `doRedis`, `doRNG`, `doSNOW`) can be substituted for the `doParallel` backend used as the driving example here.
To install the package, you must first have a compiler on your system that is compatible with R. For help obtaining a compiler, consult the macOS or Windows guides.
With a compiler in hand, one can then install the package from GitHub by:
# install.packages("devtools")
devtools::install_github("coatless-rd-rcpp/rcpp-and-doparallel")
library("Rcpp2doParallel")
Within this project, a C++ function created using `Rcpp` is called from the `doParallel` region inside the R package. Packaging the C++ function lowers the cost of parallelizing the code, because each worker can execute the already-compiled function instead of compiling it locally first. Moreover, by packaging the parallelization code, the algorithm is deployed through R's package management instead of a monolithic R script.
.
├── DESCRIPTION                        # Package metadata
├── LICENSE                            # Code license
├── NAMESPACE                          # Function and dependency registration
├── R                                  # R functions
│   ├── Rcpp2doParallel-package.R      # Package documentation
│   ├── RcppExports.R                  # Autogenerated R to C++ bindings by Rcpp
│   └── mean_parallel_compute.R        # doParallel cluster formation and C++ call
├── README.md
├── Rcpp2doParallel.Rproj
├── man                                # Package documentation
│   ├── Rcpp2doParallel-package.Rd
│   └── mean_parallel_compute.Rd
└── src                                # Compiled code
    ├── RcppExports.cpp                # Autogenerated R bindings
    └── mean_rcpp.cpp                  # C++ function to compute the mean
Parallelized R functions require a cluster, or set of workers, to be set up so that the jobs in the parallelized region can be distributed to them. The approach taken here self-contains the setup and execution of the parallel workers. Because both steps are encapsulated within the function, every call pays the runtime cost of setting up the cluster again. An alternative approach would be to pass an initialized cluster into the function.
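The alternative of passing an initialized cluster into the function might look like the sketch below. This is not code from the package: `mean_with_cluster()` is a hypothetical name, and `base::mean()` stands in for the package's `mean_rcpp()` so the sketch runs without `Rcpp2doParallel` installed. It assumes the `foreach`, `iterators`, and `doParallel` packages are available.

```r
library(foreach)

# Hypothetical variant (not part of the package): the caller creates the
# cluster once and passes it in, so repeated calls reuse the same workers.
mean_with_cluster = function(cl, n, mean = 0, sd = 1, n_sim = 1000) {

  # Register the caller's cluster; no makeCluster()/stopCluster() in here
  doParallel::registerDoParallel(cl)

  foreach(i = iterators::icount(n_sim), .combine = "rbind") %dopar% {
    random_data = rnorm(n, mean, sd)
    # Stand-in for mean_rcpp(); with the package installed, add
    # .packages = "Rcpp2doParallel" above and call mean_rcpp() instead.
    base::mean(random_data)
  }
}

# The caller owns the cluster's lifetime
cl = parallel::makeCluster(2)
first  = mean_with_cluster(cl, n = 100, n_sim = 10)
second = mean_with_cluster(cl, n = 100, n_sim = 10)  # reuses the workers
parallel::stopCluster(cl)
```

The trade-off is that the caller becomes responsible for shutting the cluster down, which the self-contained version handles with `on.exit()`.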
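When constructing a parallelized region with `foreach`, one must:

1. Create a cluster with `cl = parallel::makeCluster(n_workers)`.
2. Register the cluster with a `do*` package using `do*::registerDo*()`.
   - For `doParallel`, this would be `doParallel::registerDoParallel(cl)`.
3. Run code in parallel with `foreach() %dopar%`.
   - Packages and variables can be sent to each worker with `foreach(..., .packages = c("pkgA", "pkgB"), .export = c("var1", "var2"))`.
   - Make sure `Rcpp2doParallel` is loaded on each worker by using `foreach(..., .packages = "Rcpp2doParallel")`.
4. Shut down the cluster with `parallel::stopCluster(cl)`.
   - Within a function, this can be scheduled with `on.exit(parallel::stopCluster(cl))`.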
mean_parallel_compute = function(n, mean = 0, sd = 1,
                                 n_sim = 1000,
                                 n_cores = parallel::detectCores()) {

  # Construct cluster
  cl = parallel::makeCluster(n_cores)

  # After the function is run, shut down the cluster.
  on.exit(parallel::stopCluster(cl))

  # Register parallel backend
  doParallel::registerDoParallel(cl)   # Modify with any do*::registerDo*()

  # Compute estimates
  estimates = foreach::foreach(i = iterators::icount(n_sim), # Perform n_sim simulations
                               .combine = "rbind",           # Combine results
                               # Self-load
                               .packages = "Rcpp2doParallel") %dopar% {
    random_data = rnorm(n, mean, sd)
    result = mean_rcpp(random_data) # or use Rcpp2doParallel::mean_rcpp()
    result
  }

  # Release results
  return(estimates)
}
The C++ function must be placed within the package's `src/` directory and exported into R with Rcpp Attributes. Beyond these two requirements, nothing else needs to be done, as the parallelization is handled by R and not within the C++ code.
#include <Rcpp.h>

// [[Rcpp::export]]
double mean_rcpp(Rcpp::NumericVector x){
  int n = x.size(); // Size of vector
  double sum = 0;   // Sum value

  // For loop, note cpp index shift to 0
  for(int i = 0; i < n; i++){
    // Shorthand for sum = sum + x[i]
    sum += x[i];
  }

  return sum/n; // Obtain and return the mean
}
DESCRIPTION
The `doParallel` backend brings several dependencies, depending on the features you wish to use. In particular, the `doParallel` package requires `foreach` and `parallel` to operate. Only `iterators` can be removed from the dependency list, provided there is sufficient RAM to allocate index values, e.g. `1:n`, instead of creating a low-cost iterator with `n` elements through `iterators::icount()`.
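As a small illustration of the two indexing choices, the sketch below runs the same loop once with `iterators::icount()` and once with a plain index vector. It uses the sequential `%do%` operator so no cluster is needed, and the squared index is only a placeholder workload, not something the package computes.

```r
library(foreach)
library(iterators)

n_sim = 5

# Lazy counter: hands out 1, 2, ..., n_sim one value at a time
with_icount = foreach(i = icount(n_sim), .combine = "c") %do% {
  i^2
}

# Plain index vector: no explicit call into iterators
with_seq = foreach(i = seq_len(n_sim), .combine = "c") %do% {
  i^2
}

# Both loops visit the same indices in the same order
identical(with_icount, with_seq)
```

Swapping `%do%` for `%dopar%` after registering a backend leaves the indexing part unchanged.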
LinkingTo:
    Rcpp
Imports:
    doParallel,
    Rcpp,
    foreach,
    iterators,
    parallel
NAMESPACE
As discussed in DESCRIPTION, the `doParallel` backend has a few dependencies. The following functions must be imported into the package for it to run successfully.
#' @importFrom foreach %dopar% foreach
#' @importFrom iterators icount
#' @importFrom doParallel registerDoParallel
James Joseph Balamuta
GPL (>= 2)