R, Rcpp and Parallel Computing

Intro

One View on Parallel Computing

The whole "let's parallelize" thing is a huge waste of everybody's time. There's this huge body of "knowledge" that parallel is somehow more efficient, and that whole huge body is pure and utter garbage. Big caches are efficient. Parallel stupid small cores without caches are horrible unless you have a very specific load that is hugely regular (ie graphics).

[...]

Give it up. The whole "parallel computing is the future" is a bunch of crock.

Linus Torvalds, Dec 2014

Another View on Big Data

\framesubtitle{Imagine a \texttt{gsub("DBMs", "", tweet)} to complement further...}

\centering{\includegraphics[width=\textwidth,height=0.8\textheight,keepaspectratio]{images/big-data-big-machine-tweet.png}}

R

CRAN Task View on HPC

\framesubtitle{\texttt{http://cran.r-project.org/web/views/HighPerformanceComputing.html}}

Things R does well:

\medskip

Rcpp

Rcpp: Early Days

In the fairly early days of Rcpp, we also put out RInside as a simple C++ class wrapper around the R-embedding API.

One clever patch took this setup (i.e., R wrapped in C++ with its own main() function) and encapsulated it within MPI.

HP Vertica also uses Rcpp and RInside in DistributedR.

Rcpp: More recently

Rcpp is now easy to deploy; Rcpp Attributes played a key role:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double piSugar(const int N) {
    NumericVector x = runif(N);
    NumericVector y = runif(N);
    NumericVector d = sqrt(x*x + y*y);
    return 4.0 * sum(d < 1.0) / N;
}

Rcpp: Extensions

Rcpp Attributes also support "plugins"

OpenMP is easy to use and widely supported (on suitable OS / compiler combinations).

So we added support via a plugin, although it is not yet widely used.

The errors people hit share a common cause: calling back into R from threaded code.
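To make the OpenMP point concrete, here is a minimal standalone sketch in plain C++ (no Rcpp, so it compiles anywhere). The function name `sumOfSquares` is hypothetical. The loop body touches only C++ data and never calls back into R, which is what makes it safe to run on several threads. In an Rcpp source file the same loop would be enabled by placing `// [[Rcpp::plugins(openmp)]]` above the exported function; when the compiler has no OpenMP support the pragma is ignored and the loop simply runs serially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sum of squares over a plain C++ vector.
// The loop body uses no R API, so running it on several threads is safe.
double sumOfSquares(const std::vector<double>& x) {
    const long n = static_cast<long>(x.size());
    double total = 0.0;
    // Each thread keeps a private partial sum; OpenMP combines them at the end.
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        const double xi = x[static_cast<std::size_t>(i)];
        total += xi * xi;
    }
    return total;
}
```

The reduction clause matters: without it, all threads would update `total` at once and the result would be a data race.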

RcppParallel

Parallel Programming for Rcpp Users

\framesubtitle{NOT like this...}

using namespace boost;

void task()
{
   lock_guard<boost::mutex> lock(mutex);
   // etc...
}

threadpool::pool tp(thread::hardware_concurrency());
for (int i=0; i<slices; i++)
   tp.schedule(&task); 

Parallel Programming for Rcpp Users

Goals:

Parallel Programming Alternatives

\footnotesize

| | TBB | OMP | RAW |
|---|:----------:|:------:|:-------:|
| Task level parallelism | \textbullet | \textbullet | |
| Data decomposition support | \textbullet | \textbullet | |
| Non loop parallel patterns | \textbullet | | |
| Generic parallel patterns | \textbullet | | |
| Nested parallelism support | \textbullet | | |
| Built in load balancing | \textbullet | \textbullet | |
| Affinity support | | \textbullet | \textbullet |
| Static scheduling | | \textbullet | |
| Concurrent data structures | \textbullet | | |
| Scalable memory allocator | \textbullet | | |

TBB vs. OpenMP vs. Threads

Win32 Platform Complications

R Concurrency Complications

R is single-threaded and includes this warning in Writing R Extensions when discussing the use of OpenMP:

Calling any of the R API from threaded code is ‘for experts only’: they will need to read the source code to determine if it is thread-safe. In particular, code which makes use of the stack-checking mechanism must not be called from threaded code.

However, we don't want to force Rcpp users to read the Rcpp and R source code to assess thread-safety issues.

RcppParallel Threadsafe Accessors

Since R vectors and matrices are just raw contiguous arrays it's easy to create threadsafe C++ wrappers for them:

The implementations of these classes are extremely lightweight and never call into Rcpp or the R API (so they are always threadsafe).
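The idea of a thin, threadsafe wrapper can be sketched in a few lines of plain C++. `ArrayView` and `doubleRange` below are hypothetical names used for illustration and are not RcppParallel's actual classes; the point is only that a wrapper holding a raw pointer and a length has nothing in it that could reach the R API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a threadsafe accessor: a non-owning view
// over a contiguous array. It stores only a pointer and a length, so
// reading or writing through it never touches the R API.
template <typename T>
class ArrayView {
public:
    ArrayView(T* data, std::size_t length) : data_(data), length_(length) {}

    T* begin() const { return data_; }
    T* end() const { return data_ + length_; }
    std::size_t size() const { return length_; }
    T& operator[](std::size_t i) const { return data_[i]; }

private:
    T* data_;
    std::size_t length_;
};

// Doubles the elements in [begin, end). Threads that are handed
// disjoint ranges of the same view never write to the same element.
inline void doubleRange(ArrayView<double> view,
                        std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        view[i] *= 2.0;
    }
}
```

Because the view is just two words, it can be copied into each thread by value.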

RcppParallel Operations

Two high-level operations, parallelFor and parallelReduce, are provided (with TBB and TinyThread implementations of each).

Not surprisingly, the TBB versions of these operations perform roughly 50% better than the "naive" parallel implementation provided by TinyThread.

Basic Mechanics: Create a Worker

Create a Worker class with operator() that RcppParallel uses to operate on discrete slices of the input data on different threads:

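#include <RcppParallel.h>

class MyWorker : public RcppParallel::Worker {
public:
   // called by RcppParallel with a distinct [begin, end) slice per thread
   void operator()(std::size_t begin, std::size_t end) {
      // do some work from begin to end
      // within the input data
   }
};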

Basic Mechanics: Call the Worker

A Worker typically takes its input and output data in its constructor and saves them as members (for reading/writing within operator()):

NumericMatrix matrixSqrt(NumericMatrix x) {

  NumericMatrix output(x.nrow(), x.ncol());

  SquareRootWorker worker(x, output);

  parallelFor(0, x.length(), worker);

  return output;
}
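What happens behind a call like the one above can be approximated in standalone C++. This is a sketch under stated assumptions, not RcppParallel's implementation: the `Worker` base class, `SquareRootWorker`, and `parallelForSketch` are hypothetical stand-ins built on `std::thread` and `std::vector`, and the real schedulers (TBB, TinyThread) choose slices differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical minimal Worker interface, modelled on the slides.
struct Worker {
    virtual ~Worker() {}
    virtual void operator()(std::size_t begin, std::size_t end) = 0;
};

// Writes sqrt(input[i]) to output[i] for each i in its slice.
struct SquareRootWorker : public Worker {
    const std::vector<double>& input;
    std::vector<double>& output;

    SquareRootWorker(const std::vector<double>& in, std::vector<double>& out)
        : input(in), output(out) {}

    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            output[i] = std::sqrt(input[i]);
        }
    }
};

// Hypothetical scheduler: splits [begin, end) into at most nThreads
// contiguous slices and runs the worker on each slice in its own thread.
void parallelForSketch(std::size_t begin, std::size_t end,
                       Worker& worker, std::size_t nThreads) {
    const std::size_t n = end - begin;
    nThreads = std::max<std::size_t>(1, std::min(nThreads, n));
    const std::size_t chunk = (n + nThreads - 1) / nThreads;  // ceiling division

    std::vector<std::thread> threads;
    for (std::size_t lo = begin; lo < end; lo += chunk) {
        const std::size_t hi = std::min(lo + chunk, end);
        threads.emplace_back([&worker, lo, hi]() { worker(lo, hi); });
    }
    for (std::thread& t : threads) {
        t.join();  // wait for every slice before returning
    }
}
```

All threads share one worker object here, which is fine because each slice writes to a disjoint part of the output.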

Basic Mechanics: Join Function

For parallelReduce you need to specify how data is to be combined. Typically you save data in a member within operator() then fuse it with another Worker instance in the join function.

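class SumWorker : public RcppParallel::Worker {
public:
   double value;   // partial sum accumulated in operator()

   // ... constructors and operator() omitted

   // join my value with that of another SumWorker
   void join(const SumWorker& rhs) {
      value += rhs.value;
   }
};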
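The split-then-join pattern can be sketched in standalone C++ as follows. Everything here is an assumption made for illustration: this `SumWorker` is a self-contained struct rather than an RcppParallel class, the `split()` member stands in for however the library creates per-thread workers, and `parallelReduceSketch` is a hypothetical scheduler built on `std::thread`.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical reducing worker: accumulates a partial sum over its slice.
struct SumWorker {
    const std::vector<double>& input;
    double value;

    explicit SumWorker(const std::vector<double>& in) : input(in), value(0.0) {}

    // "Split" step: a fresh worker over the same input with an empty sum.
    SumWorker split() const { return SumWorker(input); }

    void operator()(std::size_t begin, std::size_t end) {
        value += std::accumulate(input.begin() + begin,
                                 input.begin() + end, 0.0);
    }

    // join my value with that of another SumWorker
    void join(const SumWorker& rhs) { value += rhs.value; }
};

// Hypothetical scheduler: one private worker per slice, joined at the end.
void parallelReduceSketch(std::size_t begin, std::size_t end,
                          SumWorker& worker, std::size_t nThreads) {
    const std::size_t n = end - begin;
    nThreads = std::max<std::size_t>(1, std::min(nThreads, n));
    const std::size_t chunk = (n + nThreads - 1) / nThreads;  // ceiling division

    // Create every per-slice worker first so the vector never
    // reallocates while threads hold pointers into it.
    std::vector<SumWorker> parts;
    for (std::size_t lo = begin; lo < end; lo += chunk) {
        parts.push_back(worker.split());
    }

    std::vector<std::thread> threads;
    std::size_t lo = begin;
    for (std::size_t k = 0; k < parts.size(); ++k, lo += chunk) {
        const std::size_t hi = std::min(lo + chunk, end);
        SumWorker* part = &parts[k];
        threads.emplace_back([part, lo, hi]() { (*part)(lo, hi); });
    }
    for (std::thread& t : threads) t.join();

    // Combine the partial results on the calling thread.
    for (const SumWorker& part : parts) worker.join(part);
}
```

Giving each thread a private `value` avoids any locking; the only serial step is the final sequence of `join` calls.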

What does all of this buy us?

Examples

Example: Transforming a Matrix in Parallel

\framesubtitle{\texttt{http://gallery.rcpp.org/articles/parallel-matrix-transform}}

void operator()(size_t begin, size_t end) {
   std::transform(input.begin() + begin,
                  input.begin() + end,
                  output.begin() + begin,
                  ::sqrt);
}

                   test replications elapsed relative
2 parallelMatrixSqrt(m)          100   0.294    1.000
1         matrixSqrt(m)          100   0.755    2.568

Example: Summing a Vector in Parallel

\framesubtitle{\texttt{http://gallery.rcpp.org/articles/parallel-vector-sum}}

void operator()(size_t begin, size_t end) {
   value += std::accumulate(input.begin() + begin,
                            input.begin() + end,
                            0.0);
}

void join(const Sum& rhs) {
   value += rhs.value;
}

                  test replications elapsed relative
2 parallelVectorSum(v)          100   0.182    1.000
1         vectorSum(v)          100   0.857    4.709

Example: Parallel Distance Matrix Calculation

\framesubtitle{\texttt{http://gallery.rcpp.org/articles/parallel-distance-matrix}}

                       test reps elapsed relative
3 rcpp_parallel_distance(m)    3   0.110    1.000
2          rcpp_distance(m)    3   0.618    5.618
1               distance(m)    3  35.560  323.273
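For a distance matrix, the natural unit of work is a range of rows. The standalone sketch below shows one way such a worker could look. It is not the gallery article's code: `Matrix` and `DistanceWorker` are hypothetical, and plain Euclidean distance is used as a stand-in for whatever distance measure the article defines.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix stored in one contiguous buffer.
struct Matrix {
    std::size_t nrow, ncol;
    std::vector<double> data;

    Matrix(std::size_t r, std::size_t c) : nrow(r), ncol(c), data(r * c, 0.0) {}

    double& at(std::size_t i, std::size_t j) { return data[i * ncol + j]; }
    double at(std::size_t i, std::size_t j) const { return data[i * ncol + j]; }
};

// Worker over row indices: for each row i in [begin, end) it computes the
// distance to every earlier row j < i and stores it in both (i, j) and (j, i).
struct DistanceWorker {
    const Matrix& input;
    Matrix& output;

    DistanceWorker(const Matrix& in, Matrix& out) : input(in), output(out) {}

    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            for (std::size_t j = 0; j < i; ++j) {
                double sumSq = 0.0;
                for (std::size_t k = 0; k < input.ncol; ++k) {
                    const double diff = input.at(i, k) - input.at(j, k);
                    sumSq += diff * diff;
                }
                const double d = std::sqrt(sumSq);  // Euclidean stand-in
                output.at(i, j) = d;  // lower triangle, row i
                output.at(j, i) = d;  // mirrored cell, written only for row i
            }
        }
    }
};
```

Each pair (i, j) with j < i is handled only by the slice that owns row i, so slices running on different threads never write to the same cell. Later rows carry more work than earlier ones, which is one reason built-in load balancing is useful for this kind of problem.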

The Rest of TBB

Open Issues



RcppParallel documentation built on March 7, 2023, 7:05 p.m.