autoparallel-task-parallel
In makeParallel: Transform Serial R Code into Parallel R Code

Task Parallel

Parallel infrastructure for computing on data generally centers around data parallelism. This means to call the same function on many different data- Same Instruction Multiple Data (SIMD).

Independent tasks can also run in parallel. This is task parallelism. It means to call different functions on different data simultaneously.

This can be done today through R's included parallel package:

library(parallel)

# Begins asynchronous evaluation of rnorm(10)
job1 = mcparallel(expr = rnorm(10))

# This can happen before the above expression is finished
x = mean(1:10)

y = mccollect(job1)[[1]]

This introduces overhead compared to standard serial evaluation, but it may speed up the program if the following conditions hold:

The system has available computing resources, ie. processor cores which are idle. If an R package uses threads through internal compiled code then introducing parallelism on top of this will generally hurt rather than help, because the processors must now compete for resources. Linear algebra computations with a multithreaded BLAS / LAPACK are a common operation with this effect.
There are two or more relatively long running tasks that can occur simultaneously. For a multicore fork based approach the tasks should take at least 10 ms, and preferably much longer.
At least one task returns a relatively small object. This allows one to avoid the cost of serializing R objects between processes. For example, the code 1:1e8 generates a sequence of 100 million integers. This takes 10 times longer in parallel because the serialization time far exceeds the time for actual computation.

Ideas

Suppose the user would like to run a script multiple times. The software essentially needs to do the following:

run the script once, measuring time required to evaluate each expression, as well as the sizes of the resulting objects to be serialized
infer the dependency structure of the code, which determines where and how statements can run in parallel
solve an optimization problem specifying which statements ideally happen in parallel
rewrite the code to use the optimal strategy