Task Parallel

Parallel infrastructure for computing on data generally centers on data parallelism: calling the same function on many different pieces of data, as in Single Instruction, Multiple Data (SIMD).

Independent tasks can also run in parallel. This is task parallelism: calling different functions on different data simultaneously.

This can be done today through R's included parallel package:

library(parallel)

# Begins asynchronous evaluation of rnorm(10)
job1 = mcparallel(expr = rnorm(10))

# This can happen before the above expression is finished
x = mean(1:10)

# Waits for job1 to finish, then extracts its result from the returned list
y = mccollect(job1)[[1]]
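
To make the distinction between the two kinds of parallelism concrete, here is a minimal sketch that places them side by side. It assumes a Unix-alike system, since mclapply and mcparallel rely on forking. The vectors and job names are made up for illustration.

```r
library(parallel)

x = c(3, 1, 2)
y = c(10, 20, 30)

# Data parallelism: the same function, sqrt, applied to each element of x
data_par = mclapply(x, sqrt, mc.cores = 2)

# Task parallelism: two different functions applied to two different objects
job_sort = mcparallel(sort(x), name = "sorted")
job_mean = mcparallel(mean(y), name = "average")

# Results come back in the same order as the jobs passed in
task_par = mccollect(list(job_sort, job_mean))
```

The two forked jobs do not depend on each other, so neither has to wait for the other before it starts.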

This introduces overhead compared to standard serial evaluation, but it may speed up the program if the following conditions hold:

Ideas

Suppose the user would like to run a script multiple times. The software essentially needs to do the following:

  1. run the script once, measuring time required to evaluate each expression, as well as the sizes of the resulting objects to be serialized
  2. infer the dependency structure of the code, which determines where and how statements can run in parallel
  3. solve an optimization problem specifying which statements ideally happen in parallel
  4. rewrite the code to use the optimal strategy
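
The first two steps can be roughed out in base R. This is only a sketch under simplifying assumptions, not how makeParallel itself is implemented. The three-statement script and the helpers profile_expr, is_assignment, defines and uses are hypothetical. The dependency inference only understands top-level assignments of the form `name = value` or `name <- value`.

```r
# A hypothetical script: a and b are independent, d needs both
script = parse(text = "
a = rnorm(1000)
b = runif(1000)
d = sum(a) + sum(b)
")
stmts = as.list(script)
env = new.env()

# Step 1: evaluate each statement once, recording the elapsed time and
# the number of bytes needed to serialize the result
profile_expr = function(e, env) {
    elapsed = system.time(result <- eval(e, env))[["elapsed"]]
    list(elapsed = elapsed, bytes = length(serialize(result, NULL)))
}
measurements = lapply(stmts, profile_expr, env = env)

# Step 2: infer which earlier statements each statement depends on
is_assignment = function(e) {
    is.call(e) && is.name(e[[1]]) && as.character(e[[1]]) %in% c("=", "<-")
}
# Name of the variable a statement defines, or NA if it defines none
defines = function(e) {
    if (is_assignment(e) && is.name(e[[2]])) as.character(e[[2]]) else NA_character_
}
# Names of the variables a statement reads
uses = function(e) {
    if (is_assignment(e)) all.vars(e[[3]]) else all.vars(e)
}

defined = vapply(stmts, defines, character(1))
depends_on = lapply(seq_along(stmts), function(i) {
    earlier = seq_len(i - 1)
    earlier[defined[earlier] %in% uses(stmts[[i]])]
})
```

Here depends_on[[3]] contains 1 and 2, while the first two statements have no dependencies, so they are candidates to run in parallel. The timings and sizes in measurements are the inputs that the optimization in step 3 would weigh against the overhead of moving data between processes.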



makeParallel documentation built on May 2, 2019, 9:40 a.m.