View source: R/scheduleDataParallel.R
If you're doing a series of computations over a large data set, start with this scheduler. It fuses as many chunkable expressions as it can into large blocks that run in parallel. The initial data chunks and intermediate objects stay on the workers rather than returning to the manager, so you can think of this as "chunk fusion".
scheduleDataParallel(graph, data, platform = Platform(),
    nWorkers = platform@nWorkers, chunkFuncs = character(),
    reduceFuncs = list(), knownReduceFuncs = getKnownReduceFuncs(),
    knownChunkFuncs = getKnownChunkFuncs(),
    allChunkFuncs = c(knownChunkFuncs, chunkFuncs))
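A minimal call might look like the sketch below. The `graph` and `data` objects are placeholders: how a TaskGraph and DataSource are actually constructed depends on the rest of the makeParallel API and is not shown in this page, so treat everything except the `scheduleDataParallel()` signature itself as assumed.

```r
library(makeParallel)

# graph: a TaskGraph built from the user's code (construction not shown here).
# x_source: a hypothetical DataSource describing where the variable `x` lives;
# the list names must match the variable names used in the code.
sched <- scheduleDataParallel(
    graph,
    data = list(x = x_source),
    nWorkers = 4L,
    chunkFuncs = "myChunkableFunc"  # hypothetical user-defined chunkable function
)
```

The resulting schedule can then be passed on to the code generation step of the package.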
graph: TaskGraph, code dependency graph.

data: list of data descriptions. Each element is a DataSource. The names of the list elements correspond to the variables in the code that these objects are bound to.

platform: Platform describing the resource to compute on.

chunkFuncs: character, names of additional chunkable functions known to the user.

reduceFuncs: list of ReduceFun objects; these can override the knownReduceFuncs.

knownReduceFuncs: list of known ReduceFun objects.

knownChunkFuncs: character, names of chunkable functions from base and recommended packages.

allChunkFuncs: character, names of all chunkable functions to use in the analysis.
The scheduler statically balances the load of the data chunks among workers, assuming that loading and processing times are linear in the size of the data.
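The balancing idea can be sketched in plain R. This is an illustrative re-implementation, not the package's actual code: assuming cost is linear in chunk size, greedily assigning each chunk (largest first) to the currently least-loaded worker approximates an even split.

```r
# Greedy static load balancing: assign each chunk to the least-loaded worker,
# assuming cost is linear in chunk size. Illustrative sketch only.
balanceChunks <- function(chunkSizes, nWorkers) {
    load <- numeric(nWorkers)                 # running total size per worker
    assignment <- integer(length(chunkSizes)) # worker index for each chunk
    for (i in order(chunkSizes, decreasing = TRUE)) {
        w <- which.min(load)                  # least-loaded worker so far
        assignment[i] <- w
        load[w] <- load[w] + chunkSizes[i]
    }
    assignment
}

sizes <- c(100, 80, 60, 40, 20, 10)
balanceChunks(sizes, nWorkers = 2)
# -> 1 2 2 1 1 2  (worker loads: 160 and 150)
```

Because the assignment is computed once, before any work starts, there is no runtime coordination cost, which is what makes the schedule "static".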
TODO:

- Populate chunkableFuncs based on code analysis.
- Identify which parameters a function is chunkable in, and respect these by matching arguments. See update_resource.Call.
- Clarify the behavior of subexpressions, handling cases such as min(sin(large_object)).
See also: makeParallel, schedule