parBatchByIndex | R Documentation
Takes a vector of indices, breaks it into batches and runs the given function on each batch in parallel on the given SNOW cluster. See batchByIndex for the non-parallel version.
When running a SELECT whose condition involves a large number of IDs, it is not always possible to include them all in a single SQL statement. This function breaks the list of IDs into chunks so that indexProcessor only has to deal with a small number of IDs at a time.
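The chunking idea can be sketched in plain R. The helper `batchQueries` and the table name `compounds` below are hypothetical and not part of ChemmineR. The sketch only shows how a long ID vector becomes several short `IN (...)` statements, with the last batch possibly smaller than `batchSize`.

```r
# Hypothetical helper (not part of ChemmineR): split ids into chunks of at most
# batchSize and build one "SELECT ... WHERE id IN (...)" statement per chunk.
batchQueries <- function(ids, batchSize) {
  # batch number for each position: 1,1,...,2,2,... (last batch may be smaller)
  batchOf <- ceiling(seq_along(ids) / batchSize)
  batches <- split(ids, batchOf)
  vapply(batches, function(b)
    paste0("SELECT weight FROM compounds WHERE id IN (",
           paste(b, collapse = ","), ")"),
    character(1), USE.NAMES = FALSE)
}

queries <- batchQueries(1:7, batchSize = 3)
length(queries)  # 3
queries[3]       # "SELECT weight FROM compounds WHERE id IN (7)"
```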
parBatchByIndex(allIndices, indexProcessor, reduce, cl, batchSize = 1e+05)
allIndices |
A vector of values that will be broken into batches and passed as an argument to the indexProcessor function. |
indexProcessor |
A function that takes one batch of indices. It is called once for each batch, possibly in parallel. The return value of this function is collected into a list and passed to the reduce function. |
reduce |
This function is run after all jobs have finished. It is called with a list of the return values from the indexProcessor function and is expected to merge them all into one result. |
cl |
A SNOW cluster to run jobs on. |
batchSize |
The size of each batch. The last batch may be smaller than this value. |
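How the arguments fit together can be illustrated with a small stand-in. `sketchParBatch` is a hypothetical re-creation, not ChemmineR's implementation, and it swaps in `parallel::parLapply` for the parallel step; the package's own code may differ (for example in load balancing). It splits `allIndices` into batches of `batchSize`, calls `indexProcessor` once per batch on the cluster and hands the resulting list to `reduce`.

```r
library(parallel)

# Hypothetical sketch of the batch -> parallel map -> reduce pattern;
# not ChemmineR's actual implementation.
sketchParBatch <- function(allIndices, indexProcessor, reduce, cl,
                           batchSize = 1e5) {
  # last batch may be smaller than batchSize
  batches <- split(allIndices, ceiling(seq_along(allIndices) / batchSize))
  # one call of indexProcessor per batch, spread over the cluster's workers
  results <- parLapply(cl, batches, indexProcessor)
  # reduce receives the list of per-batch return values
  reduce(results)
}

cl <- makeCluster(2)
total <- sketchParBatch(1:10, function(b) sum(b),
                        function(results) sum(unlist(results)), cl,
                        batchSize = 3)
stopCluster(cl)
total  # 55
```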
The return value of the reduce function is returned.
Kevin Horan
batchByIndex
## Not run:
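library(snow)
cl = makeCluster(2) # create a SNOW cluster
# each worker opens its own connection (connections cannot be shipped to workers);
# "compounds.db" and the "compounds" table are placeholders
clusterEvalQ(cl, {
  library(DBI)
  dbConnection = dbConnect(RSQLite::SQLite(), "compounds.db")
  NULL
})
#function to run a query for each batch of indexes
job = function(indexBatch)
  dbGetQuery(dbConnection, paste("SELECT weight FROM compounds WHERE id IN (",paste(indexBatch,collapse=","),")"))
# function to combine all the results, in this case by summing them up
reduce = function(results) sum(unlist(results))
indices = 1:10000
#run queries in parallel and then sum the results
totalWeight = parBatchByIndex(indices,job,reduce,cl, 1000)
stopCluster(cl)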
## End(Not run)
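Because dbGetQuery returns a data frame for each batch, reduce receives a list of data frames. The sketch below uses toy data frames as a stand-in for those per-batch results (no database involved) to show the summing reduce from the example next to a hypothetical alternative, `stackRows`, which keeps every returned row instead of collapsing them to a single number.

```r
# Toy stand-in for the per-batch results of dbGetQuery() (illustration only)
results <- list(
  data.frame(weight = c(1.5, 2.5)),  # rows returned for batch 1
  data.frame(weight = 4)             # rows returned for batch 2
)

# reduce as in the example: sum every value across all batches
reduce <- function(results) sum(unlist(results))
reduce(results)  # 8

# hypothetical alternative reduce: stack the per-batch data frames
stackRows <- function(results) do.call(rbind, results)
stackRows(results)$weight  # 1.5 2.5 4.0
```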