Parallel Batch By Index
Takes an index set, breaks it into batches, and runs the given function on each batch in parallel using the given cluster. See batchByIndex for the non-parallel version.
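The batching-plus-parallel-map behaviour described above can be sketched with the parallel package that ships with R. This is a minimal illustration under our own assumptions, not the package's actual implementation: the name `sketchParBatchByIndex` is hypothetical, batches are formed with `split()`, and the per-batch calls go through `parallel::parLapply()`.

```r
library(parallel)

# Hypothetical re-implementation for illustration only; this is NOT the
# real parBatchByIndex source.
sketchParBatchByIndex <- function(allIndices, indexProcessor, reduce, cl,
                                  batchSize = 1e+05) {
  # Assign consecutive elements to batches of at most batchSize items;
  # only the last batch can be shorter.
  batchId <- ceiling(seq_along(allIndices) / batchSize)
  batches <- unname(split(allIndices, batchId))
  # Run indexProcessor on each batch on the cluster workers; parLapply
  # returns the per-batch results as a list in batch order.
  results <- parLapply(cl, batches, indexProcessor)
  # Merge the per-batch results into a single value.
  reduce(results)
}

cl <- makeCluster(2)
# Batches 1:3, 4:6, 7:9 and 10; each batch is summed, then the sums are added.
total <- sketchParBatchByIndex(1:10, function(b) sum(b),
                               function(r) sum(unlist(r)), cl, batchSize = 3)
stopCluster(cl)
total  # 55
```

Because `reduce` only sees the list of results, the same batching code works whether each batch returns a number, a vector, or a data frame.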
When doing a select where the condition is a large number of ids, it is not always possible to include them all in a single SQL statement. This function breaks the list of ids into chunks so that indexProcessor only has to deal with a small number of ids at a time.
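To make the chunking concrete, the sketch below builds one `IN (...)` query per batch instead of a single statement listing every id. The helper `buildInQueries` and the table name `my_table` are hypothetical placeholders, and the ids are assumed to be trusted integers because they are pasted straight into the SQL text.

```r
# Hypothetical helper: build one "IN (...)" query per batch of ids so that
# no single statement has to list every id.
buildInQueries <- function(ids, batchSize) {
  batches <- split(ids, ceiling(seq_along(ids) / batchSize))
  vapply(batches, function(batch)
    paste0("SELECT weight FROM my_table WHERE id IN (",
           paste(batch, collapse = ","), ")"),
    character(1), USE.NAMES = FALSE)
}

queries <- buildInQueries(1:5, batchSize = 2)
print(queries)
# "SELECT weight FROM my_table WHERE id IN (1,2)"
# "SELECT weight FROM my_table WHERE id IN (3,4)"
# "SELECT weight FROM my_table WHERE id IN (5)"
```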
parBatchByIndex(allIndices, indexProcessor, reduce, cl, batchSize = 1e+05)
allIndices: A vector of values that will be broken into batches and passed as an argument to indexProcessor.
indexProcessor: A function that takes one batch of indices. It is called once for each batch, possibly in parallel. The return value of this function is collected into a list and passed to reduce.
reduce: This function is run after all jobs have finished. It is called with a list of the return values from indexProcessor. The idea is that this function merges all the results together into one result.
cl: A SNOW cluster to run jobs on.
batchSize: The size of each batch. The last batch may be smaller than this value.
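The package's own example sums scalar weights in reduce. When each batch instead returns a data frame of rows, a reduce function can stack the pieces into one table. The function name `reduceRows` and the sample data below are hypothetical, chosen only to show the shape of the list that reduce receives.

```r
# Sample per-batch results: each indexProcessor call returned a data frame.
results <- list(
  data.frame(id = 1:2, weight = c(0.5, 1.5)),
  data.frame(id = 3L, weight = 2.0)
)

# A reduce function that stacks the per-batch data frames into one.
reduceRows <- function(results) do.call(rbind, results)

combined <- reduceRows(results)
print(combined)
```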
The value returned by the reduce function is returned by parBatchByIndex.
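## Not run:
library(snow)  # provides makeCluster(), clusterEvalQ() and stopCluster()
library(DBI)   # provides dbGetQuery()

cl = makeCluster(2) # create a SNOW cluster

# Each worker needs its own open connection named dbConnection: a DBI
# connection opened in the master process is not usable on the workers
# (open one on every worker, e.g. with clusterEvalQ()).

# function to run a query for each batch of indexes
job = function(indexBatch)
  dbGetQuery(dbConnection,
             paste("SELECT weight FROM table WHERE id IN (",
                   paste(indexBatch, collapse = ","), ")"))

# function to combine all the results, in this case by summing them up
reduce = function(results) sum(unlist(results))

indices = 1:10000

# run queries in parallel and then sum the results
totalWeight = parBatchByIndex(indices, job, reduce, cl, 1000)

stopCluster(cl)
## End(Not run)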