Apply an analytic recombination method to a ddo/ddf object and combine the results
data |
an object of class "ddo" or "ddf" |
combine |
the method to combine the results.
See, for example, |
apply |
a function specifying the analytic method to apply to each subset, or a pre-defined apply function (see |
output |
a "kvConnection" object indicating where the output data should reside (see |
overwrite |
logical; should existing output location be overwritten? (also can specify |
params |
a named list of objects external to the input data that are needed in the distributed computation (most are picked up automatically, so this rarely needs to be specified) |
packages |
a vector of R package names that contain functions used in |
control |
parameters specifying how the backend should handle things (most-likely parameters to |
verbose |
logical; should messages about what is being done be printed? |
Depends on combine: this could be a distributed data object, a data frame, a key-value list, etc. See examples.
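To make the dependence of the return type on the combine method concrete, here is a minimal base-R sketch of the same divide, transform, recombine pattern. It does not use datadr at all: `split()`, `lapply()`, and `do.call(rbind, ...)` stand in for `divide()`, the per-subset transformation, and the combine step, and the names `collected` and `bound` are illustrative only. It is an analogy under those assumptions, not a description of how datadr implements `combCollect()` or `combRbind()`.

```r
# Base-R analogue of divide / transform / recombine (not datadr itself)

# "divide": split the numeric columns of iris by Species
subsets <- split(iris[, 1:4], iris$Species)

# per-subset transformation: mean of every column
colMean <- function(x) data.frame(lapply(x, mean))
results <- lapply(subsets, colMean)

# analogue of collecting results: a list of key-value pairs
collected <- unname(Map(function(k, v) list(key = k, value = v),
                        names(results), results))

# analogue of row-binding results: a single data frame, key kept as a column
bound <- do.call(rbind, Map(function(k, v) cbind(Species = k, v),
                            names(results), results))
rownames(bound) <- NULL

str(collected[[1]])
bound
```

The same per-subset results end up either as a list with one key-value pair per subset or as one data frame with one row per subset, which is the kind of difference the Value section refers to.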
Ryan Hafen
divide, ddo, ddf, drGLM, drBLB, combMeanCoef, combMean, combCollect, combRbind, drLapply
## in-memory example
##---------------------------------------------------------
# begin with an in-memory ddf (backed by kvMemory)
bySpecies <- divide(iris, by = "Species")
# create a function to calculate the mean for each variable
colMean <- function(x) data.frame(lapply(x, mean))
# apply the transformation
bySpeciesTransformed <- addTransform(bySpecies, colMean)
# recombining with no 'combine' argument and no 'output' argument
# returns the key-value list that 'combCollect()' produces
recombine(bySpeciesTransformed)
# but we can also preserve the distributed data frame, like this:
recombine(bySpeciesTransformed, combine = combDdf)
# or we can recombine using 'combRbind()' and produce a data frame:
recombine(bySpeciesTransformed, combine = combRbind)
## local disk connection example with parallelization
##---------------------------------------------------------
# create a 2-node cluster that can be used to process in parallel
cl <- parallel::makeCluster(2)
# create the control object we'll pass into local disk datadr operations
control <- localDiskControl(cluster = cl)
# note that setting options(defaultLocalDiskControl = control)
# will cause this to be used by default in all local disk operations
# create local disk connection to hold bySpecies data
ldPath <- file.path(tempdir(), "by_species")
ldConn <- localDiskConn(ldPath, autoYes = TRUE)
# convert in-memory bySpecies to local-disk ddf
bySpeciesLD <- convert(bySpecies, ldConn)
# apply the transformation
bySpeciesTransformed <- addTransform(bySpeciesLD, colMean)
# recombine the data using the transformation
bySpeciesMean <- recombine(bySpeciesTransformed,
                           combine = combRbind, control = control)
bySpeciesMean
# remove temporary directories
unlink(ldPath, recursive = TRUE)
# shut down the cluster
parallel::stopCluster(cl)
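The examples above recombine by row-binding per-subset summaries. The See Also entries drGLM and combMeanCoef point to a different kind of analytic recombination, in which a model is fitted to each subset and the fitted coefficients are averaged. The following base-R sketch illustrates that idea only; it does not call datadr, the helper `fitOne` and the choice of weighting by subset size are assumptions made for illustration, and it makes no claim about how `combMeanCoef()` weights subsets.

```r
# Base-R sketch of "fit per subset, then average coefficients"
# (an illustration of the idea, not datadr's drGLM / combMeanCoef)

# divide iris by Species
subsets <- split(iris, iris$Species)

# hypothetical helper: fit a simple linear model to one subset
fitOne <- function(d) coef(lm(Sepal.Length ~ Petal.Length, data = d))

# one row of coefficients per subset
coefs <- do.call(rbind, lapply(subsets, fitOne))

# subset sizes, used here as weights (an assumption for this sketch)
n <- vapply(subsets, nrow, integer(1))

# weighted average of each coefficient across subsets;
# 'coefs * n' scales row i by n[i] because recycling runs down columns
meanCoef <- colSums(coefs * n) / sum(n)
meanCoef
```

Because every iris subset has the same number of rows, the weighted average here equals the plain column mean of `coefs`; with unequal subsets the two would differ.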