Specify control parameters for a MapReduce on a local disk connection. Currently the parameters include:
localDiskControl(cluster = NULL, map_buff_size_bytes = 10485760,
  reduce_buff_size_bytes = 10485760, map_temp_buff_size_bytes = 10485760)
cluster: a "cluster" object, such as one created with makeCluster, used to run operations in parallel (default NULL)
map_buff_size_bytes: determines how much data, in bytes, should be sent to each map task
reduce_buff_size_bytes: determines how much data, in bytes, should be sent to each reduce task
map_temp_buff_size_bytes: determines the size, in bytes, of the chunks written to disk between the map and the reduce
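The default of 10485760 bytes for each buffer is 10 * 1024^2, or 10 MiB. The following is a minimal sketch of specifying the buffer sizes in more readable units. The helper mb_to_bytes is hypothetical (it is not part of datadr), the sizes are chosen only for illustration, and the control object is built only if datadr is installed.

```r
# hypothetical helper (not part of datadr): convert mebibytes to bytes
mb_to_bytes <- function(mb) mb * 1024^2

# the documented default of 10485760 bytes corresponds to 10 MiB
default_bytes <- mb_to_bytes(10)

# illustrative sizes only; suitable values depend on the data and hardware
sizes <- list(
  map_buff_size_bytes = mb_to_bytes(5),
  reduce_buff_size_bytes = mb_to_bytes(20),
  map_temp_buff_size_bytes = mb_to_bytes(10)
)

# build the control object only when datadr is available
if (requireNamespace("datadr", quietly = TRUE)) {
  control <- do.call(datadr::localDiskControl, sizes)
}
```

Passing the sizes as a named list keeps the unit conversion in one place, so the three buffers can be adjusted together.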
If your data is on a shared drive that multiple nodes can access, or on a high-performance shared file system like Lustre, you can run a local disk MapReduce job on multiple nodes by creating a multi-node cluster with makeCluster.
If you are using multiple cores and the input data is very small, map_buff_size_bytes needs to be small so that the key-value pairs will be split across cores.
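One way to act on this note is to derive map_buff_size_bytes from a rough estimate of the input size. The sketch below rests on assumptions: it uses object.size as a stand-in for the input size, which may differ from how datadr measures key-value pairs, and it uses a hypothetical rule of thumb of one buffer's worth of data per worker.

```r
# number of workers in the cluster (illustrative)
n_workers <- 2

# rough in-memory size of the input in bytes (an approximation only)
input_bytes <- as.numeric(object.size(iris))

# assumed heuristic: keep the map buffer no larger than the input size
# divided by the number of workers, so the data can be split across them
small_buff <- max(1, floor(input_bytes / n_workers))

# build the control object only when datadr is available
if (requireNamespace("datadr", quietly = TRUE)) {
  cl <- parallel::makeCluster(n_workers)
  control <- datadr::localDiskControl(cluster = cl,
    map_buff_size_bytes = small_buff)
  # ... run local disk operations with control = control here ...
  parallel::stopCluster(cl)
}
```

For an input as small as iris, the resulting value is far below the 10485760-byte default, which would otherwise send all of the data to a single map task.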
# create a 2-node cluster that can be used to process in parallel
cl <- parallel::makeCluster(2)
# create a local disk control object that specifies to use this cluster
# these operations run in parallel
control <- localDiskControl(cluster = cl)
# note that setting options(defaultLocalDiskControl = control)
# will cause this to be used by default in all local disk operations
# convert in-memory ddf to local-disk ddf
ldPath <- file.path(tempdir(), "by_species")
ldConn <- localDiskConn(ldPath, autoYes = TRUE)
bySpeciesLD <- convert(divide(iris, by = "Species"), ldConn)
# update attributes using parallel cluster
updateAttributes(bySpeciesLD, control = control)
# remove temporary directories
unlink(ldPath, recursive = TRUE)
# shut down the cluster
parallel::stopCluster(cl)