View source: R/cluster_modify_pairs.R
cluster_modify_pairs | R Documentation |
Call a function on each of the worker nodes to modify the pairs on the node
cluster_modify_pairs(pairs, fun, ..., new_name = NULL)
pairs |
an object of type cluster_pairs |
fun |
a function to call on each of the worker nodes. See details on the arguments of this function. |
... |
additional arguments are passed on to fun. |
new_name |
name of new object to assign the pairs to on the cluster nodes. |
The function will have to accept the following arguments as its first three arguments:
the data.table with the pairs of the worker node.
a data.table with the portion of x present on the worker node.
a data.table with y.
The function should either return a data.table with the new pairs, or NULL. When a data.table is returned, it replaces the pairs when new_name is missing, or creates new pairs in the environment new_name. When the function returns NULL, it is assumed that the function modified the pairs by reference (e.g. using pairs[, new_var := new_val]). Note that this also means that new_name is ignored.
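The two return conventions described above can be sketched with plain data.table objects. This is only an illustration: keep_high and flag_high are hypothetical helper names, the score column and the threshold argument are assumptions, and the toy table stands in for the pairs held by a single worker node.

```r
library(data.table)

# Hypothetical helper: returns a NEW data.table with a subset of the pairs.
# The first three arguments follow the contract described above; `threshold`
# stands for an extra argument passed through `...`.
keep_high <- function(pairs, x, y, threshold = 0.5, ...) {
  pairs[score > threshold]
}

# Hypothetical helper: modifies the pairs by reference and returns NULL.
flag_high <- function(pairs, x, y, threshold = 0.5, ...) {
  pairs[, high := score > threshold]
  NULL
}

# Toy stand-in for the pairs on one worker node (`score` is an assumed column).
toy <- data.table(.x = c(1L, 1L, 2L), .y = c(1L, 2L, 2L),
                  score = c(0.9, 0.2, 0.7))

kept <- keep_high(toy, x = NULL, y = NULL, threshold = 0.5)  # new table
res  <- flag_high(toy, x = NULL, y = NULL)                   # NULL; toy changed

# On a cluster, assuming `pairs` is a cluster_pairs object with such a column,
# the same functions would be passed as `fun`:
# cluster_modify_pairs(pairs, keep_high, threshold = 0.5, new_name = "kept")
# cluster_modify_pairs(pairs, flag_high, threshold = 0.5)
```

Returning a new data.table suits filtering or sampling, where the result should usually live under a separate new_name. Returning NULL after a by-reference update suits adding or changing columns, since no copy of the pairs is made.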
Will return a cluster_pairs object. When new_name is not given it will return the input pairs invisibly. Otherwise it will return a new cluster_pairs object.
# Generate some pairs
library(reclin2)
library(parallel)
data("linkexample1", "linkexample2")
cl <- makeCluster(2)
pairs <- cluster_pair(cl, linkexample1, linkexample2)
compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
# Create a new set of pairs containing a random sample of the original
# pairs.
sample <- cluster_modify_pairs(pairs, new_name = "sample", function(pairs, ...) {
  # Draw a 10% random sample of the pairs held by this worker node
  sel <- sample(nrow(pairs), round(nrow(pairs)*0.1))
  pairs[sel, ]
})
# Cleanup
stopCluster(cl)