
View source: R/cluster_modify_pairs.R

cluster_modify_pairs    R Documentation

Call a function on each of the worker nodes to modify the pairs on the node

Description

Call a function on each of the worker nodes to modify the pairs on the node

Usage

cluster_modify_pairs(pairs, fun, ..., new_name = NULL)

Arguments

pairs

an object of type cluster_pairs, as created for example by cluster_pair.

fun

a function to call on each of the worker nodes. See Details for the arguments this function must accept.

...

additional arguments are passed on to fun.

new_name

name of the new object to which the pairs are assigned on the cluster nodes. When NULL (the default), the existing pairs are replaced.

Details

The function must accept the following as its first three arguments:

pairs

the data.table with the pairs stored on the worker node.

x

a data.table with the portion of x present on the worker node.

y

a data.table with y.

The function should return either a data.table with the new pairs or NULL. When a data.table is returned, it replaces the existing pairs if new_name is NULL, or is stored as a new set of pairs in the environment new_name otherwise. When the function returns NULL, it is assumed that the function modified the pairs by reference (e.g. using pairs[, new_var := new_val]); in that case new_name is ignored.
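The by-reference contract described above can be tried out locally with data.table before the function is sent to a cluster. This is a minimal sketch under stated assumptions: the function name flag_sample, its extra argument fraction, the column in_sample and the toy tables are all hypothetical stand-ins for what a worker node would hold. Note that a data.table := call returns the table itself (invisibly), so the function ends with an explicit NULL; without it, the modified data.table would be returned and treated as a new set of pairs.

```r
library(data.table)

# Hypothetical worker function following the documented contract:
# the first three arguments are pairs, x and y; extra arguments
# (here `fraction`) would arrive through the `...` of cluster_modify_pairs.
flag_sample <- function(pairs, x, y, fraction = 0.1) {
  # Add a logical column by reference; no copy of `pairs` is made.
  pairs[, in_sample := runif(.N) < fraction]
  # Return NULL explicitly: `:=` returns the table itself, which would
  # otherwise be treated as a replacement set of pairs.
  NULL
}

# Hypothetical toy stand-ins for the data on one worker node.
pairs <- data.table(.x = c(1L, 1L, 2L, 2L), .y = c(1L, 2L, 1L, 2L))
x <- data.table(lastname = c("smith", "jones"))
y <- data.table(lastname = c("smith", "johnson"))

set.seed(1)
res <- flag_sample(pairs, x, y, fraction = 0.5)
print(res)    # NULL: the pairs were modified in place
print(pairs)  # now contains an in_sample column
```

On a cluster, the same function would be passed as cluster_modify_pairs(pairs, flag_sample, fraction = 0.1), with pairs being the cluster_pairs object from the Examples; because it returns NULL, any new_name would be ignored.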

Value

Returns a cluster_pairs object. When new_name is not given, the input pairs are returned invisibly; otherwise a new cluster_pairs object is returned.

Examples

# Generate some pairs
library(reclin2)
library(parallel)
data("linkexample1", "linkexample2")
cl <- makeCluster(2)

pairs <- cluster_pair(cl, linkexample1, linkexample2)
compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))

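# Create a new set of pairs, stored as "sample" on the worker nodes,
# containing a random 10% sample of the original pairs. The function
# returns a data.table, so the original pairs are left unchanged.
sample <- cluster_modify_pairs(pairs, new_name = "sample", function(pairs, ...) {
  sel <- sample(nrow(pairs), round(nrow(pairs)*0.1))
  pairs[sel, ]
})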

# Cleanup
stopCluster(cl)


reclin2 documentation built on May 29, 2024, 4:21 a.m.