| cluster_pair | R Documentation |
Generates all combinations of records from x and y.
cluster_pair(cluster, x, y, deduplication = FALSE, name = "default")
cluster |
a cluster object as created by |
x |
first |
y |
second |
deduplication |
generate pairs from only |
name |
the name of the resulting object to create locally on the different R processes. |
Generating (all) pairs of the records of two data sets is usually the first step when linking the two data sets.
x is split into length(cluster) parts, which are distributed
over the worker nodes. y is copied to each of the nodes. On each node,
pair is then called. The resulting pairs are stored on the nodes in the global
object reclin_env, in a variable whose name is given by the name argument. The pairs
can then be further processed using functions such as
compare_pairs and tabulate_patterns. The function
cluster_collect collects the pairs from each of the nodes.
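The split-and-distribute scheme described above can be illustrated with base R and the parallel package alone. This is only a rough sketch, not the package's implementation: the data frames, the .x and .y column names, and the use of expand.grid in place of pair are assumptions made for illustration, and the pairs are returned to the main process directly instead of being kept in reclin_env on the nodes.

```r
library(parallel)

# Two small made-up data sets (hypothetical, for illustration only).
x <- data.frame(id = 1:6,
                name = c("anna", "bob", "carl", "dina", "eve", "fred"))
y <- data.frame(id = 1:3, name = c("anna", "bob", "eve"))

cl <- makeCluster(2)

# Split the row indices of x into length(cl) roughly equal parts.
parts <- splitIndices(nrow(x), length(cl))

# Copy y to every worker node.
clusterExport(cl, "y", envir = environment())

# On each node, build all index pairs between its part of x and all of y.
local_pairs <- parLapply(cl, parts, function(idx) {
  expand.grid(.x = idx, .y = seq_len(nrow(y)))
})

stopCluster(cl)

# Collect the per-node results into a single table of pairs.
all_pairs <- do.call(rbind, local_pairs)
nrow(all_pairs)  # 6 records in x times 3 records in y gives 18 pairs
```

Each worker only ever holds its own share of the pairs, so with n workers the memory needed per process for the pairs is roughly 1/n of the total, while y has to fit on every node.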
An object of type cluster_pairs, which is a list containing the
cluster and the name of the pairs object on the cluster nodes. For the pairs
objects created on the nodes, see the documentation of pair.
cluster_pair_blocking and cluster_pair_minsim are
other methods to generate pairs.
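library(parallel)
# Assumption: cluster_pair and the linkexample data are provided by the reclin2 package.
library(reclin2)
data("linkexample1", "linkexample2")
# Start two worker processes.
cl <- makeCluster(2)
# Generate all pairs; they are stored on the worker nodes under the name "default".
pairs <- cluster_pair(cl, linkexample1, linkexample2)
# Shut down the workers when done.
stopCluster(cl)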