View source: R/select_unique.R
select_unique.cluster_pairs | R Documentation |
Deselect pairs that are linked to multiple records
## S3 method for class 'cluster_pairs'
select_unique(
pairs,
variable,
preselect = NULL,
n = 1,
m = 1,
id_x = NULL,
id_y = NULL,
...
)
select_unique(
pairs,
variable,
preselect = NULL,
n = 1,
m = 1,
id_x = NULL,
id_y = NULL,
...
)
## S3 method for class 'pairs'
select_unique(
pairs,
variable,
preselect = NULL,
n = 1,
m = 1,
id_x = NULL,
id_y = NULL,
x = attr(pairs, "x"),
y = attr(pairs, "y"),
inplace = FALSE,
...
)
pairs |
a pairs object |
variable |
the name of the new variable to create in pairs. This will be a
logical variable with a value of TRUE for the selected pairs. |
preselect |
a logical variable with the same length as |
n |
do not select pairs with a y-record that is linked to more than
n records from x. |
m |
do not select pairs with an x-record that is linked to more than
m records from y. |
id_x |
an integer vector with the same length as the number of rows in
|
id_y |
an integer vector with the same length as the number of rows in
|
... |
Used to pass additional arguments to methods |
x |
|
y |
|
inplace |
logical indicating whether |
This function can be used to remove pairs for which there is ambiguity. For
example, when a record from x is linked to multiple records from y and we
know that there are no duplicate records in y (records that belong to the
same object), then at least one of the links is incorrect, but we cannot
decide which. In that case we may decide to keep none of these links.
Running select_unique with m == 1 will deselect all pairs involving that
record from x.
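The deselection rule described above can be sketched in base R on a toy pairs table. This is only an illustration of the idea, not the package's implementation: the data are hypothetical, the column names .x and .y are assumed to hold the row indices of the linked records, and the sketch assumes that only preselected pairs are counted when checking the limits n and m.

```r
# Toy pairs table (hypothetical data): .x and .y are assumed to be the
# row indices of the linked records in x and y.
pairs <- data.frame(
  .x = c(1, 1, 2, 3),
  .y = c(1, 2, 3, 4),
  select = c(TRUE, TRUE, TRUE, FALSE)
)

m <- 1  # maximum number of y-records an x-record may be linked to
n <- 1  # maximum number of x-records a y-record may be linked to

# Count, among the preselected pairs only (an assumption of this sketch),
# how many links each x-record and each y-record has.
sel <- pairs$select
n_y_per_x <- ave(as.integer(sel), pairs$.x, FUN = sum)
n_x_per_y <- ave(as.integer(sel), pairs$.y, FUN = sum)

# Keep a pair only when it was preselected and neither of its records
# exceeds its limit.
pairs$select_unique <- sel & n_y_per_x <= m & n_x_per_y <= n
print(pairs)
```

In this toy table record 1 from x is linked to two records from y, so both of its pairs are deselected; the pair (2, 3) is kept, and the last pair stays deselected because it was never preselected.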
In case one wants to keep one of the records instead: select_greedy will
select the pair with the highest weight and, in case of equal weights, the
first pair. Adding a random component to the weights makes the selection
among equally weighted pairs random.
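The effect of adding a random component can be illustrated without the package. The sketch below uses hypothetical scores and base R's which.max as a stand-in for "highest weight wins, first on ties"; it is not how select_greedy itself is implemented. In practice one would presumably store the perturbed score in a new column of pairs and pass that column to select_greedy.

```r
set.seed(1)  # only to make the sketch reproducible

# Two candidate pairs for the same record with identical scores
# (hypothetical values).
score <- c(4.5, 4.5)

# Without noise, the first of the tied pairs always wins.
first <- which.max(score)

# Add noise that is far smaller than any meaningful score difference,
# so that only the order among tied pairs is affected.
picks <- replicate(1000, which.max(score + runif(length(score), 0, 1e-6)))
print(table(picks))
```

The noise range (1e-6 here) is an arbitrary choice for this sketch; it should be smaller than the smallest real difference between scores so that pairs with genuinely different weights keep their order.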
Returns the pairs with the variable given by variable added. This is a
logical variable indicating which pairs are selected as matches.
data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
compare_pairs(pairs, on = c("lastname", "firstname", "address", "sex"),
  default_comparator = jaro_winkler(0.9), inplace = TRUE)
score_simple(pairs, "score",
  on = c("lastname", "firstname", "address", "sex"),
  w1 = list(lastname = 2), inplace = TRUE)
select_threshold(pairs, variable = "select",
  score = "score", threshold = 4.0, inplace = TRUE)
select_unique(pairs, variable = "select_unique", preselect = "select")