View source: R/select_threshold.R
select_threshold.cluster_pairs: R Documentation
Select matching pairs with a score above or equal to a threshold
Usage
select_threshold(pairs, variable, score, threshold, ...)

## S3 method for class 'pairs'
select_threshold(pairs, variable, score, threshold, inplace = FALSE, ...)

## S3 method for class 'cluster_pairs'
select_threshold(pairs, variable, score, threshold, new_name = NULL, ...)
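Arguments
pairs: a pairs or cluster_pairs object, such as created by pair_blocking or cluster_pair.
variable: the name of the new variable to create in pairs. This will be a logical variable with a value of TRUE for the selected pairs.
score: name of the score/weight variable of the pairs. When not given and …
threshold: the threshold to apply. Pairs with a score above or equal to the threshold are selected.
new_name: (cluster_pairs method only) name of the new object to assign the pairs to on the cluster nodes.
...: ignored.
inplace: (pairs method only) logical indicating whether pairs should be modified in place instead of returning a modified copy.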
Value
Returns the pairs with the variable named by variable added. This is a logical variable indicating which pairs are selected as matches.
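The selection rule described above (a logical column that is TRUE when the score is above or equal to the threshold) can be illustrated with a minimal base R sketch. The helper `select_threshold_sketch()` and the toy data frame are hypothetical and only mimic the documented behaviour on a plain data.frame; reclin2's own methods handle its pairs objects, in-place modification, and clusters, none of which this sketch covers.

```r
# Hypothetical helper: illustrates the documented selection rule only,
# it is NOT reclin2's implementation.
select_threshold_sketch <- function(pairs, variable, score, threshold) {
  # TRUE where score >= threshold, FALSE otherwise (an NA score gives NA)
  pairs[[variable]] <- pairs[[score]] >= threshold
  pairs
}

# Toy pairs with made-up posterior probabilities in 'mpost'
toy <- data.frame(.x = c(1, 1, 2), .y = c(1, 2, 2), mpost = c(0.9, 0.5, 0.1))
toy <- select_threshold_sketch(toy, "selected", "mpost", 0.5)
print(toy$selected)
```

Note that the pair with mpost exactly equal to 0.5 is selected, because the comparison is "above or equal to".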
Examples
library(reclin2)
data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
pairs <- predict(model, pairs, type = "mpost", add = TRUE, binary = TRUE)
# Select pairs with mpost >= 0.5; with inplace = TRUE the new variable is
# added to pairs directly, so the result does not need to be assigned back
select_threshold(pairs, "selected", "mpost", 0.5, inplace = TRUE)
# Example using a cluster.
# The syntax is the same as for regular pairs except for the initial call
# to cluster_pair(). Note that `inplace = TRUE` is generally implied when
# working with a cluster, so assigning the result back to pairs can be
# omitted (assigning it anyway does no harm).
library(parallel)
data("linkexample1", "linkexample2")
cl <- makeCluster(2)
pairs <- cluster_pair(cl, linkexample1, linkexample2)
compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
predict(model, pairs, type = "mpost", add = TRUE, binary = TRUE)
# Select pairs with mpost >= 0.5
# Unlike with regular pairs, inplace = TRUE is implied here
select_threshold(pairs, "selected", "mpost", 0.5)
stopCluster(cl)