View source: R/select_threshold.R
| select_threshold.cluster_pairs | R Documentation | 
Select matching pairs with a score above or equal to a threshold
## S3 method for class 'cluster_pairs'
select_threshold(pairs, variable, score, threshold, new_name = NULL, ...)
select_threshold(pairs, variable, score, threshold, ...)
## S3 method for class 'pairs'
select_threshold(pairs, variable, score, threshold, inplace = FALSE, ...)
pairs | 
 a pairs or cluster_pairs object.  | 
variable | 
 the name of the new variable to create in pairs. This will be a
logical variable with a value of TRUE for the pairs that are selected.  | 
score | 
 name of the score/weight variable of the pairs. When not given
and   | 
threshold | 
 the threshold to apply. Pairs with a score above or equal to the threshold are selected.  | 
new_name | 
 name of new object to assign the pairs to on the cluster nodes.  | 
... | 
 ignored  | 
inplace | 
 logical indicating whether pairs should be modified in place.  | 
Returns the pairs with the variable given by variable added. This
is a logical variable indicating which pairs are selected as matches.
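As a rough illustration of the selection rule described above (score above or equal to the threshold), the following base R sketch applies the same comparison to a plain data.frame. It is not the package's implementation; the object `toy_pairs`, its columns `id_x` and `id_y`, and the score values are hypothetical and chosen only to show that a score exactly equal to the threshold is selected.

```r
# Hypothetical scores for five candidate pairs (illustrative values only)
toy_pairs <- data.frame(
  id_x  = c(1, 1, 2, 3, 3),
  id_y  = c(1, 2, 2, 1, 3),
  mpost = c(0.92, 0.10, 0.50, 0.49, 0.77)
)

threshold <- 0.5

# A pair is selected when its score is above or equal to the threshold
toy_pairs$selected <- toy_pairs$mpost >= threshold

print(toy_pairs)

# Number of pairs selected as matches
n_selected <- sum(toy_pairs$selected)
print(n_selected)
```

The pair with a score of exactly 0.5 is selected, while the pair with a score of 0.49 is not.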
# Assumption: the functions used below come from the reclin2 package
library(reclin2)
data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
pairs <- predict(model, pairs, type = "mpost", add = TRUE, binary = TRUE)
# Select pairs with mpost >= 0.5
select_threshold(pairs, "selected", "mpost", 0.5, inplace = TRUE)
# Example using a cluster.
# In general the syntax is exactly the same, except for the first call,
# which is to cluster_pair. Note that in general `inplace = TRUE` is implied
# when working with a cluster; therefore the assignment back to pairs can be
# omitted (keeping the assignment does no harm).
library(parallel)
data("linkexample1", "linkexample2")
cl <- makeCluster(2)
pairs <- cluster_pair(cl, linkexample1, linkexample2)
compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
predict(model, pairs, type = "mpost", add = TRUE, binary = TRUE)
# Select pairs with mpost >= 0.5
# Unlike the regular pairs: inplace = TRUE is implied here
select_threshold(pairs, "selected", "mpost", 0.5)
stopCluster(cl)