select_threshold: Select matching pairs with a score above or equal to a...


select_threshold.cluster_pairs    R Documentation

Select matching pairs with a score above or equal to a threshold

Description

Select matching pairs with a score above or equal to a threshold

Usage

## S3 method for class 'cluster_pairs'
select_threshold(pairs, variable, score, threshold, new_name = NULL, ...)

select_threshold(pairs, variable, score, threshold, ...)

## S3 method for class 'pairs'
select_threshold(pairs, variable, score, threshold, inplace = FALSE, ...)

Arguments

pairs

a pairs object, such as generated by pair_blocking

variable

the name of the new variable to create in pairs. This will be a logical variable with a value of TRUE for the selected pairs.

score

name of the score/weight variable of the pairs. When not given and attr(pairs, "score") is defined, that is used.

threshold

the threshold to apply. Pairs with a score above or equal to the threshold are selected.

new_name

name of new object to assign the pairs to on the cluster nodes.

...

ignored

inplace

logical indicating whether pairs should be modified in place. When pairs is large this can be more efficient.

Value

Returns the pairs with the variable given by variable added. This is a logical variable indicating which pairs are selected as matches.
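The selection rule described above can be sketched in base R. This is only an illustration of the "above or equal to" semantics, not the package's implementation: `toy_pairs` is a made-up plain data.frame standing in for a pairs object, and its column names (`x_row`, `y_row`) are hypothetical.

```r
# Hypothetical stand-in for a pairs object: a plain data.frame with a
# score column named "mpost". Not a real reclin2 pairs object.
toy_pairs <- data.frame(
  x_row = c(1L, 1L, 2L, 3L),
  y_row = c(1L, 2L, 2L, 3L),
  mpost = c(0.92, 0.10, 0.50, 0.49)
)

threshold <- 0.5

# Pairs with a score above or equal to the threshold are selected;
# the result is stored in a new logical variable.
toy_pairs$selected <- toy_pairs$mpost >= threshold

print(toy_pairs)
```

Note that the pair with a score exactly equal to the threshold (0.50) is selected, while the pair with 0.49 is not.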

Examples

data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
pairs <- predict(model, pairs, type = "mpost", add = TRUE, binary = TRUE)
# Select pairs with mpost >= 0.5
select_threshold(pairs, "selected", "mpost", 0.5, inplace = TRUE)

# Example using a cluster.
# In general the syntax is exactly the same, except for the first call to
# cluster_pair. Note that in general `inplace = TRUE` is implied when
# working with a cluster; therefore the assignment back to pairs can be
# omitted (keeping the assignment is also not a problem).
library(parallel)
data("linkexample1", "linkexample2")
cl <- makeCluster(2)

pairs <- cluster_pair(cl, linkexample1, linkexample2)
compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
predict(model, pairs, type = "mpost", add = TRUE, binary = TRUE)
# Select pairs with mpost >= 0.5
# Unlike with regular pairs, inplace = TRUE is implied here
select_threshold(pairs, "selected", "mpost", 0.5)
stopCluster(cl)
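For regular (non-cluster) pairs, the Usage section lists `inplace = FALSE` as the default, so the result has to be assigned. The following is a minimal sketch of that variant. It assumes reclin2 is installed and reuses only the calls shown in the examples above; the name `selected_pairs` is chosen here for illustration.

```r
library(reclin2)

data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
pairs <- predict(model, pairs, type = "mpost", add = TRUE, binary = TRUE)

# With the default inplace = FALSE the result is returned, so assign it
# to a name (selected_pairs is an illustrative name).
selected_pairs <- select_threshold(pairs, "selected", "mpost", 0.5)

# Count how many pairs were (not) selected as matches.
table(selected_pairs$selected)
```

Setting `inplace = TRUE` instead avoids the assignment, which the argument description notes can be more efficient when pairs is large.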


reclin2 documentation built on May 29, 2024, 4:21 a.m.