View source: R/calculate_weights.R
Calculate weights for comparison variables based on m and u probabilities estimated from a verified dataset.
calculate_weights(
  data,
  variables,
  compare_type = "stringdist",
  suffixes = c("_1", "_2"),
  non_negative = FALSE
)
data: data.frame. Verified data. It should contain every variable you want to calculate weights for, from both datasets, with identical names apart from the data-specific suffixes.
variables: character vector. Names of the variables you want to calculate weights for.
compare_type: character vector. One of 'stringdist' (for string variables); 'ratio' or 'difference' (for numerics); 'indicator' (0-1 dummy indicating whether the two values are the same); 'in' (0-1 dummy indicating whether the data1 value is IN the data2 value); or 'substr' (numeric indicating how many digits are the same).
suffixes: character vector. Suffixes of the variables that indicate which dataset they come from. Default is c("_1", "_2").
non_negative: logical. If TRUE, negative weights are not allowed; the default, FALSE, allows them.
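The comparison types listed above can be illustrated with a small base-R sketch. The helper compare_values() is hypothetical, and every definition in it is an assumption made for illustration: 'stringdist' is approximated by a normalized Levenshtein similarity from base R's adist(), 'ratio' as the smaller value over the larger, 'difference' as an absolute difference, and 'substr' as the number of shared leading characters. The package's own definitions may differ.

```r
# Hypothetical helper: one plausible reading of each compare_type.
# These definitions are illustrative assumptions, not the package's code.
compare_values <- function(x, y, type) {
  switch(type,
    # 'stringdist': 1 - (Levenshtein distance / longer string length);
    # adist() builds the full distance matrix, so take its diagonal
    stringdist = {
      d <- diag(adist(x, y))
      1 - d / pmax(nchar(x), nchar(y), 1)
    },
    # 'ratio': smaller value divided by larger value (1 means identical)
    ratio = pmin(x, y) / pmax(x, y),
    # 'difference': absolute difference between the two numbers
    difference = abs(x - y),
    # 'indicator': 1 if the two values are identical, else 0
    indicator = as.numeric(x == y),
    # 'in': 1 if the data1 value appears inside the data2 value, else 0
    "in" = as.numeric(mapply(grepl, x, y,
                             MoreArgs = list(fixed = TRUE),
                             USE.NAMES = FALSE)),
    # 'substr': number of leading characters (e.g. digits) the values share
    substr = mapply(function(a, b) {
      a <- strsplit(as.character(a), "")[[1]]
      b <- strsplit(as.character(b), "")[[1]]
      n <- min(length(a), length(b))
      if (n == 0) return(0)
      same <- a[seq_len(n)] == b[seq_len(n)]
      if (all(same)) n else which(!same)[1] - 1
    }, x, y, USE.NAMES = FALSE),
    stop("unknown compare_type: ", type)
  )
}

compare_values(c("smith", "jones"), c("smyth", "jones"), "stringdist")
# 0.8 1.0
```

Each branch returns one score per pair of values, which is the kind of per-variable comparison that the m and u probabilities are then estimated from.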
This function uses the classic record linkage methodology first developed by Fellegi and Sunter.
See Record Linkage. For each comparison variable, m is the probability that the variable agrees given that a pair of observations is a true match, while u is the probability that it agrees given that the pair is not a true match. calculate_weights computes a preliminary weight for each variable as

w = \log_2\left(\frac{m}{u}\right),

then rescales these weights so that they sum to 1. Thus, variables with higher m and lower u probabilities receive higher weights, which makes sense given the definitions: such variables agree often among true matches and rarely among non-matches. These weights can then be easily passed into the score_settings argument of merge_plus or tier_match, or into the wgts argument of multivar_match.
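A minimal sketch of the weighting procedure described above, assuming the verified data has already been reduced to one comparison score per variable (between 0 and 1) plus a 0/1 label marking whether each pair is a true match. The function fs_weights(), its arguments, and the reading of non_negative as "clamp negative weights to zero" are assumptions for illustration, not the package's implementation.

```r
# Hypothetical sketch of Fellegi-Sunter weights (not the package's code).
# agree:    data frame of comparison scores in [0, 1], one column per variable
# is_match: verified 0/1 label, 1 = true match
fs_weights <- function(agree, is_match, non_negative = FALSE) {
  # m: average agreement among verified true matches
  m <- colMeans(agree[is_match == 1, , drop = FALSE])
  # u: average agreement among verified non-matches
  u <- colMeans(agree[is_match == 0, , drop = FALSE])
  # preliminary weight for each variable: w = log2(m / u)
  w <- log2(m / u)
  # assumed meaning of non_negative: clamp negative weights to zero
  if (non_negative) w <- pmax(w, 0)
  # rescale so the weights sum to 1
  w <- w / sum(w)
  list(m = m, u = u, w = w)
}

# Toy verified data: the first four pairs are true matches
verified <- data.frame(
  name = c(1, 1, 1, 1, 0, 0, 0, 1),
  city = c(1, 1, 0, 1, 1, 1, 0, 0)
)
is_match <- c(1, 1, 1, 1, 0, 0, 0, 0)
res <- fs_weights(verified, is_match)
round(res$w, 3)
# name: 0.774, city: 0.226
```

Here name agrees in every true match but in only one of four non-matches (m = 1, u = 0.25, so w = 2 before rescaling), so it receives most of the weight. The sketch does not guard against m or u being exactly 0, which would make log2(m / u) infinite; a real implementation would need to handle that case.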
A list with the m probabilities, the u probabilities, the weights w, and settings, the list required as input to the score_settings argument of merge_plus, built from the calculated weights.