problink_em R Documentation
Calculate EM-estimates of m- and u-probabilities
problink_em(
formula,
data,
patterns,
mprobs0 = list(0.95),
uprobs0 = list(0.02),
p0 = 0.05,
tol = 1e-05,
mprob_max = 0.999,
uprob_min = 1e-04
)
formula
a formula object with the variables for which to calculate the
m- and u-probabilities. Should be a one-sided formula listing the comparison
variables, e.g. ~ lastname + firstname + address + sex.
data
data set with pairs on which to estimate the model. Alternatively,
one can use the patterns argument.
patterns
table of patterns (as output by …).
mprobs0, uprobs0
initial values of the m- and u-probabilities. These
should be lists with numeric values. The names of the elements in the list
should correspond to the names in …
p0
the initial estimate of the probability that a pair is a match.
tol
convergence tolerance: the estimation stops when the change in the m- and u-probabilities is smaller than tol.
mprob_max
maximum value of the estimated m-probabilities. Values equal to one can lead to numerical instabilities.
uprob_min
minimum value of the estimated u-probabilities. Values equal to zero can lead to numerical instabilities.
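As a minimal sketch of how these arguments enter an EM fit, the base-R code below runs the standard Fellegi-Sunter EM iteration on a table of binary agreement patterns. It assumes that each comparison variable is a 0/1 agreement indicator and that the variables are conditionally independent given match status. The function em_sketch and its max_iter argument are hypothetical and not part of the package; the package's own implementation may differ, for example in how it handles non-binary comparisons or named initial values. The simulated data are made up for illustration.

```r
# Hypothetical helper, not the package's implementation: EM for the
# Fellegi-Sunter model with binary agreement indicators, assuming the
# comparison variables are conditionally independent given match status.
em_sketch <- function(patterns, n, mprobs0 = list(0.95), uprobs0 = list(0.02),
                      p0 = 0.05, tol = 1e-05, mprob_max = 0.999,
                      uprob_min = 1e-04, max_iter = 1000) {
  # patterns: 0/1 matrix, one row per distinct agreement pattern
  # n: number of pairs showing each pattern
  k <- ncol(patterns)
  # Names in mprobs0/uprobs0 are ignored; values are recycled over variables
  m <- rep_len(unlist(mprobs0), k)
  u <- rep_len(unlist(uprobs0), k)
  p <- p0
  for (iter in seq_len(max_iter)) {
    # E-step: likelihood of each pattern among matches and non-matches,
    # then the posterior probability that a pair with that pattern matches
    pm <- apply(patterns, 1, function(g) prod(ifelse(g, m, 1 - m)))
    pu <- apply(patterns, 1, function(g) prod(ifelse(g, u, 1 - u)))
    g_match <- p * pm / (p * pm + (1 - p) * pu)
    # M-step: weighted agreement proportions among matches and non-matches
    w_m <- n * g_match
    w_u <- n * (1 - g_match)
    m_new <- colSums(patterns * w_m) / sum(w_m)
    u_new <- colSums(patterns * w_u) / sum(w_u)
    # Keep m away from one and u away from zero for numerical stability
    m_new <- pmin(m_new, mprob_max)
    u_new <- pmax(u_new, uprob_min)
    p <- sum(w_m) / sum(n)
    # Stop once the m- and u-probabilities change less than tol
    converged <- max(abs(c(m_new - m, u_new - u))) < tol
    m <- m_new
    u <- u_new
    if (converged) break
  }
  list(mprobs = m, uprobs = u, p = p)
}

# Usage on simulated pairs: about 5% matches, three comparison variables
set.seed(42)
n_pairs <- 10000
is_match <- runif(n_pairs) < 0.05
true_m <- c(0.95, 0.90, 0.85)
true_u <- c(0.05, 0.10, 0.02)
agree <- sapply(seq_along(true_m), function(j)
  ifelse(is_match, runif(n_pairs) < true_m[j], runif(n_pairs) < true_u[j]))
# Tabulate the distinct agreement patterns and their counts
key <- apply(agree * 1L, 1, paste, collapse = "")
tab <- table(key)
patterns <- do.call(rbind, lapply(strsplit(names(tab), ""), as.integer))
fit <- em_sketch(patterns, n = as.vector(tab))
str(fit)
```

Working on the tabulated patterns rather than on individual pairs keeps each iteration cheap, since the number of distinct patterns is at most 2^k for k binary variables.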
Returns an object of type problink_em. This is a list containing the
estimated mprobs, uprobs and overall linkage probability p. It also
contains the table of comparison patterns.
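The estimated mprobs, uprobs and p can be combined with Bayes' rule to give the probability that a pair with a given agreement pattern is a match. The sketch below assumes binary agreement indicators, conditional independence of the comparison variables, and a single numeric value per variable in mprobs and uprobs. The helper posterior_match is hypothetical, and the model list is made-up input shaped like the documented return value, not actual output of problink_em.

```r
# Hypothetical helper: posterior match probability for one agreement pattern,
# given a list with elements mprobs, uprobs and p.
posterior_match <- function(model, agree) {
  m <- unlist(model$mprobs)
  u <- unlist(model$uprobs)
  # Likelihood of the pattern among matches and among non-matches;
  # agree must list the variables in the same order as mprobs/uprobs
  lik_m <- prod(ifelse(agree, m, 1 - m))
  lik_u <- prod(ifelse(agree, u, 1 - u))
  model$p * lik_m / (model$p * lik_m + (1 - model$p) * lik_u)
}

# Made-up values for two comparison variables a and b
model <- list(mprobs = list(a = 0.9, b = 0.8),
              uprobs = list(a = 0.1, b = 0.05),
              p = 0.05)
posterior_match(model, c(TRUE, TRUE))
posterior_match(model, c(FALSE, FALSE))
```

Even with a prior match probability of only 0.05, agreement on both variables pushes the posterior well above one half, while disagreement on both pushes it close to zero.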
Fellegi, I. and A. Sunter (1969). "A Theory for Record Linkage". Journal of the American Statistical Association, 64(328), pp. 1183-1210. doi:10.2307/2286061.
Herzog, T.N., F.J. Scheuren and W.E. Winkler (2007). Data Quality and Record Linkage Techniques. Springer.
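library(reclin)
data("linkexample1", "linkexample2")
# Generate candidate pairs, blocking on postcode
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
# Compare the pairs on the linkage variables
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
# Estimate the m- and u-probabilities with EM
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
summary(model)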