problink_em    R Documentation
Calculate EM-estimates of m- and u-probabilities
problink_em(
formula,
data,
patterns,
mprobs0 = list(0.95),
uprobs0 = list(0.02),
p0 = 0.05,
tol = 1e-05,
mprob_max = 0.999,
uprob_min = 1e-04
)
formula: a formula object with the variables for which to calculate the
m- and u-probabilities. Should be a one-sided formula listing the comparison
variables, for example ~ lastname + firstname + address + sex.
data: data set with pairs on which to estimate the model. Alternatively,
the patterns argument can be used instead.
patterns: table of comparison patterns (as output by …).
mprobs0, uprobs0: initial values of the m- and u-probabilities. These
should be lists with numeric values. The names of the elements in the list
should correspond to the names in …
p0: the initial estimate of the probability that a pair is a match.
tol: when the change in the m- and u-probabilities is smaller than tol,
the algorithm stops.
mprob_max: maximum value of the estimated m-probabilities. Values equal to one can lead to numerical instabilities.
uprob_min: minimum value of the estimated u-probabilities. Values equal to zero can lead to numerical instabilities.
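The starting values (p0, mprobs0, uprobs0), the stopping rule (tol) and the bounds (mprob_max, uprob_min) each correspond to a step of an EM algorithm for the Fellegi-Sunter latent-class model. The base R sketch below illustrates those steps under stated assumptions: binary agreement patterns (1 = agree, 0 = disagree) with no missing values, conditional independence between comparison variables, and convergence measured as the largest absolute change. The function name em_mu_sketch, the counts vector and the max_iter cap are hypothetical; this is not the package's implementation.

```r
# Hypothetical EM sketch for Fellegi-Sunter m- and u-probabilities.
# Assumes binary agreement patterns with no missing values; not the
# actual problink_em source code.
em_mu_sketch <- function(patterns, counts, mprobs0 = list(0.95),
                         uprobs0 = list(0.02), p0 = 0.05, tol = 1e-05,
                         mprob_max = 0.999, uprob_min = 1e-04,
                         max_iter = 1000L) {
  K <- ncol(patterns)
  # Recycle the (list-valued) starting values to one value per variable
  m <- rep_len(unlist(mprobs0), K)
  u <- rep_len(unlist(uprobs0), K)
  p <- p0
  for (iter in seq_len(max_iter)) {
    # E-step: probability of each pattern among matches and non-matches
    lm <- apply(patterns, 1, function(g) prod(m^g * (1 - m)^(1 - g)))
    lu <- apply(patterns, 1, function(g) prod(u^g * (1 - u)^(1 - g)))
    # Posterior probability that a pair with this pattern is a match
    w <- p * lm / (p * lm + (1 - p) * lu)
    # M-step: count-weighted match share and agreement rates
    p <- sum(counts * w) / sum(counts)
    m_new <- colSums(patterns * (counts * w)) / sum(counts * w)
    u_new <- colSums(patterns * (counts * (1 - w))) / sum(counts * (1 - w))
    # Keep m below mprob_max and u above uprob_min for numerical stability
    m_new <- pmin(m_new, mprob_max)
    u_new <- pmax(u_new, uprob_min)
    # Stop once the largest change in any m or u is below tol
    converged <- max(abs(c(m_new - m, u_new - u))) < tol
    m <- m_new
    u <- u_new
    if (converged) break
  }
  list(mprobs = m, uprobs = u, p = p, iterations = iter)
}

# Toy input: every agreement pattern of three binary comparisons,
# with made-up frequencies for illustration only
patterns <- as.matrix(expand.grid(lastname = 0:1, firstname = 0:1, sex = 0:1))
counts <- c(500, 40, 40, 5, 300, 30, 30, 55)
fit <- em_mu_sketch(patterns, counts)
str(fit)
```

Working on a table of unique patterns with counts, rather than on individual pairs, keeps each iteration cheap: the E-step is evaluated once per pattern instead of once per pair.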
Returns an object of class problink_em: a list containing the estimated
mprobs, uprobs and the overall linkage probability p, together with the
table of comparison patterns.
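Assuming the standard Fellegi-Sunter latent-class model with binary agreement indicators and conditional independence between comparison variables, the returned quantities can be read as follows. The notation is introduced here for illustration and is not taken from the function: M and U denote match and non-match, and gamma = (gamma_1, ..., gamma_K) is the agreement vector of a pair on K comparison variables.

```latex
\begin{aligned}
m_k &= P(\gamma_k = 1 \mid M), \qquad u_k = P(\gamma_k = 1 \mid U), \qquad p = P(M),\\
P(\gamma) &= p \prod_{k=1}^{K} m_k^{\gamma_k} (1 - m_k)^{1 - \gamma_k}
  + (1 - p) \prod_{k=1}^{K} u_k^{\gamma_k} (1 - u_k)^{1 - \gamma_k},\\
P(M \mid \gamma) &= \frac{p \prod_{k=1}^{K} m_k^{\gamma_k} (1 - m_k)^{1 - \gamma_k}}{P(\gamma)}.
\end{aligned}
```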
Fellegi, I. and A. Sunter (1969). "A Theory for Record Linkage", Journal of the American Statistical Association, 64(328), pp. 1183-1210. doi:10.2307/2286061.
Herzog, T.N., F.J. Scheuren and W.E. Winkler (2007). Data Quality and Record Linkage Techniques, Springer.
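library(reclin2)  # assumed: the package that provides problink_em
data("linkexample1", "linkexample2")
# Candidate pairs: records that share the same postcode
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
# Compare each pair on the listed variables
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
# Estimate the m- and u-probabilities with EM
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
summary(model)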
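The example types the comparison variables twice, once for compare_pairs and once in the formula. One way to keep the two in sync, sketched here with base R's stats::reformulate (the vector name compare_vars is hypothetical), is to build the one-sided formula from the same character vector:

```r
# Comparison variables, as passed to compare_pairs() in the example
compare_vars <- c("lastname", "firstname", "address", "sex")
# With no response, reformulate() builds a one-sided formula ~ lastname + ...
f <- reformulate(compare_vars)
print(f)
```

The resulting formula object could then be passed as the formula argument, for example problink_em(f, data = pairs).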