problink_em: Calculate EM-estimates of m- and u-probabilities

problink_em R Documentation

Description

Calculate EM-estimates of m- and u-probabilities

Usage

problink_em(patterns, mprobs0 = list(0.95), uprobs0 = list(0.02),
  p0 = 0.05, tol = 1e-05)

Arguments

patterns

either a table of patterns (as output by tabulate_patterns) or pairs with comparison columns (as output by compare_pairs).

mprobs0, uprobs0

initial values of the m- and u-probabilities. These should be lists with numeric values. The names of the elements in the list should correspond to the names in by_x in compare_pairs.

p0

the initial estimate of the probability that a pair is a match.

tol

the convergence tolerance: the algorithm stops when the change in the m- and u-probabilities between iterations is smaller than tol.
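
To make the role of mprobs0, uprobs0, p0 and tol more concrete, the following base R sketch runs an EM iteration for the Fellegi-Sunter model on a small, made-up pattern table. It is not the package's implementation: the function em_sketch, the counts in n, the max_iter safeguard and the exact stopping rule (largest absolute change in any m- or u-probability) are assumptions chosen for illustration.

```r
# Hypothetical pattern table: 1 = agreement, 0 = disagreement.
# Each row of g is one comparison pattern; n holds the number of pairs per pattern.
g <- as.matrix(expand.grid(lastname = 0:1, firstname = 0:1, sex = 0:1))
n <- c(400, 30, 25, 10, 380, 35, 30, 90)

# Illustrative EM routine (an assumption, not the code used by problink_em).
em_sketch <- function(g, n, m0 = 0.95, u0 = 0.02, p0 = 0.05, tol = 1e-5,
                      max_iter = 1000) {
  m <- rep(m0, ncol(g))
  u <- rep(u0, ncol(g))
  p <- p0
  for (iter in seq_len(max_iter)) {
    # E-step: probability of each pattern among matches (pm) and
    # non-matches (pu), assuming conditionally independent comparisons.
    pm <- apply(g, 1, function(x) prod(m^x * (1 - m)^(1 - x)))
    pu <- apply(g, 1, function(x) prod(u^x * (1 - u)^(1 - x)))
    # Posterior probability that a pair with a given pattern is a match.
    w <- p * pm / (p * pm + (1 - p) * pu)

    # M-step: weighted agreement rates among matches and non-matches.
    m_new <- colSums(n * w * g) / sum(n * w)
    u_new <- colSums(n * (1 - w) * g) / sum(n * (1 - w))
    p <- sum(n * w) / sum(n)

    # Stop once the m- and u-probabilities change by less than tol.
    change <- max(abs(c(m_new - m, u_new - u)))
    m <- m_new
    u <- u_new
    if (change < tol) break
  }
  list(mprobs = m, uprobs = u, p = p, iterations = iter)
}

fit <- em_sketch(g, n)
print(fit)
```

A smaller tol requires more iterations before the loop stops; the starting values mainly determine which of the two latent classes ends up being labelled as the matches.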

Value

Returns an object of class problink_em. This is a list containing the estimated m-probabilities (mprobs), the estimated u-probabilities (uprobs) and the estimated overall probability p that a pair is a match. It also contains the table of comparison patterns.
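
As an illustration of how such estimates can be used, the base R sketch below computes, for a single comparison pattern, the Fellegi-Sunter log-likelihood-ratio weight and the posterior probability of a match. The list est is a hypothetical stand-in for a fitted model: its element names follow the description above, but the numeric values are made up and the package may store the estimates differently (for example as lists rather than named vectors).

```r
# Hypothetical estimates, standing in for the elements of a fitted model.
est <- list(
  mprobs = c(lastname = 0.90, firstname = 0.85),
  uprobs = c(lastname = 0.05, firstname = 0.10),
  p      = 0.10
)

# One comparison pattern: lastname agrees, firstname disagrees.
pattern <- c(lastname = TRUE, firstname = FALSE)

# Probability of observing this pattern among matches and among non-matches,
# assuming the comparisons are conditionally independent.
pm <- prod(ifelse(pattern, est$mprobs, 1 - est$mprobs))
pu <- prod(ifelse(pattern, est$uprobs, 1 - est$uprobs))

# Log-likelihood-ratio weight and posterior probability of a match.
weight    <- log(pm / pu)
posterior <- est$p * pm / (est$p * pm + (1 - est$p) * pu)

print(c(weight = weight, posterior = posterior))
```

Patterns with a higher weight are more typical of matches than of non-matches; the posterior additionally accounts for the overall share of matches p.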

References

Fellegi, I. and A. Sunter (1969). "A Theory for Record Linkage", Journal of the American Statistical Association. 64 (328): pp. 1183-1210. doi:10.2307/2286061.

Herzog, T.N., F.J. Scheuren and W.E. Winkler (2007). Data Quality and Record Linkage Techniques, Springer.

Examples

data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(pairs)
summary(model)
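
The patterns argument also accepts a table of patterns instead of the pairs themselves. As a rough picture of what such a table holds, the base R sketch below counts how often each combination of comparison outcomes occurs in a small, made-up set of pairs. The data frame cmp and the count column name n are assumptions for illustration; the exact layout produced by tabulate_patterns may differ.

```r
# Hypothetical comparison results for six pairs: TRUE means the variable agrees.
cmp <- data.frame(
  lastname  = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE),
  firstname = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)
)

# Count the number of pairs for every distinct agreement pattern.
cmp$n <- 1L
patterns <- aggregate(n ~ lastname + firstname, data = cmp, FUN = sum)
print(patterns)
```

Because the EM estimates depend only on how many pairs show each pattern, such a table carries the same information for the model as the full set of compared pairs.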

djvanderlaan/reclin documentation built on Oct. 4, 2022, 7:03 p.m.