emlinklog | R Documentation
Expectation-Maximization algorithm for Record Linkage allowing for dependencies across linkage fields
emlinklog(patterns, nobs.a, nobs.b, p.m, p.gamma.j.m, p.gamma.j.u,
iter.max, tol, varnames)
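This page does not spell out the iterations. As a rough illustration only, the sketch that follows implements EM for a two-class mixture over agreement-pattern counts in which each class's pattern distribution is a Poisson log-linear model; interaction terms in that model are one way to let agreement on one field depend on agreement on another. The function `em_loglinear_sketch`, its `formula` argument, the agreement-score initialization and the use of `glm()` with `quasipoisson()` are assumptions made for this example. They are not emlinklog's internals, and the synthetic counts are invented for the demonstration.

```r
# Hypothetical sketch, NOT fastLink's implementation: EM for a two-class
# mixture over unique agreement patterns. The match (m) and non-match (u)
# pattern distributions are each fitted as a Poisson log-linear model, so
# interaction terms in `formula` allow dependence across linkage fields.
em_loglinear_sketch <- function(patterns, counts, formula = y ~ .^2,
                                iter.max = 5000, tol = 1e-05) {
  # patterns: data.frame with one FACTOR column per linkage field
  # counts:   number of record pairs showing each row of `patterns`
  # Initial posteriors from the share of agreeing fields, kept inside
  # (0, 1) so both classes start with positive expected counts.
  score <- rowMeans(sapply(patterns, function(x)
    (as.integer(x) - 1) / (nlevels(x) - 1)))
  zeta <- 0.05 + 0.9 * score
  for (iter in seq_len(iter.max)) {
    # M-step: mixing weight, then one log-linear fit per class using the
    # expected counts n_j * zeta_j and n_j * (1 - zeta_j) as responses.
    p.m <- sum(counts * zeta) / sum(counts)
    fit.m <- glm(formula, family = quasipoisson(),
                 data = cbind(patterns, y = counts * zeta))
    fit.u <- glm(formula, family = quasipoisson(),
                 data = cbind(patterns, y = counts * (1 - zeta)))
    p.gamma.j.m <- unname(fitted(fit.m) / sum(fitted(fit.m)))
    p.gamma.j.u <- unname(fitted(fit.u) / sum(fitted(fit.u)))
    # E-step: posterior match probability for each pattern (Bayes' rule).
    num <- p.m * p.gamma.j.m
    zeta.new <- num / (num + (1 - p.m) * p.gamma.j.u)
    delta <- max(abs(zeta.new - zeta))
    zeta <- zeta.new
    if (delta < tol) break
  }
  # Component names mirror this page for readability only.
  list(zeta.j = zeta, p.m = p.m, p.u = 1 - p.m,
       p.gamma.j.m = p.gamma.j.m, p.gamma.j.u = p.gamma.j.u,
       weights = log(p.gamma.j.m) - log(p.gamma.j.u),
       iter.converge = iter)
}

# Synthetic counts (illustration only): four binary fields, 10000 pairs,
# 5% matches; fields agree w.p. 0.9 among matches, 0.1 among non-matches.
fields <- expand.grid(g1 = 0:1, g2 = 0:1, g3 = 0:1, g4 = 0:1)
agree <- rowSums(fields)
counts <- round(10000 * (0.05 * 0.9^agree * 0.1^(4 - agree) +
                         0.95 * 0.1^agree * 0.9^(4 - agree)))
patterns <- as.data.frame(lapply(fields, factor))
out <- em_loglinear_sketch(patterns, counts, formula = y ~ . + g1:g2)
```

Because each class gets its own log-linear model, a formula with many interactions (such as the default `y ~ .^2`) can leave the mixture non-identifiable when there are only a few fields, so the usage above adds just the `g1:g2` interaction. `quasipoisson()` is used because the expected counts are not integers; it yields the same coefficient estimates as `poisson()` without the non-integer warnings.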
patterns: Table holding the counts for each unique agreement pattern, as produced by the function tableCounts.
nobs.a: Number of observations in dataset A.
nobs.b: Number of observations in dataset B.
p.m: Probability of finding a match. Default is 0.1.
p.gamma.j.m: Probability of observing a specific agreement pattern, conditional on being in the matched set.
p.gamma.j.u: Probability of observing a specific agreement pattern, conditional on being in the non-matched set.
iter.max: Maximum number of iterations. Default is 5000.
tol: Convergence tolerance. Default is 1e-05.
varnames: The vector of variable names used for matching. Automatically provided if using
emlinklog returns a list with the following components:
zeta.j: The posterior match probability for each unique agreement pattern.
p.m: The probability of finding a match.
p.u: The probability of finding a non-match.
p.gamma.j.m: The probability of observing a particular agreement pattern, conditional on being in the set of matches.
p.gamma.j.u: The probability of observing a particular agreement pattern, conditional on being in the set of non-matches.
patterns.w: Counts of the observed agreement patterns, along with the Fellegi-Sunter weights.
iter.converge: The number of iterations the EM algorithm took to converge.
nobs.a: The number of observations in dataset A.
nobs.b: The number of observations in dataset B.
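The posterior and conditional probabilities in this list are linked by Bayes' rule: for pattern j, zeta.j = p.m * m_j / (p.m * m_j + p.u * u_j), where m_j and u_j are the entries of p.gamma.j.m and p.gamma.j.u. The classical Fellegi-Sunter weight is log(m_j / u_j). The helper `posterior_from_em` is hypothetical (not part of fastLink), and the exact definition of the weight column in patterns.w may differ from the textbook form used here.

```r
# Hypothetical helper (not part of fastLink): recompute per-pattern
# posteriors and textbook Fellegi-Sunter log weights from EM estimates.
posterior_from_em <- function(p.m, p.gamma.j.m, p.gamma.j.u, counts) {
  num <- p.m * p.gamma.j.m
  zeta.j <- num / (num + (1 - p.m) * p.gamma.j.u)  # Bayes' rule
  weights <- log(p.gamma.j.m / p.gamma.j.u)         # log-likelihood ratio
  list(zeta.j = zeta.j, weights = weights,
       # expected number of matched pairs implied by the posteriors
       expected.matches = sum(counts * zeta.j))
}

# Toy inputs (illustration only): two patterns, prior match probability 0.1.
res <- posterior_from_em(p.m = 0.1, p.gamma.j.m = c(0.1, 0.9),
                         p.gamma.j.u = c(0.9, 0.1), counts = c(90, 10))
res$zeta.j   # approximately 0.0122 and 0.9
```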
Ted Enamorado <ted.enamorado@gmail.com> and Benjamin Fifield
## Not run:
library(fastLink)
## Example datasets dfA and dfB shipped with fastLink
data(samplematch)
## Calculate gammas
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)
g2 <- gammaCKpar(dfA$middlename, dfB$middlename)
g3 <- gammaCKpar(dfA$lastname, dfB$lastname)
g4 <- gammaKpar(dfA$birthyear, dfB$birthyear)
## Run tableCounts
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA), nobs.b = nrow(dfB))
## Run EM
em.log <- emlinklog(tc, nobs.a = nrow(dfA), nobs.b = nrow(dfB))
## End(Not run)