emlinklog: emlinklog

View source: R/emlinklog.R

emlinklog    R Documentation

emlinklog

Description

Expectation-Maximization algorithm for record linkage that allows for dependencies across linkage fields.

Usage

emlinklog(patterns, nobs.a, nobs.b, p.m, p.gamma.j.m, p.gamma.j.u,
iter.max, tol, varnames)

Arguments

patterns

Table that holds the counts for each unique agreement pattern. This object is produced by the function tableCounts.

nobs.a

Number of observations in dataset A

nobs.b

Number of observations in dataset B

p.m

Probability of finding a match. Default is 0.1.

p.gamma.j.m

Probability of observing a specific agreement pattern, conditional on being in the matched set.

p.gamma.j.u

Probability of observing a specific agreement pattern, conditional on being in the non-matched set.

iter.max

Maximum number of iterations. Default is 5000.

tol

Convergence tolerance. Default is 1e-05.

varnames

The vector of variable names used for matching. Automatically provided if using fastLink() wrapper. Used for clean visualization of EM results in summary functions.
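
To illustrate how p.m, iter.max, and tol typically interact in an EM loop, here is a minimal base-R sketch. It is not the code of emlinklog: it assumes conditional independence across fields, whereas emlinklog allows for dependencies, and it stops when the largest parameter change falls below tol, which is an assumed stopping rule. The toy patterns, the counts, the "true" parameters, and the helper pattern_prob are all hypothetical.

```r
# All 2^3 agreement patterns for three binary fields (1 = agree, 0 = disagree)
patterns <- as.matrix(expand.grid(f1 = 0:1, f2 = 0:1, f3 = 0:1))

# Hypothetical helper: probability of each pattern given per-field
# agreement probabilities p, assuming fields are independent
pattern_prob <- function(p) {
  apply(patterns, 1, function(g) prod(p^g * (1 - p)^(1 - g)))
}

# Hypothetical pattern counts, built from made-up "true" parameters
counts <- round(10000 * (0.05 * pattern_prob(rep(0.9, 3)) +
                         0.95 * pattern_prob(rep(0.1, 3))))

# Starting values and control parameters (defaults as documented above)
p.m <- 0.1
m <- rep(0.8, 3)   # P(field agrees | match)
u <- rep(0.2, 3)   # P(field agrees | non-match)
iter.max <- 5000
tol <- 1e-05

for (iter in seq_len(iter.max)) {
  # E-step: posterior match probability for each pattern
  num <- p.m * pattern_prob(m)
  zeta <- num / (num + (1 - p.m) * pattern_prob(u))

  # M-step: count-weighted updates of the parameters
  p.m.new <- sum(counts * zeta) / sum(counts)
  m.new <- colSums(patterns * counts * zeta) / sum(counts * zeta)
  u.new <- colSums(patterns * counts * (1 - zeta)) / sum(counts * (1 - zeta))

  # Assumed stopping rule: largest absolute change in any parameter
  delta <- max(abs(c(p.m.new - p.m, m.new - m, u.new - u)))
  p.m <- p.m.new
  m <- m.new
  u <- u.new
  if (delta < tol) break
}
```

After the loop, iter holds the number of iterations used, and zeta holds one posterior match probability per pattern, in the row order of patterns.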

Value

emlinklog returns a list with the following components:

zeta.j

The posterior match probabilities for each unique pattern.

p.m

The probability of finding a match.

p.u

The probability of finding a non-match.

p.gamma.j.m

The probability of observing a particular agreement pattern conditional on being in the set of matches.

p.gamma.j.u

The probability of observing a particular agreement pattern conditional on being in the set of non-matches.

patterns.w

Counts of the agreement patterns observed, along with the Fellegi-Sunter weights.

iter.converge

The number of iterations it took the EM algorithm to converge.

nobs.a

The number of observations in dataset A.

nobs.b

The number of observations in dataset B.
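
The relationship between several of these components can be sketched in base R. The block below assumes the standard Bayes-rule form for the posterior match probability and the usual log-ratio definition of the Fellegi-Sunter weight; the exact computation and scaling used inside emlinklog may differ. The four patterns and their probabilities are hypothetical.

```r
# Hypothetical pattern-level probabilities for four agreement patterns
p.m <- 0.1                                # probability of a match
p.u <- 1 - p.m                            # probability of a non-match
p.gamma.j.m <- c(0.02, 0.08, 0.20, 0.70)  # P(pattern | match)
p.gamma.j.u <- c(0.80, 0.12, 0.06, 0.02)  # P(pattern | non-match)

# Posterior match probability for each pattern (Bayes' rule)
zeta.j <- p.m * p.gamma.j.m / (p.m * p.gamma.j.m + p.u * p.gamma.j.u)

# Fellegi-Sunter weight, assumed here to be the log-likelihood ratio
fs.weights <- log(p.gamma.j.m / p.gamma.j.u)

print(round(zeta.j, 3))
print(round(fs.weights, 3))
```

Patterns that are more likely among matches than among non-matches get a positive weight and a higher posterior match probability.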

Author(s)

Ted Enamorado <ted.enamorado@gmail.com> and Benjamin Fifield

Examples

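## Not run: 
## Load the package and its bundled example data
## (dfA and dfB come from the samplematch dataset shipped with fastLink)
library(fastLink)
data(samplematch)

## Calculate gammas (agreement indicators) for each linkage field
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)
g2 <- gammaCKpar(dfA$middlename, dfB$middlename)
g3 <- gammaCKpar(dfA$lastname, dfB$lastname)
g4 <- gammaKpar(dfA$birthyear, dfB$birthyear)

## Run tableCounts to count each unique agreement pattern
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA), nobs.b = nrow(dfB))

## Run EM
em.log <- emlinklog(tc, nobs.a = nrow(dfA), nobs.b = nrow(dfB))

## End(Not run)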


fastLink documentation built on Nov. 17, 2023, 9:06 a.m.