emlinkMARmov: emlinkMARmov
In fastLink: Fast Probabilistic Record Linkage with Missing Data

Description

Expectation-Maximization algorithm for Record Linkage under the Missing at Random (MAR) assumption.

Usage

emlinkMARmov(patterns, nobs.a, nobs.b, p.m, iter.max,
  tol, p.gamma.k.m, p.gamma.k.u, prior.lambda, w.lambda,
  prior.pi, w.pi, address.field, gender.field, varnames)

Arguments

`patterns`	Table holding the counts for each unique agreement pattern, as produced by `tableCounts()`.
`nobs.a`	Number of observations in dataset A.
`nobs.b`	Number of observations in dataset B.
`p.m`	Probability of finding a match. Default is 0.1.
`iter.max`	Maximum number of iterations. Default is 5000.
`tol`	Convergence tolerance. Default is 1e-05.
`p.gamma.k.m`	Probability of observing a specific agreement value for field k, conditional on being in the matched set.
`p.gamma.k.u`	Probability of observing a specific agreement value for field k, conditional on being in the non-matched set.
`prior.lambda`	The prior probability of finding a match, derived from auxiliary data.
`w.lambda`	How much weight to give the prior on lambda versus the data. Must range between 0 (no weight on prior) and 1 (weight fully on prior).
`prior.pi`	The prior probability of the address field not matching, conditional on being in the matched set. To be used when the share of movers in the population is known with some certainty.
`w.pi`	How much weight to give the prior on pi versus the data. Must range between 0 (no weight on prior) and 1 (weight fully on prior).
`address.field`	Boolean indicators for whether a given field is an address field. Default is NULL (FALSE for all fields). Address fields should be set to TRUE while non-address fields are set to FALSE if provided.
`gender.field`	Boolean indicators for whether a given field is for gender. If so, exact match is conducted on gender. Default is NULL (FALSE for all fields). The one gender field should be set to TRUE while all other fields are set to FALSE if provided.
`varnames`	The vector of variable names used for matching. Provided automatically when using the `fastLink()` wrapper. Used for clean visualization of EM results in summary functions.
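
As an illustration of how the indicator arguments line up with the matching fields, the sketch below builds `address.field` and `gender.field` from a vector of variable names. The field names are hypothetical, and the sketch assumes the indicators follow the same order as the fields used to build `patterns`.

```r
## Hypothetical matching fields, in the order their gammas were supplied
varnames <- c("firstname", "lastname", "streetname", "gender")

## One logical value per field: TRUE only for the address field(s)
address.field <- varnames == "streetname"

## One logical value per field: TRUE only for the single gender field
gender.field <- varnames == "gender"

## Show the three vectors side by side to check the alignment
print(data.frame(varnames, address.field, gender.field))
```

Deriving the indicators from `varnames` rather than typing them by hand keeps the three vectors the same length and in the same order.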

Value

emlinkMARmov returns a list with the following components:

`zeta.j`	The posterior match probabilities for each unique pattern.
`p.m`	The probability of a pair matching.
`p.u`	The probability of a pair not matching.
`p.gamma.k.m`	The probability of each agreement value for field k, conditional on being in the matched set.
`p.gamma.k.u`	The probability of each agreement value for field k, conditional on being in the non-matched set.
`p.gamma.j.m`	The probability of observing a particular agreement pattern, given that a pair is in the matched set.
`p.gamma.j.u`	The probability of observing a particular agreement pattern, given that a pair is in the unmatched set.
`patterns.w`	Counts of the agreement patterns observed, along with the Fellegi-Sunter weights.
`iter.converge`	The number of iterations it took the EM algorithm to converge.
`nobs.a`	The number of observations in dataset A.
`nobs.b`	The number of observations in dataset B.
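
To show how several of these components relate to one another, here is a minimal base-R sketch on made-up numbers. It is not output from `emlinkMARmov`. It assumes the standard Fellegi-Sunter mixture model, treats `p.gamma.j.m` and `p.gamma.j.u` as the probabilities of each pattern within the matched and unmatched sets, and takes the weight to be the log ratio of the two.

```r
## Toy values for three hypothetical agreement patterns (not fastLink output)
p.m <- 0.1                          # probability that a pair is a match
p.u <- 1 - p.m                      # probability that a pair is not a match
p.gamma.j.m <- c(0.01, 0.19, 0.80)  # P(pattern j | match)
p.gamma.j.u <- c(0.90, 0.09, 0.01)  # P(pattern j | non-match)

## Posterior match probability for each pattern, by Bayes' rule
zeta.j <- (p.m * p.gamma.j.m) / (p.m * p.gamma.j.m + p.u * p.gamma.j.u)

## Fellegi-Sunter weight: log likelihood ratio of match versus non-match
weights <- log(p.gamma.j.m) - log(p.gamma.j.u)

print(round(zeta.j, 3))
print(round(weights, 3))
```

With a fixed `p.m`, patterns with larger weights also have larger posterior match probabilities, so either quantity ranks the patterns in the same order.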

Author(s)

Ted Enamorado <ted.enamorado@gmail.com> and Kosuke Imai

Examples

## Not run: 
## Load the package and the sample data sets dfA and dfB
library(fastLink)
data(samplematch)

## Calculate gammas
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)
g2 <- gammaCKpar(dfA$middlename, dfB$middlename)
g3 <- gammaCKpar(dfA$lastname, dfB$lastname)
g4 <- gammaKpar(dfA$birthyear, dfB$birthyear)

## Run tableCounts
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA), nobs.b = nrow(dfB))

## Run EM
em <- emlinkMARmov(tc, nobs.a = nrow(dfA), nobs.b = nrow(dfB))

## End(Not run)

fastLink documentation built on Nov. 17, 2023, 9:06 a.m.