emlinkRS: emlinkRS

Source: R/emlinkMARmov.R

emlinkRS R Documentation

emlinkRS

Description

Calculates Fellegi-Sunter weights and posterior match probabilities (zeta) for matching patterns that are observed in a larger population but are not present in the random sub-sample used to estimate the EM model.
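Under the conditional-independence model that Fellegi-Sunter estimation typically assumes, the weight of a whole pattern is built from field-level probabilities, so a pattern that never appeared in the sub-sample can still be scored. The base-R sketch below illustrates that idea. It assumes a 0/1/2 agreement coding (disagree / partial / agree) and that missing fields are simply skipped. The probability values and the helpers `pattern_prob()` and `fs_weight()` are hypothetical illustrations, not part of the fastLink API.

```r
## Hypothetical field-level probabilities for two fields. Entry l + 1 of
## each vector is the probability of agreement level l (0 = disagree,
## 1 = partial agreement, 2 = agreement) among matches (m) or non-matches (u).
p.k.m <- list(c(0.05, 0.15, 0.80), c(0.02, 0.08, 0.90))
p.k.u <- list(c(0.90, 0.08, 0.02), c(0.95, 0.04, 0.01))

## Probability of a whole agreement pattern, assuming the fields are
## conditionally independent. NA (missing) fields are skipped, i.e. they
## contribute a factor of 1 (an assumed missing-at-random treatment).
pattern_prob <- function(pattern, p.k) {
  obs <- which(!is.na(pattern))
  prod(vapply(obs, function(k) p.k[[k]][pattern[k] + 1], numeric(1)))
}

## Fellegi-Sunter weight: natural log of the m/u likelihood ratio.
fs_weight <- function(pattern, p.k.m, p.k.u) {
  log(pattern_prob(pattern, p.k.m) / pattern_prob(pattern, p.k.u))
}

## A pattern absent from the sub-sample still gets a weight, because only
## the field-level probabilities are needed to compute it.
new.pattern <- c(2, 0)
fs_weight(new.pattern, p.k.m, p.k.u)
```

Some texts use base-2 logarithms for these weights; the choice of base only rescales them.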

Usage

emlinkRS(patterns.out, em.out, nobs.a, nobs.b)

Arguments

patterns.out

The output from 'tableCounts()' or 'emlinkMARmov()' (run on the full dataset), containing every matching pattern observed in the full sample and the number of times each pattern is observed.

em.out

The output from 'emlinkMARmov()': an EM object estimated on a smaller random sample, to be applied to the pattern counts from the larger sample.

nobs.a

Total number of observations in dataset A.

nobs.b

Total number of observations in dataset B.

Value

emlinkRS returns a list with the following components:

zeta.j

The posterior match probabilities for each unique pattern.

p.m

The posterior probability of a pair matching.

p.u

The posterior probability of a pair not matching.

p.gamma.k.m

The posterior probabilities of each agreement level of a specific matching field, among matched pairs.

p.gamma.k.u

The posterior probabilities of each agreement level of a specific matching field, among non-matched pairs.

p.gamma.j.m

The probability of observing a particular agreement pattern among pairs in the matched set.

p.gamma.j.u

The probability of observing a particular agreement pattern among pairs in the unmatched set.

patterns.w

Counts of the observed agreement patterns, along with their Fellegi-Sunter weights.

iter.converge

The number of iterations it took the EM algorithm to converge.

nobs.a

The number of observations in dataset A.

nobs.b

The number of observations in dataset B.
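The probability components above are linked by Bayes' rule. As a hedged sketch, assuming p.u = 1 - p.m and that p.gamma.j.m and p.gamma.j.u are the probabilities of pattern j among matches and non-matches, the posterior zeta.j and the Fellegi-Sunter weight of a pattern could be related as below. All numbers and the helpers `zeta()` and `zeta_from_weight()` are hypothetical and are not fastLink functions.

```r
## Hypothetical inputs: the overall match proportion and the pattern-level
## probabilities for three agreement patterns.
p.m <- 0.001
p.u <- 1 - p.m
p.gamma.j.m <- c(0.72, 0.016, 0.001)
p.gamma.j.u <- c(0.0002, 0.019, 0.855)

## Bayes' rule: posterior probability that a pair with pattern j is a match.
zeta <- function(p.m, p.u, p.gamma.j.m, p.gamma.j.u) {
  num <- p.m * p.gamma.j.m
  num / (num + p.u * p.gamma.j.u)
}

## Equivalent form: the logistic function (stats::plogis) of the prior
## log-odds plus the Fellegi-Sunter weight log(p.gamma.j.m / p.gamma.j.u).
zeta_from_weight <- function(p.m, p.u, weight) {
  plogis(log(p.m / p.u) + weight)
}

w <- log(p.gamma.j.m / p.gamma.j.u)
zeta(p.m, p.u, p.gamma.j.m, p.gamma.j.u)
zeta_from_weight(p.m, p.u, w)
```

The second form is convenient when the weights are already available, since each posterior then depends only on the prior log-odds and the pattern's weight.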

Author(s)

Ted Enamorado <ted.enamorado@gmail.com> and Ben Fifield <benfifield@gmail.com>

Examples

## Not run: 
library(fastLink)
data(samplematch)  # loads the example data frames dfA and dfB
## -------------
## Run on subset
## -------------
## Draw a random sub-sample of 50 records from each dataset
dfA.s <- dfA[sample(nrow(dfA), 50), ]
dfB.s <- dfB[sample(nrow(dfB), 50), ]

## Calculate gammas
g1 <- gammaCKpar(dfA.s$firstname, dfB.s$firstname)
g2 <- gammaCKpar(dfA.s$middlename, dfB.s$middlename)
g3 <- gammaCKpar(dfA.s$lastname, dfB.s$lastname)
g4 <- gammaKpar(dfA.s$birthyear, dfB.s$birthyear)

## Run tableCounts
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA.s), nobs.b = nrow(dfB.s))

## Run EM
em <- emlinkMARmov(tc, nobs.a = nrow(dfA.s), nobs.b = nrow(dfB.s))

## ------------------
## Apply to full data
## ------------------

## Calculate gammas
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)
g2 <- gammaCKpar(dfA$middlename, dfB$middlename)
g3 <- gammaCKpar(dfA$lastname, dfB$lastname)
g4 <- gammaKpar(dfA$birthyear, dfB$birthyear)

## Run tableCounts
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA), nobs.b = nrow(dfB))

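## Apply the sub-sample EM estimates to the full-data pattern counts
em.full <- emlinkRS(patterns.out = tc, em.out = em, nobs.a = nrow(dfA), nobs.b = nrow(dfB))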

## End(Not run)


kosukeimai/fastLink documentation built on Nov. 17, 2023, 8:11 p.m.