nameReweight: nameReweight
In fastLink: Fast Probabilistic Record Linkage with Missing Data

Description

Reweights posterior match probabilities to account for the observed frequency of first names. The posterior probability of a match is downweighted if the first name is common and upweighted if the first name is uncommon.
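
The direction of the adjustment can be illustrated with a toy calculation. The sketch below is not the formula fastLink uses: `reweight_toy()`, `name_share`, and `baseline_share` are hypothetical names, and the odds scaling is an assumption chosen only to show that a common first name pulls a posterior down while a rare one pushes it up.

```r
# Toy illustration only: NOT the adjustment implemented in fastLink.
# Assumption: scale the posterior odds by baseline_share / name_share,
# so names more common than the baseline lose weight and rarer names gain it.
reweight_toy <- function(zeta, name_share, baseline_share) {
  odds <- zeta / (1 - zeta)                       # posterior odds of a match
  adj_odds <- odds * baseline_share / name_share  # frequency adjustment
  adj_odds / (1 + adj_odds)                       # back to a probability
}

zeta <- c(common = 0.90, rare = 0.90)         # same unadjusted posterior
name_share <- c(common = 0.04, rare = 0.001)  # hypothetical name frequencies
reweighted <- reweight_toy(zeta, name_share, baseline_share = 0.01)
print(round(reweighted, 3))
```

Both pairs start from the same posterior of 0.90; under this assumed scaling the pair sharing the common name ends up below 0.90 and the pair sharing the rare name ends up above it.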

Usage

nameReweight(dfA, dfB, EM, gammalist, matchesLink,
varnames, firstname.field, patterns, threshold.match, n.cores)

Arguments

`dfA`	The full version of dataset A that is being matched.
`dfB`	The full version of dataset B that is being matched.
`EM`	The EM object from `emlinkMARmov()`.
`gammalist`	The list of gamma objects calculated on the full dataset that indicate matching patterns, which is fed into `tableCounts()` and `matchesLink()`.
`matchesLink`	The output from `matchesLink()`.
`varnames`	A vector of variable names to use for matching. Must be present in both `dfA` and `dfB`.
`firstname.field`	A logical vector indicating, for each matching field, whether that field is a first name. TRUE if so, otherwise FALSE.
`patterns`	The output from `getPatterns()`.
`threshold.match`	Either a single number between 0 and 1 giving the lower bound of posterior probability required to declare a match, or a vector of two such numbers giving the range. For instance, `threshold.match = .85` will return all pairs with posterior probability greater than .85 as matches, while `threshold.match = c(.85, .95)` will return all pairs with posterior probability between .85 and .95 as matches.
`n.cores`	Number of cores to parallelize over. Default is NULL.
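
As a minimal sketch of how two of these arguments might be built, the base R snippet below derives `firstname.field` from `varnames` and shows the two accepted shapes of `threshold.match`. The column names and the helper `in_match_range()` are hypothetical, and the handling of values exactly on a boundary is an assumption; fastLink may treat boundaries differently.

```r
# Hypothetical matching fields; the column names are assumptions.
varnames <- c("firstname", "lastname", "birthyear")

# One logical per field: TRUE only for the first-name field.
firstname.field <- varnames == "firstname"

# Illustrative helper (not part of fastLink): which posteriors count as matches?
in_match_range <- function(posterior, threshold.match) {
  if (length(threshold.match) == 1) {
    posterior > threshold.match              # single number: lower bound
  } else {
    posterior > threshold.match[1] &
      posterior < threshold.match[2]         # two numbers: open range
  }
}

posterior <- c(0.50, 0.86, 0.90, 0.97)
lower_only <- in_match_range(posterior, 0.85)         # lower bound only
in_range <- in_match_range(posterior, c(0.85, 0.95))  # bounded range
```

With a single threshold the last three posteriors qualify; with the two-element range the pair at 0.97 is excluded because it lies above the upper bound.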

Value

nameReweight() returns a list containing the following elements:

`zetaA`	The reweighted zeta estimates for each matched element in dataset A.
`zetaB`	The reweighted zeta estimates for each matched element in dataset B.