DeterministicLinkage: Deterministic Record Linkage

View source: R/Linkage.R

DeterministicLinkage    R Documentation

Deterministic Record Linkage

Description

Performs deterministic record linkage of two data sets. The result reports the link status of each linking variable separately, so that rule-based classification can be applied afterwards.

Usage

DeterministicLinkage(IDA, dataA, IDB, dataB, blocking = NULL, similarity)

Arguments

IDA

A character vector or integer vector containing the IDs of the first data.frame.

dataA

A data.frame containing the data to be linked and all linking variables as specified in SelectBlockingFunction and SelectSimilarityFunction.

IDB

A character vector or integer vector containing the IDs of the second data.frame.

dataB

A data.frame containing the data to be linked and all linking variables as specified in SelectBlockingFunction and SelectSimilarityFunction.

blocking

Optional blocking variables. See SelectBlockingFunction.

similarity

Variables used for linking and their respective linkage methods as specified in SelectSimilarityFunction.

Details

Before calling DeterministicLinkage, define the linking variables and their comparison methods with SelectSimilarityFunction. Blocking variables, defined with SelectBlockingFunction, are optional. See both functions for the available options.
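The idea behind blocking and per-variable comparison can be illustrated in base R without the package. The following is a minimal sketch under stated assumptions, not the PPRL implementation: the toy data frames `a` and `b`, their column names, and the output column names are all hypothetical, and only exact comparison is shown.

```r
# Toy data (hypothetical): two small data sets with an ID, a blocking
# variable (year of birth) and two linking variables.
a <- data.frame(id = c("a1", "a2", "a3"),
                yob = c("1970", "1970", "1985"),
                first = c("anna", "peter", "maria"),
                last = c("meyer", "schmidt", "weber"),
                stringsAsFactors = FALSE)
b <- data.frame(id = c("b1", "b2", "b3"),
                yob = c("1970", "1985", "1990"),
                first = c("anna", "maria", "peter"),
                last = c("maier", "weber", "schmidt"),
                stringsAsFactors = FALSE)

# Blocking: only pairs that agree exactly on year of birth are compared.
pairs <- merge(a, b, by = "yob", suffixes = c(".a", ".b"))

# Comparison: one agreement indicator per linking variable.
status <- data.frame(id.a = pairs$id.a,
                     id.b = pairs$id.b,
                     first_agrees = pairs$first.a == pairs$first.b,
                     last_agrees = pairs$last.a == pairs$last.b,
                     stringsAsFactors = FALSE)
print(status)
```

Without blocking, all 9 possible pairs of the toy data would be compared; blocking on year of birth reduces this to 3 candidate pairs.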

Value

A data.frame containing ID pairs and the link status for each linking variable. The user can then apply rules to these statuses to classify pairs as links or non-links.
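A rule of this kind can be sketched on a mock result. The data frame below is hand-made for illustration; its column names (`ID1`, `ID2`, `first_name`, `last_name`) and the 0/1 coding are assumptions, not the documented output format of DeterministicLinkage, so inspect a real result with `str()` before adapting it.

```r
# Mock linkage result (hypothetical layout): one row per ID pair,
# one status column per linking variable (1 = agrees, 0 = disagrees).
res_mock <- data.frame(ID1 = c("1", "2", "3", "4"),
                       ID2 = c("1", "7", "3", "9"),
                       first_name = c(1, 1, 0, 0),
                       last_name = c(1, 0, 1, 0),
                       stringsAsFactors = FALSE)

# Rule: a pair is a link only if every linking variable agrees.
status_cols <- c("first_name", "last_name")
res_mock$link <- rowSums(res_mock[, status_cols] == 1) == length(status_cols)

# Split the ID pairs into links and non-links.
links <- res_mock[res_mock$link, c("ID1", "ID2")]
non_links <- res_mock[!res_mock$link, c("ID1", "ID2")]
print(links)
```

A looser rule, such as requiring agreement on at least one variable, only changes the comparison against `length(status_cols)`.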

Source

Christen, P. (2012): Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer.

Schnell, R., Bachteler, T., Reiher, J. (2004): A toolbox for record linkage. Austrian Journal of Statistics 33(1-2): 125-133.

See Also

ProbabilisticLinkage, SelectBlockingFunction, SelectSimilarityFunction, StandardizeString

Examples

# Load the package and the test data shipped with it
library(PPRL)
testFile <- file.path(path.package("PPRL"), "extdata/testdata.csv")
testData <- read.csv(testFile, header = FALSE, sep = "\t",
  colClasses = "character")

# define year of birth (V3) as blocking variable
bl <- SelectBlockingFunction("V3", "V3", method = "exact")

# Select first name (V7) and last name (V8) as linking variables:
# first names are compared by their Soundex phonetic code,
# last names by exact matching
l1 <- SelectSimilarityFunction("V7", "V7", method = "Soundex")
l2 <- SelectSimilarityFunction("V8", "V8", method = "exact")

# Link the data as specified in bl and l1/l2
# (in this small example data is linked to itself)
res <- DeterministicLinkage(testData$V1, testData,
  testData$V1, testData, similarity = c(l1, l2), blocking = bl)


PPRL documentation built on Nov. 10, 2022, 5:41 p.m.