DeterministicLinkage | R Documentation
Deterministic record linkage of two data sets, returning per-variable results to which rule-based classification can be applied.
DeterministicLinkage(IDA, dataA, IDB, dataB, blocking = NULL, similarity)
IDA | A character or integer vector containing the IDs of the first data.frame.
dataA | A data.frame containing the data to be linked and all linking variables as specified in similarity.
IDB | A character or integer vector containing the IDs of the second data.frame.
dataB | A data.frame containing the data to be linked and all linking variables as specified in similarity.
blocking | Optional blocking variables. See SelectBlockingFunction.
similarity | Variables used for linking and their respective linkage methods, as specified in SelectSimilarityFunction.
To call DeterministicLinkage, the linking variables and their comparison methods must be set up first; blocking variables are optional. Further options are available in SelectBlockingFunction and SelectSimilarityFunction.
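To illustrate what blocking followed by per-variable comparison means, the following base R sketch blocks on one variable and then compares each linking variable exactly within the blocks. It is not the PPRL implementation: the function link_sketch, its arguments link_vars and block_var, the 1/0 agreement coding and the toy data are all hypothetical, and only exact comparison is shown, whereas DeterministicLinkage supports further methods through SelectSimilarityFunction.

```r
# Hypothetical sketch of deterministic linkage (not PPRL's code):
# exact blocking on one variable, then exact agreement (1/0) per linking variable.
link_sketch <- function(IDA, dataA, IDB, dataB, link_vars, block_var = NULL) {
  # Every combination of a row of dataA with a row of dataB
  pairs <- expand.grid(a = seq_len(nrow(dataA)), b = seq_len(nrow(dataB)))
  # Blocking: keep only pairs that agree on the blocking variable
  if (!is.null(block_var)) {
    keep <- dataA[[block_var]][pairs$a] == dataB[[block_var]][pairs$b]
    pairs <- pairs[which(keep), , drop = FALSE]
  }
  out <- data.frame(idA = IDA[pairs$a], idB = IDB[pairs$b],
                    stringsAsFactors = FALSE)
  # One agreement column per linking variable: 1 = agree, 0 = disagree
  for (v in link_vars) {
    out[[v]] <- as.integer(dataA[[v]][pairs$a] == dataB[[v]][pairs$b])
  }
  out
}

# Toy data (hypothetical): year of birth, first name, last name
A <- data.frame(yob = c("1980", "1980", "1975"),
                first = c("ANNA", "JOHN", "MARY"),
                last = c("SMITH", "DOE", "JONES"),
                stringsAsFactors = FALSE)
B <- data.frame(yob = c("1980", "1975"),
                first = c("ANNA", "MARIE"),
                last = c("SMITH", "JONES"),
                stringsAsFactors = FALSE)
res_sketch <- link_sketch(c("a1", "a2", "a3"), A, c("b1", "b2"), B,
                          link_vars = c("first", "last"), block_var = "yob")
print(res_sketch)
```

Blocking shrinks the candidate set: of the six possible pairs, only the three that share a year of birth are compared at all.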
A data.frame containing the ID pairs and the link status for each linking variable. Rules can then be applied to these columns to classify each pair as a link or a non-link.
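The kind of rule this value is meant for can be sketched in base R on a small hand-made table. The column names (idA, idB, first, last) and the 1/0 coding of the link status are assumptions for illustration only; check the actual columns of a DeterministicLinkage result, for example with str(), before adapting the rules.

```r
# Hypothetical linkage result: one row per candidate pair and one 1/0
# agreement column per linking variable (column names are assumptions)
pairs <- data.frame(idA = c("a1", "a2", "a3"),
                    idB = c("b1", "b1", "b2"),
                    first = c(1L, 0L, 0L),
                    last = c(1L, 0L, 1L),
                    stringsAsFactors = FALSE)
link_vars <- c("first", "last")
# Number of agreeing linking variables per pair
n_agree <- unname(rowSums(pairs[link_vars]))

# Strict rule: a pair is a link only if every linking variable agrees
pairs$link_strict <- n_agree == length(link_vars)
# Lenient rule: a pair is a link if at least one linking variable agrees
pairs$link_lenient <- n_agree >= 1

# ID pairs classified as links under the strict rule
links <- pairs[pairs$link_strict, c("idA", "idB")]
print(links)
```

Because the per-variable status is kept, a rule can be tightened or loosened and the pairs reclassified without rerunning the linkage.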
Christen, P. (2012): Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer.
Schnell, R., Bachteler, T., Reiher, J. (2004): A toolbox for record linkage. Austrian Journal of Statistics 33(1-2): 125-133.
ProbabilisticLinkage, SelectBlockingFunction, SelectSimilarityFunction, StandardizeString
library(PPRL)
# load test data
testFile <- file.path(path.package("PPRL"), "extdata/testdata.csv")
testData <- read.csv(testFile, header = FALSE, sep = "\t", colClasses = "character")
# define year of birth (V3) as blocking variable
bl <- SelectBlockingFunction("V3", "V3", method = "exact")
# Select first name and last name as linking variables,
# to be linked using the Soundex phonetic code (first name)
# and exact matching (last name)
l1 <- SelectSimilarityFunction("V7", "V7", method = "Soundex")
l2 <- SelectSimilarityFunction("V8", "V8", method = "exact")
# Link the data as specified in bl and l1/l2
# (in this small example the data is linked to itself)
res <- DeterministicLinkage(testData$V1, testData, testData$V1, testData,
  similarity = c(l1, l2), blocking = bl)
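In the example, the first name is compared through its Soundex code, so spelling variants that sound alike can still agree. A minimal base R sketch of the classic American Soundex algorithm shows the idea. soundex_sketch is a hypothetical helper, not part of PPRL, and the Soundex variant PPRL uses may differ in details such as the handling of H and W or of non-ASCII characters.

```r
# Minimal sketch of American Soundex (hypothetical helper, not PPRL's code)
soundex_sketch <- function(x) {
  codes <- c(B = "1", F = "1", P = "1", V = "1",
             C = "2", G = "2", J = "2", K = "2", Q = "2", S = "2", X = "2", Z = "2",
             D = "3", T = "3", L = "4", M = "5", N = "5", R = "6")
  vapply(x, function(s) {
    chars <- strsplit(toupper(s), "")[[1]]
    chars <- chars[chars %in% LETTERS]          # keep the 26 ASCII letters only
    if (length(chars) == 0) return("")
    out <- chars[1]                             # the first letter is kept as is
    prev <- if (chars[1] %in% names(codes)) codes[[chars[1]]] else ""
    for (ch in chars[-1]) {
      if (ch %in% c("H", "W")) next             # H and W are skipped and do not separate equal codes
      code <- if (ch %in% names(codes)) codes[[ch]] else ""  # vowels and Y get no code
      if (code != "" && code != prev) out <- paste0(out, code)
      prev <- code                              # a vowel resets prev, so a repeated code counts again
    }
    substr(paste0(out, "000"), 1, 4)            # pad with zeros, truncate to 4 characters
  }, character(1), USE.NAMES = FALSE)
}

sx <- soundex_sketch(c("Jon", "John", "Steven", "Stephen"))
print(sx)
```

Comparing codes instead of raw strings trades precision for recall: JON and JOHN both map to J500 and therefore agree, but unrelated names that happen to share a code agree as well.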