SelectBlockingFunction | R Documentation |
Before calling ProbabilisticLinkage or DeterministicLinkage, a blocking method can be selected. The function call has to be repeated for each desired blocking variable.
SelectBlockingFunction(variable1, variable2, method)
variable1 | Column name of blocking variable 1. |
variable2 | Column name of blocking variable 2. |
method | Desired blocking method. Possible values are 'exact' and 'exactCL'. |
The following methods are available for blocking:
'exact'
Simple exact blocking. All records with the same values for the blocking variable create a block. Searching for links is only done within these blocks.
'exactCL'
The same as 'exact', except that all characters are capitalised. Only works with strings.
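The difference between the two methods can be illustrated with a small base-R sketch. This is not the package's internal implementation; the toy data frame and its column names (id, yob, city) are hypothetical, and split() and toupper() merely stand in for the grouping and capitalisation described above.

```r
# Illustrative sketch only: base R, not PPRL's internal code.
# The data and the column names id, yob and city are made up for this example.
records <- data.frame(
  id   = c("a1", "a2", "a3", "a4"),
  yob  = c("1980", "1975", "1980", "1975"),
  city = c("Berlin", "berlin", "Bonn", "BERLIN"),
  stringsAsFactors = FALSE
)

# 'exact'-style blocking: records with identical values share a block
blocks_yob <- split(records$id, records$yob)

# Exact blocking on a string column is case-sensitive,
# so the three spellings of Berlin end up in separate blocks
blocks_city_exact <- split(records$id, records$city)

# 'exactCL'-style blocking: capitalise all characters first,
# so the three spellings of Berlin fall into one block
blocks_city_cl <- split(records$id, toupper(records$city))

print(blocks_yob)
print(blocks_city_cl)
```

Candidate pairs would then only be compared within each element of these lists, which is what keeps the number of comparisons small.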
Christen, P. (2012): Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer.
DeterministicLinkage, ProbabilisticLinkage, SelectSimilarityFunction, StandardizeString
# load test data
testFile <- file.path(path.package("PPRL"), "extdata/testdata.csv")
testData <- read.csv(testFile, head = FALSE, sep = "\t",
  colClasses = "character")

# define year of birth (V3) as blocking variable
bl <- SelectBlockingFunction("V3", "V3", method = "exact")

# Select first name and last name as linking variables,
# to be linked using the Jaro-Winkler similarity measure (first name)
# and Levenshtein distance (last name)
l1 <- SelectSimilarityFunction("V7", "V7", method = "jw")
l2 <- SelectSimilarityFunction("V8", "V8", method = "lv")

# Link the data as specified in bl and l1/l2
# (in this small example data is linked to itself)
res <- ProbabilisticLinkage(testData$V1, testData, testData$V1, testData,
  similarity = c(l1, l2), blocking = bl)