SelectBlockingFunction: Select blocking method prior to linkage

View source: R/MTBOptions.R


Description

A blocking method can be selected before calling ProbabilisticLinkage or DeterministicLinkage. The function must be called once for each blocking variable.

Usage

SelectBlockingFunction(variable1, variable2, method)

Arguments

variable1

Column name of the blocking variable in the first data set.

variable2

Column name of the blocking variable in the second data set.

method

Desired blocking method. Possible values are 'exact' and 'exactCL'.

Details

The following methods are available for blocking:

'exact'

Simple exact blocking. All records that share the same value of the blocking variable form a block. Links are searched for only within these blocks.

'exactCL'

The same as 'exact', except that all characters are capitalised first, so blocking ignores differences in case. Works only with strings.
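The blocking rule described above can be illustrated with a small base-R sketch. It is not PPRL's implementation: the helper exact_block_pairs() and the toy data frames A and B are hypothetical, the use of toupper() for 'exactCL' is an assumption about how capitalisation is applied, and dropping records with a missing key is a choice made only for this sketch. Each record is indexed by its blocking key, and an inner join on that key yields exactly the record pairs that fall in the same block. Without blocking, every record of one data set is compared with every record of the other (nrow(A) * nrow(B) pairs); with exact blocking, only pairs inside shared blocks remain.

```r
# Hypothetical helper (not part of PPRL): candidate record pairs under
# exact blocking. Returns a data frame of row indices (a into dataA,
# b into dataB) for every pair of records that share a blocking key.
exact_block_pairs <- function(dataA, dataB, var1, var2,
                              method = c("exact", "exactCL")) {
  method <- match.arg(method)
  keyA <- as.character(dataA[[var1]])
  keyB <- as.character(dataB[[var2]])
  if (method == "exactCL") {
    # Assumed behaviour of 'exactCL': compare capitalised strings
    keyA <- toupper(keyA)
    keyB <- toupper(keyB)
  }
  # Index each record by its blocking key; this sketch drops missing keys
  ia <- data.frame(a = seq_along(keyA), key = keyA, stringsAsFactors = FALSE)
  ib <- data.frame(b = seq_along(keyB), key = keyB, stringsAsFactors = FALSE)
  ia <- ia[!is.na(ia$key), ]
  ib <- ib[!is.na(ib$key), ]
  # Inner join on the key: every A x B combination inside a shared block
  pairs <- merge(ia, ib, by = "key")[, c("a", "b")]
  pairs <- pairs[order(pairs$a, pairs$b), ]
  rownames(pairs) <- NULL
  pairs
}

# Toy data (hypothetical): year of birth and first name
A <- data.frame(yob = c("1980", "1980", "1975"),
                first = c("anna", "ANNA", "carl"),
                stringsAsFactors = FALSE)
B <- data.frame(yob = c("1980", "1990"),
                first = c("Anna", "dora"),
                stringsAsFactors = FALSE)

nrow(A) * nrow(B)                                           # 6 pairs without blocking
nrow(exact_block_pairs(A, B, "yob", "yob", "exact"))        # 2
nrow(exact_block_pairs(A, B, "first", "first", "exact"))    # 0
nrow(exact_block_pairs(A, B, "first", "first", "exactCL"))  # 2
```

Blocking on year of birth leaves 2 of the 6 possible record pairs to compare. Blocking on first name finds no shared block with 'exact', because the spellings differ in case, but 'exactCL' places "anna", "ANNA" and "Anna" in one block.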

References

Christen, P. (2012): Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer.

See Also

DeterministicLinkage, ProbabilisticLinkage, SelectSimilarityFunction, StandardizeString

Examples

# load the package and its test data
library(PPRL)
testFile <- file.path(path.package("PPRL"), "extdata/testdata.csv")
testData <- read.csv(testFile, header = FALSE, sep = "\t",
  colClasses = "character")

# define year of birth (V3) as blocking variable
bl <- SelectBlockingFunction("V3", "V3", method = "exact")

# Select first name (V7) and last name (V8) as linking variables,
# to be compared using the Jaro-Winkler similarity (first name)
# and the Levenshtein distance (last name)
l1 <- SelectSimilarityFunction("V7", "V7", method = "jw")
l2 <- SelectSimilarityFunction("V8", "V8", method = "lv")

# Link the data as specified in bl and l1/l2
# (in this small example, the data set is linked to itself)
res <- ProbabilisticLinkage(testData$V1, testData,
  testData$V1, testData, similarity = c(l1, l2), blocking = bl)


PPRL documentation built on Nov. 10, 2022, 5:41 p.m.