epiClassify: Classify record pairs with EpiLink weights

epiClassify R Documentation

Classify record pairs with EpiLink weights

Description

Classifies record pairs as link, non-link or possible link based on weights computed by epiWeights and the thresholds passed as arguments.

Usage

epiClassify(rpairs, threshold.upper, threshold.lower = threshold.upper,
  ...)

## S4 method for signature 'RecLinkData'
epiClassify(rpairs, threshold.upper, threshold.lower = threshold.upper)

## S4 method for signature 'RLBigData'
epiClassify(rpairs, threshold.upper, threshold.lower = threshold.upper,
  e = 0.01, f = getFrequencies(rpairs), withProgressBar = (sink.number()==0))

Arguments

rpairs

RecLinkData or RLBigData object. Record pairs to be classified.

threshold.upper

A numeric value between 0 and 1. Record pairs with weights greater than or equal to this value are classified as links.

threshold.lower

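A numeric value between 0 and 1, less than or equal to threshold.upper. Defaults to threshold.upper, in which case no record pair is classified as a possible link.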

e

Numeric vector. Estimated error rate(s).

f

Numeric vector. Average frequency of attribute values.

withProgressBar

Logical. Whether to display a progress bar.

...

Placeholder for optional arguments

Details

All record pairs with weights greater than or equal to threshold.upper are classified as links. Record pairs with weights smaller than threshold.upper and greater than or equal to threshold.lower are classified as possible links. All remaining record pairs are classified as non-links.
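The thresholding rule can be illustrated with a minimal base R sketch. The weight vector and the helper classify_by_threshold are hypothetical and not part of RecordLinkage, and the spelled-out labels are chosen for readability; they are not necessarily the values epiClassify stores.

```r
# Hypothetical weights in [0, 1]; in practice these would come from epiWeights.
weights <- c(0.95, 0.70, 0.60, 0.45, 0.10)
threshold.upper <- 0.7
threshold.lower <- 0.5

# Illustrative helper (an assumption for this sketch, not a package function).
# Mirrors the rule above: >= upper is a link, >= lower is a possible link,
# everything else is a non-link.
classify_by_threshold <- function(w, upper, lower = upper) {
  stopifnot(lower <= upper)
  ifelse(w >= upper, "link",
         ifelse(w >= lower, "possible link", "non-link"))
}

labels <- classify_by_threshold(weights, threshold.upper, threshold.lower)
print(data.frame(weight = weights, class = labels))
```

When only one threshold is supplied, the lower threshold equals the upper one and the "possible link" band is empty, which matches the default threshold.lower = threshold.upper.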

For the "RecLinkData" method, weights must have been calculated for rpairs using epiWeights.

The "RLBigData" method displays a progress bar only if weights are calculated on the fly. By default, the progress bar is suppressed when output is diverted by sink (e.g. in a Sweave script).
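The default withProgressBar = (sink.number()==0) relies on base R's sink.number(), which reports how many output diversions are active. The sketch below only shows how that expression behaves inside a diversion; the variable names are illustrative and no RecordLinkage function is called.

```r
# Evaluate the same expression used as the default for withProgressBar.
progress_default <- function() sink.number() == 0

before <- progress_default()   # value in the current session

tmp <- tempfile()
sink(tmp)                      # divert output, as Sweave does
during <- progress_default()   # FALSE while at least one sink is active
sink()                         # remove the diversion again
unlink(tmp)

after <- progress_default()    # back to the value seen before the sink
```

To control the progress bar regardless of diversions, withProgressBar can be passed explicitly as TRUE or FALSE.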

Value

For the "RecLinkData" method, an S3 object of class "RecLinkResult": a copy of rpairs with an additional element prediction, which stores the classification result.

For the "RLBigData" method, an S4 object of class "RLResult".

Author(s)

Andreas Borg, Murat Sariyar

See Also

epiWeights

Examples

# generate record pairs
data(RLdata500)
p <- compare.dedup(RLdata500, strcmp = TRUE, strcmpfun = levenshteinSim,
  identity = identity.RLdata500, blockfld = list("by", "bm", "bd"))

# calculate weights
p <- epiWeights(p)

# classify and show results
summary(epiClassify(p, 0.6))

RecordLinkage documentation built on Nov. 10, 2022, 5:42 p.m.