Example Session for Supervised Classification
In RecordLinkage: Record Linkage Functions for Linking and Deduplicating Data Sets

knitr::opts_chunk$set(message = FALSE, warning = FALSE)
backup_options <- options()  # save the user's options before changing any
options(width = 60)

This document shows an example session for using supervised classification in the package RecordLinkage for deduplication of a single data set. Conducting linkage of two data sets differs only in the step of generating record pairs. See also the vignette on Fellegi-Sunter deduplication for some general information on using the package.
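As a minimal sketch of the one step that differs when linking two data sets, the following uses compare.linkage in place of compare.dedup. Splitting RLdata500 into set_a and set_b is only an assumption made for illustration; in practice these would be two separate data frames with the same columns.

```r
library(RecordLinkage)
data(RLdata500)

# Hypothetical "two sources": two disjoint slices of the same example data.
set_a <- RLdata500[1:100, ]
set_b <- RLdata500[101:200, ]

# Without blocking, every record of set_a is paired with every record of set_b.
link_pairs <- compare.linkage(set_a, set_b,
                              identity1 = identity.RLdata500[1:100],
                              identity2 = identity.RLdata500[101:200])

nrow(link_pairs$pairs)
```

The resulting object can be passed to the training and classification functions in the same way as the output of compare.dedup.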

Generating comparison patterns

library(RecordLinkage)

In this session, a training set with 500 matches and 500 non-matches is generated from the included data set RLdata10000. Record pairs from the set RLdata500 are used to evaluate the calibrated classifiers.

data(RLdata500)
data(RLdata10000)
train_pairs <- compare.dedup(RLdata10000, identity = identity.RLdata10000,
                             n_match = 500, n_non_match = 500)
eval_pairs <- compare.dedup(RLdata500, identity = identity.RLdata500)
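Before training, it can help to check what the generated pairs look like. The sketch below repeats the calls above so that it runs on its own, then tabulates the known match status stored in the is_match column of the pairs component.

```r
library(RecordLinkage)
data(RLdata500)
data(RLdata10000)

train_pairs <- compare.dedup(RLdata10000, identity = identity.RLdata10000,
                             n_match = 500, n_non_match = 500)
eval_pairs <- compare.dedup(RLdata500, identity = identity.RLdata500)

# is_match holds the true status of each pair: 1 = match, 0 = non-match.
table(train_pairs$pairs$is_match)
table(eval_pairs$pairs$is_match)

# Each row is one record pair: two record ids, one comparison value
# per attribute, and the match status.
head(train_pairs$pairs)
```

Because no blocking is used for eval_pairs, it contains every possible pair of the 500 records, so non-matches heavily outnumber matches there.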

Training

trainSupv handles calibration of supervised classifiers, which are selected through the argument method. In the following, a single decision tree (rpart), a bootstrap aggregation of decision trees (bagging) and a support vector machine (svm) are calibrated.

model_rpart <- trainSupv(train_pairs, method = "rpart")
model_bagging <- trainSupv(train_pairs, method = "bagging")
model_svm <- trainSupv(train_pairs, method = "svm")
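The object returned by trainSupv can be inspected directly. The following sketch, which trains only the decision tree to keep it short, looks at the method and model components; printing the fitted tree shows which attribute comparisons the classifier splits on.

```r
library(RecordLinkage)
data(RLdata10000)

train_pairs <- compare.dedup(RLdata10000, identity = identity.RLdata10000,
                             n_match = 500, n_non_match = 500)
model_rpart <- trainSupv(train_pairs, method = "rpart")

# Name of the classification method that was used.
model_rpart$method

# The underlying fitted model, here an rpart decision tree.
print(model_rpart$model)
```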

Classification

classifySupv handles classification for all supervised classifiers. It takes as arguments the structure returned by trainSupv, which contains the classification model, and the set of record pairs to classify.

result_rpart <- classifySupv(model_rpart, eval_pairs)
result_bagging <- classifySupv(model_bagging, eval_pairs)
result_svm <- classifySupv(model_svm, eval_pairs)
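The classification outcome is stored per record pair in the prediction component of the result. As a small self-contained sketch (again using only the decision tree), the predictions can be tabulated on their own or against the known match status.

```r
library(RecordLinkage)
data(RLdata500)
data(RLdata10000)

train_pairs <- compare.dedup(RLdata10000, identity = identity.RLdata10000,
                             n_match = 500, n_non_match = 500)
eval_pairs <- compare.dedup(RLdata500, identity = identity.RLdata500)

model_rpart <- trainSupv(train_pairs, method = "rpart")
result_rpart <- classifySupv(model_rpart, eval_pairs)

# One prediction per record pair: "L" = link, "N" = non-link
# ("P" = possible link is a further level of the factor).
table(result_rpart$prediction)

# Cross-tabulate the known status (1 = match, 0 = non-match)
# against the predicted class.
table(true = eval_pairs$pairs$is_match,
      predicted = result_rpart$prediction)
```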

Results

Rpart

summary(result_rpart)

Bagging

summary(result_bagging)

SVM

summary(result_svm)
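To compare the three classifiers side by side rather than reading three separate summaries, one option is to collect a few error measures in a single table. This is only a sketch: it relies on the package function errorMeasures and selects four of the values it returns; the helper variable names are illustrative.

```r
library(RecordLinkage)
data(RLdata500)
data(RLdata10000)

train_pairs <- compare.dedup(RLdata10000, identity = identity.RLdata10000,
                             n_match = 500, n_non_match = 500)
eval_pairs <- compare.dedup(RLdata500, identity = identity.RLdata500)

methods <- c("rpart", "bagging", "svm")
wanted <- c("accuracy", "precision", "sensitivity", "specificity")

# Train and evaluate each classifier, keeping one column of measures each.
measures <- sapply(methods, function(m) {
  model <- trainSupv(train_pairs, method = m)
  result <- classifySupv(model, eval_pairs)
  unlist(errorMeasures(result)[wanted])
})

round(measures, 4)
```

Since the training pairs are sampled at random, the exact figures vary between runs unless a seed is set with set.seed beforehand.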

options(backup_options)

RecordLinkage documentation built on Jan. 25, 2026, 9:06 a.m.