Example Session for Supervised Classification

knitr::opts_chunk$set(message = FALSE, warning = FALSE)
# Save the current options before changing them so they can be restored later.
backup_options <- options()
options(width = 60)

This document walks through an example session that uses supervised classification in the package RecordLinkage to deduplicate a single data set. Linking two data sets differs only in the step of generating record pairs (compare.linkage instead of compare.dedup). See also the vignette on Fellegi-Sunter deduplication for some general information on using the package.

Generating comparison patterns

library(RecordLinkage)

In this session, a training set with 500 matches and 500 non-matches is generated from the included data set RLdata10000. The classifiers are calibrated on this training set and subsequently evaluated on record pairs from the smaller set RLdata500.

data(RLdata500)
data(RLdata10000)
train_pairs <- compare.dedup(RLdata10000, identity = identity.RLdata10000,
                             n_match = 500, n_non_match = 500)
eval_pairs <- compare.dedup(RLdata500, identity = identity.RLdata500)
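
A comparison pattern records, attribute by attribute, whether two records agree. The following base R sketch illustrates the idea on two made-up records. It is a simplified picture of the exact comparison that compare.dedup performs by default, not the package's implementation, and the record values are hypothetical.

```r
# Two toy records sharing the same attributes (hypothetical values;
# the attribute names mimic columns of RLdata500).
rec_a <- c(fname_c1 = "CARSTEN", lname_c1 = "MEIER", by = "1949")
rec_b <- c(fname_c1 = "CARSTEN", lname_c1 = "MEYER", by = "1949")

# Exact comparison: 1 where the attribute values agree, 0 where they differ.
pattern <- as.integer(rec_a == rec_b)
names(pattern) <- names(rec_a)
pattern
```

A missing value in either record would yield NA instead of 0 or 1 in this sketch.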

Training

trainSupv handles the calibration of supervised classifiers, which are selected through the argument method. In the following, a single decision tree (rpart), a bootstrap aggregation of decision trees (bagging) and a support vector machine (svm) are calibrated.

model_rpart <- trainSupv(train_pairs, method = "rpart")
model_bagging <- trainSupv(train_pairs, method = "bagging")
model_svm <- trainSupv(train_pairs, method = "svm")
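
To check what trainSupv produced, the returned object can be inspected directly. The sketch below assumes the components method and model as described in the package documentation for the return value of trainSupv. It repeats the training step so that it runs on its own.

```r
library(RecordLinkage)
data(RLdata10000)

# Sample a training set of record pairs, as in the session above.
train_pairs <- compare.dedup(RLdata10000, identity = identity.RLdata10000,
                             n_match = 500, n_non_match = 500)
model_rpart <- trainSupv(train_pairs, method = "rpart")

# The classifier name that was passed as `method`.
model_rpart$method

# The fitted model of the underlying function; for method = "rpart"
# this is the decision tree, so printing it lists the split rules.
model_rpart$model
```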

Classification

classifySupv handles classification for all supervised classifiers. It takes two arguments: the structure returned by trainSupv, which contains the classification model, and the set of record pairs to classify.

result_rpart <- classifySupv(model_rpart, eval_pairs)
result_bagging <- classifySupv(model_bagging, eval_pairs)
result_svm <- classifySupv(model_svm, eval_pairs)
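
Because the true match status of the evaluation pairs is known, predictions can be cross-tabulated against it. The following sketch does this for the decision tree. It assumes that the result object carries the predicted classes in prediction and the true status in pairs$is_match, and it repeats the earlier steps so that it runs on its own.

```r
library(RecordLinkage)
data(RLdata500)
data(RLdata10000)

train_pairs <- compare.dedup(RLdata10000, identity = identity.RLdata10000,
                             n_match = 500, n_non_match = 500)
eval_pairs <- compare.dedup(RLdata500, identity = identity.RLdata500)

model_rpart <- trainSupv(train_pairs, method = "rpart")
result_rpart <- classifySupv(model_rpart, eval_pairs)

# Rows: true status (0 = non-match, 1 = match).
# Columns: predicted class ("N" = non-link, "P" = possible link, "L" = link).
confusion <- table(true = result_rpart$pairs$is_match,
                   predicted = result_rpart$prediction)
confusion
```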

Results

Rpart

summary(result_rpart)

Bagging

summary(result_bagging)

SVM

summary(result_svm)
options(backup_options)
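
The three summaries are easier to compare side by side. The sketch below collects the error measures of all classifiers in one matrix. It assumes that getErrorMeasures returns a list of named numeric measures, as described in the package documentation; the helper name method_names is introduced here only for illustration.

```r
library(RecordLinkage)
data(RLdata500)
data(RLdata10000)

train_pairs <- compare.dedup(RLdata10000, identity = identity.RLdata10000,
                             n_match = 500, n_non_match = 500)
eval_pairs <- compare.dedup(RLdata500, identity = identity.RLdata500)

# Train each classifier and classify the evaluation pairs with it.
method_names <- c("rpart", "bagging", "svm")
results <- lapply(method_names, function(m) {
  classifySupv(trainSupv(train_pairs, method = m), eval_pairs)
})
names(results) <- method_names

# One column per classifier, one row per error measure.
measures <- sapply(results, function(r) unlist(getErrorMeasures(r)))
round(measures, 4)
```

The values vary between runs because the training pairs are sampled at random.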



RecordLinkage documentation built on Jan. 25, 2026, 9:06 a.m.