genSamples: Generate Training Set
In RecordLinkage: Record Linkage Functions for Linking and Deduplicating Data Sets

genSamples

R Documentation

Generate Training Set

Description

Generates a training set of record pairs whose matching status is determined by unsupervised classification.

Usage

genSamples(dataset, num.non, des.mprop = 0.1)

Arguments

`dataset`	Object of class `RecLinkData`. Record pairs from which to sample.
`num.non`	Positive integer. Desired number of non-links in the training set.
`des.mprop`	Real number in the range [0,1]. Desired ratio of the number of links to the number of non-links in the training set.
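To make the interplay of `num.non` and `des.mprop` concrete, the sketch below computes the requested numbers of links and non-links for a given call. It assumes the link count is `num.non * des.mprop` rounded to the nearest integer. Both the rounding rule and the helper name `requested_counts` are illustrative assumptions, not part of the package.

```r
# Hypothetical helper (not part of RecordLinkage): derive the requested
# numbers of links and non-links from the genSamples() arguments.
# Assumption: links = num.non * des.mprop, rounded to the nearest integer.
requested_counts <- function(num.non, des.mprop = 0.1) {
  stopifnot(
    length(num.non) == 1, num.non > 0, num.non == round(num.non),
    length(des.mprop) == 1, des.mprop >= 0, des.mprop <= 1
  )
  c(links = round(num.non * des.mprop), non.links = num.non)
}

requested_counts(200, 0.1)   # returns c(links = 20, non.links = 200)
```

With the default `des.mprop = 0.1`, a request for 200 non-links therefore corresponds to roughly 20 links.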

Details

Applying supervised classifiers (via `classifySupv`) requires a training set of record pairs with known matching status. Where no such data are available, `genSamples` can be used to generate training data: the matching status is first determined by unsupervised clustering with `bclust`, and the desired numbers of links and non-links are then sampled.
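The clustering step can be sketched in base R. The package itself uses `bclust` (bagged clustering); the sketch below substitutes plain `stats::kmeans` with two clusters so that it runs without extra dependencies. It assumes that the cluster whose centre has the larger sum of comparison values contains the links, and it labels pairs `"L"` (link) and `"N"` (non-link). The function name `label_pairs` and the treatment of missing comparison values as disagreement (0) are assumptions made for illustration.

```r
# Sketch only: two-group k-means stands in for the bagged clustering
# (bclust) that genSamples uses. Rows are record pairs, columns are
# comparison values in [0, 1].
label_pairs <- function(patterns) {
  x <- as.matrix(patterns)
  x[is.na(x)] <- 0                      # assumption: missing value = disagreement
  s <- rowSums(x)
  # Deterministic start: the least and the most similar pair
  init <- x[c(which.min(s), which.max(s)), , drop = FALSE]
  fit <- stats::kmeans(x, centers = init)
  # The cluster whose centre agrees on more attributes is taken as links
  link_cluster <- which.max(rowSums(fit$centers))
  factor(ifelse(fit$cluster == link_cluster, "L", "N"), levels = c("N", "L"))
}

patterns <- rbind(
  c(1.0, 1.00, 0.9, 1.0),
  c(1.0, 0.95, 1.0, 1.0),
  c(0.9, 1.00, 1.0, NA),
  c(0.0, 0.10, 0.0, 0.0),
  c(0.1, 0.00, 0.0, 0.2),
  c(0.0, 0.00, 0.1, 0.0)
)
label_pairs(patterns)   # L L L N N N
```

Seeding k-means with the least and most similar pair keeps the result reproducible; bagged clustering instead aggregates many runs on bootstrap samples, which is more robust but not reproduced here.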

If the requested number of links or non-links is not feasible, a warning is issued and the maximum possible number is used instead.
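The sampling step and its feasibility check can be sketched as follows. The helper `sample_training_ids` is hypothetical: it takes an unsupervised prediction with labels `"L"` and `"N"`, draws the requested numbers of each, and warns and caps a count when too few pairs of that kind are available. Computing the link count from the requested (not the capped) `num.non` is an assumption; the package may handle this case differently.

```r
# Hypothetical helper mirroring the sampling step described above.
sample_training_ids <- function(pred, num.non, des.mprop = 0.1) {
  pred <- as.character(pred)
  link_ids <- which(pred == "L")
  non_ids <- which(pred == "N")
  n_links <- round(num.non * des.mprop)   # assumption: rounded link count
  if (num.non > length(non_ids)) {
    warning("only ", length(non_ids), " non-links available; using all of them")
    num.non <- length(non_ids)
  }
  if (n_links > length(link_ids)) {
    warning("only ", length(link_ids), " links available; using all of them")
    n_links <- length(link_ids)
  }
  # Index via sample.int() so a single candidate is not misread as 1:n
  c(link_ids[sample.int(length(link_ids), n_links)],
    non_ids[sample.int(length(non_ids), num.non)])
}

set.seed(1)
pred <- c(rep("L", 5), rep("N", 50))
ids <- sample_training_ids(pred, num.non = 20)           # 2 links + 20 non-links
ids_capped <- sample_training_ids(pred, num.non = 100)   # warns twice, returns 55 ids
```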

Value

A list of two "RecLinkResult" objects:

`train`	The sampled training data.
`valid`	All other record pairs.

The record pairs are split between the respective `pairs` components. The `prediction` components hold the clustering result. If weights are present in `dataset`, the corresponding values of `Wdata` are stored in `train` and `valid`. All other components are copied unchanged from `dataset`.
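The split into `train` and `valid` can be pictured with the sketch below. It works on a plain list imitating a few components of a `RecLinkData` object (`pairs`, `Wdata`, `type`); that layout, the column names, and the helper name `split_training` are illustrative assumptions, not the package's internal code.

```r
# Sketch of the split described above; the list layout is an assumption.
split_training <- function(dataset, prediction, train_ids) {
  n <- nrow(dataset$pairs)
  valid_ids <- setdiff(seq_len(n), train_ids)
  make_part <- function(ids) {
    part <- dataset                             # copy all other components unchanged
    part$pairs <- dataset$pairs[ids, , drop = FALSE]
    part$prediction <- prediction[ids]          # clustering result for these pairs
    if (!is.null(dataset$Wdata)) part$Wdata <- dataset$Wdata[ids]
    part
  }
  list(train = make_part(train_ids), valid = make_part(valid_ids))
}

dataset <- list(
  pairs = data.frame(id1 = 1:6, id2 = 7:12, fname = c(1, 1, 0, 0, 0.2, 0)),
  Wdata = c(5.1, 4.8, -2.0, -3.1, 0.4, -2.7),
  type = "linkage"
)
prediction <- factor(c("L", "L", "N", "N", "N", "N"), levels = c("N", "L"))
res <- split_training(dataset, prediction, train_ids = c(1, 3, 4))
```

Copying the whole object first and then overwriting only the row-dependent components is a simple way to guarantee that every other component reaches both parts unchanged.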

Note

Unsupervised clustering may yield poor classification quality; all subsequent results should therefore be evaluated critically.

Author(s)

Andreas Borg, Murat Sariyar
