genSamples (R Documentation)
Generates training data by unsupervised classification.
genSamples(dataset, num.non, des.mprop = 0.1)
dataset: Object of class …, containing the record pairs to classify.
num.non: Positive integer. Number of desired non-links in the training set.
des.mprop: Real number in the range [0,1]. Ratio of the number of links to the number of non-links in the training set.
The application of supervised classifiers (via classifySupv) requires a training set of record pairs with known matching status. Where no such data are available, genSamples can be used to generate training data. The matching status is determined by unsupervised clustering with bclust. Subsequently, the desired numbers of links and non-links are sampled.
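The following minimal base-R sketch illustrates the idea behind the clustering step. It is not genSamples' internal code. stats::kmeans with two centres stands in for bclust, missing comparison values are set to 0, and the cluster whose centre has the higher mean agreement is labelled as links. The helper name cluster_pairs and the demo data are hypothetical.

```r
# Minimal sketch of labelling record pairs by 2-cluster unsupervised clustering.
# ASSUMPTIONS (not genSamples' actual internals): stats::kmeans replaces bclust,
# comparison values lie in [0, 1], and NA (missing) counts as disagreement (0).
cluster_pairs <- function(patterns) {
  x <- as.matrix(patterns)
  x[is.na(x)] <- 0
  # Several random starts make the 2-cluster split more stable.
  km <- stats::kmeans(x, centers = 2, nstart = 10)
  # The cluster whose centre shows the higher mean agreement is taken as links.
  link_cluster <- which.max(rowMeans(km$centers))
  km$cluster == link_cluster  # TRUE = link, FALSE = non-link
}

# Hypothetical demo data: 5 near-identical pairs, 20 clearly different pairs.
set.seed(42)
demo <- rbind(
  matrix(runif(5 * 3, min = 0.9, max = 1.0), ncol = 3),
  matrix(runif(20 * 3, min = 0.0, max = 0.3), ncol = 3)
)
is_link <- cluster_pairs(demo)
table(is_link)
```

Deciding which cluster counts as "links" requires some rule. The sketch uses mean agreement because link pairs should agree on most comparison fields. On poorly separated data, this rule can pick the wrong cluster.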
If the requested number of links or non-links is not feasible, a warning is issued and the maximum possible number is used.
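The following sketch shows the sampling step and its feasibility check. The helper sample_training_ids is hypothetical. Two details are assumptions rather than documented behaviour. First, the number of links is taken as round(num.non * des.mprop). Second, that number is computed from the requested number of non-links, not from the capped one.

```r
# Sketch of sampling a training set from clustering labels.
# ASSUMPTIONS: `is_link` is a logical vector (TRUE = link) from the clustering
# step; the link count is round(num.non * des.mprop), computed from the
# requested num.non. Both details are guesses, not documented behaviour.
sample_training_ids <- function(is_link, num.non, des.mprop = 0.1) {
  link_ids <- which(is_link)
  non_ids <- which(!is_link)
  num.link <- round(num.non * des.mprop)
  # Cap infeasible requests at the number of available pairs, with a warning.
  if (num.non > length(non_ids)) {
    warning(sprintf("only %d non-links available; using all of them", length(non_ids)))
    num.non <- length(non_ids)
  }
  if (num.link > length(link_ids)) {
    warning(sprintf("only %d links available; using all of them", length(link_ids)))
    num.link <- length(link_ids)
  }
  # Index via sample.int to avoid sample()'s special case for length-1 input.
  c(link_ids[sample.int(length(link_ids), num.link)],
    non_ids[sample.int(length(non_ids), num.non)])
}

set.seed(1)
labels <- c(rep(TRUE, 3), rep(FALSE, 50))  # hypothetical clustering result
ids <- sample_training_ids(labels, num.non = 20, des.mprop = 0.1)
```

With num.non = 20 and des.mprop = 0.1, the sketch draws 2 links and 20 non-links. Requesting 100 non-links from the 50 available triggers a warning, and all available pairs are used instead.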
A list of "RecLinkResult" objects with the following components:
train: The sampled training data.
valid: All other record pairs.
The record pairs are divided between the pairs components of train and valid. Their prediction components hold the clustering result. If weights are present in dataset, the corresponding values of Wdata are stored in train and valid. All other components are copied from dataset.
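The following hypothetical sketch builds train and valid from plain lists. It assumes that dataset is a list with a pairs data frame and an optional Wdata vector. It also assumes the clustering result is coded as a factor with levels "N", "P" and "L". The real function returns "RecLinkResult" objects, which this sketch does not construct.

```r
# Sketch of assembling `train` and `valid` from a clustering result.
# ASSUMPTIONS: `dataset` is a plain list with a `pairs` data frame and an
# optional `Wdata` weight vector; the "N"/"P"/"L" factor coding of
# `prediction` is modelled on RecordLinkage output but is not confirmed here.
# No "RecLinkResult" class is set; these are plain lists.
split_result <- function(dataset, is_link, train_ids) {
  make_part <- function(ids) {
    part <- dataset  # copy all other components unchanged
    part$pairs <- dataset$pairs[ids, , drop = FALSE]
    part$prediction <- factor(ifelse(is_link[ids], "L", "N"),
                              levels = c("N", "P", "L"))
    if (!is.null(dataset$Wdata)) part$Wdata <- dataset$Wdata[ids]
    part
  }
  valid_ids <- setdiff(seq_len(nrow(dataset$pairs)), train_ids)
  list(train = make_part(train_ids), valid = make_part(valid_ids))
}

# Hypothetical example: 6 record pairs, 2 of them chosen for training.
ds <- list(
  pairs = data.frame(id1 = 1:6, id2 = 7:12, fname = c(1, 1, 0, 0, 0, 0)),
  Wdata = c(5, 4, -1, -2, -3, -2)
)
res <- split_result(ds, is_link = ds$pairs$fname == 1, train_ids = c(1, 3))
```

Every pair lands in exactly one of the two parts, and each part carries only its own weights.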
Unsupervised clustering may yield poor classification quality, so all subsequent results should be evaluated critically.
Andreas Borg, Murat Sariyar
See also: splitData for splitting data sets without clustering.