Description Usage Arguments Details Value References Examples
Similarity-based filter for removing label noise from a dataset as a preprocessing step of classification. For more information, see 'Details' and 'References' sections.
## S3 method for class 'formula'
TomekLinks(formula, data, ...)

## Default S3 method:
TomekLinks(x, classColumn = ncol(x), ...)
formula
A formula describing the classification variable and the attributes to be used.
data, x
Data frame containing the training dataset to be filtered.
...
Optional parameters to be passed to other methods.
classColumn
Positive integer indicating the column which contains the (factor of) classes. By default, the last column is considered.
The function TomekLinks removes "TomekLink points" from the dataset. These are introduced in [Tomek, 1976], and are expected to lie on the border between classes. Removing such points is a typical procedure for cleaning noise [Lorena, 2002].
Because TomekLinks needs to compute mean points, only numeric attributes are allowed. Moreover, TomekLinks can only be detected when the dataset contains exactly two classes.
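As an illustration of the idea, the following base R sketch flags candidate pairs under one common midpoint formulation: two instances of different classes are treated as linked when no other instance lies closer to their mean point than they do themselves. The helper name `find_tomek_pairs` and this exact criterion are assumptions made for illustration; this is not the package's internal code and its results may differ from those of TomekLinks.

```r
# Hypothetical helper for illustration only; not part of the package.
# x: numeric matrix or data frame of attributes, y: two-level factor of classes.
find_tomek_pairs <- function(x, y) {
  x <- as.matrix(x)
  y <- factor(y)
  stopifnot(is.numeric(x), nrow(x) >= 2, nlevels(y) == 2)
  n <- nrow(x)
  pairs <- matrix(integer(0), ncol = 2)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (y[i] == y[j]) next                     # only pairs from different classes
      mid <- (x[i, ] + x[j, ]) / 2               # mean point of the pair
      radius <- sqrt(sum((x[i, ] - mid)^2))      # half the distance between the pair
      d <- sqrt(colSums((t(x) - mid)^2))         # distance from the mean point to every instance
      # Assumed criterion: no third instance is strictly closer to the mean point.
      if (all(d[-c(i, j)] >= radius)) pairs <- rbind(pairs, c(i, j))
    }
  }
  pairs
}

# Four points on a line; classes change between the 2nd and 3rd point.
x <- rbind(c(0, 0), c(1, 0), c(2, 0), c(3, 0))
y <- factor(c("a", "a", "b", "b"))
links <- find_tomek_pairs(x, y)
```

Each row of `links` holds the row numbers of one linked pair, so `unique(as.vector(links))` would give the border instances that a filter of this kind could consider removing.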
An object of class filter, which is a list with seven components:
cleanData is a data frame containing the filtered dataset.
remIdx is a vector of integers indicating the indexes for removed instances (i.e. their row number with respect to the original data frame).
repIdx is a vector of integers indicating the indexes for repaired/relabelled instances (i.e. their row number with respect to the original data frame).
repLab is a factor containing the new labels for repaired instances.
parameters is a list containing the argument values.
call contains the original call to the filter.
extraInf is a character that includes additional interesting information not covered by previous items.
Tomek I. (Nov. 1976): Two modifications of CNN, IEEE Trans. Syst., Man, Cybern., vol. 6, no. 11, pp. 769-772.
Lorena A. C., Batista G. E. A. P. A., de Carvalho A. C. P. L. F., Monard M. C. (Nov. 2002): The influence of noisy patterns in the performance of learning methods in the splice junction recognition problem, in Proc. 7th Brazilian Symp. Neural Netw., Recife, Brazil, pp. 31-37.
# Next code fails since TomekLinks method is designed for two-class problems.
# Some decomposition strategy like OVO or OVA could be used to overcome this.
## Not run:
data(iris)
out <- TomekLinks(Species~., data = iris)
## End(Not run)
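For contrast, a call that satisfies the two-class and numeric-attribute requirements might look like the sketch below. It assumes the function is provided by the NoiseFiltersR package and that the package is installed; the variable name `iris2` and the choice of which species to drop are illustrative only.

```r
# Keep two of the three iris species so that only two classes remain.
data(iris)
iris2 <- droplevels(iris[iris$Species != "setosa", ])

# Run the filter only if the package (assumed to be NoiseFiltersR) is available.
if (requireNamespace("NoiseFiltersR", quietly = TRUE)) {
  out <- NoiseFiltersR::TomekLinks(Species ~ ., data = iris2)
  out$remIdx            # row numbers of removed instances, relative to iris2
  nrow(out$cleanData)   # number of instances kept after filtering
}
```

Calling `droplevels` matters here: without it the Species factor would still carry three levels even though only two occur in the data.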