Similarity-based filter for removing label noise from a dataset as a preprocessing step of classification. For more information, see 'Details' and 'References' sections.
formula | A formula describing the classification variable and the attributes to be used.
data, x | Data frame containing the training dataset to be filtered.
... | Optional parameters to be passed to other methods.
k | Largest number of nearest neighbors to be used; ENN is applied for every integer from 1 to k.
classColumn | Positive integer indicating the column which contains the (factor of) classes. By default, the last column is considered.
AENN applies the Edited Nearest Neighbor (ENN) algorithm for every integer between 1 and k on the whole dataset. At the end, any instance considered noisy by at least one of these ENN runs is removed.
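The procedure described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper names enn_flags and aenn_sketch are hypothetical, Euclidean distance on numeric attributes is assumed, and ties in the neighbor vote are resolved in favor of the first factor level, which may differ from what AENN actually does.

```r
# Hypothetical helper (not part of the package): flag rows whose label
# disagrees with the majority vote of their k nearest neighbours.
# Assumes numeric attributes and Euclidean distance.
enn_flags <- function(x, y, k) {
  d <- as.matrix(dist(x))   # pairwise Euclidean distances
  diag(d) <- Inf            # an instance is never its own neighbour
  vapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]              # rows of the k nearest instances
    votes <- table(y[nn])                        # counts per class level
    majority <- names(votes)[which.max(votes)]   # ties: first level wins (assumption)
    majority != as.character(y[i])
  }, logical(1))
}

# Hypothetical sketch of the "all k" idea: run the ENN check for
# 1, 2, ..., k neighbours and collect every row flagged at least once.
aenn_sketch <- function(x, y, k = 5) {
  noisy <- rep(FALSE, nrow(x))
  for (j in seq_len(k)) {
    noisy <- noisy | enn_flags(x, y, j)
  }
  which(noisy)
}

# Toy data: two well-separated groups plus one "B" placed among the "A"s (row 9).
x <- data.frame(a = c(0, 1, 2, 3, 10, 11, 12, 13, 1.4))
y <- factor(c("A", "A", "A", "A", "B", "B", "B", "B", "B"))

rem <- aenn_sketch(x, y, k = 3)
clean <- x[-rem, , drop = FALSE]
```

On this toy data the mislabelled row 9 is flagged, and so are rows 2 and 3, because with a single neighbor their closest instance is the mislabelled one. This shows why taking the union over all values from 1 to k tends to remove more instances than a single ENN run.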
An object of class filter, which is a list with seven components:
cleanData is a data frame containing the filtered dataset.
remIdx is a vector of integers indicating the indexes for removed instances (i.e. their row number with respect to the original data frame).
repIdx is a vector of integers indicating the indexes for repaired/relabelled instances (i.e. their row number with respect to the original data frame).
repLab contains the new labels for repaired instances.
parameters is a list containing the argument values.
call contains the original call to the filter.
extraInf is a character that includes additional interesting information not covered by previous items.
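To make the role of the indexes concrete, the following sketch builds a mock list with the seven documented component names and shows how remIdx relates cleanData to the original data frame. The index values, the parameter list and the stored call are all made up for illustration, and leaving repIdx, repLab and extraInf as NULL is an assumption about a removal-only filter, not documented behavior.

```r
# Hypothetical removed-row indexes, standing in for a real filter result.
remIdx <- c(71L, 73L, 84L)

# Mock object using the documented component names (values are illustrative only).
out <- list(
  cleanData  = iris[-remIdx, ],
  remIdx     = remIdx,
  repIdx     = NULL,   # assumption: nothing is relabelled by a removal-only filter
  repLab     = NULL,
  parameters = list(k = 5),
  call       = quote(AENN(Species ~ ., data = iris)),
  extraInf   = NULL
)

# remIdx holds row numbers of the ORIGINAL data frame, so the removed
# instances can be inspected and the cleaned data rebuilt from it.
removed <- iris[out$remIdx, ]
same <- identical(out$cleanData,
                  iris[setdiff(seq_len(nrow(iris)), out$remIdx), ])
```

With a real result object, the same two expressions would let a user review which instances were dropped before training a classifier on cleanData.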
Tomek I. (1976, June): An Experiment with the Edited Nearest-Neighbor Rule. IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-6, no. 6, pp. 448-452.