Ensemble-based filter for removing or repairing label noise from a dataset as a preprocessing step of classification. For more information, see 'Details' and 'References' sections.
## S3 method for class 'formula'
hybridRepairFilter(formula, data, ...)
## Default S3 method:
hybridRepairFilter(x, consensus = FALSE,
    noiseAction = "remove", classColumn = ncol(x), ...)
formula
A formula describing the classification variable and the attributes to be used.
data, x
Data frame containing the training dataset to be processed.
...
Optional parameters to be passed to other methods.
consensus
If set to TRUE, the consensus voting scheme is used to identify noise. If set to FALSE (the default), the majority voting scheme is used.
noiseAction
Character which can be set to "remove", "repair" or "hybrid". The filter decides accordingly what to do with the identified noise (see Details).
classColumn
Positive integer indicating the column which contains the (factor of) classes. By default, the last column is considered.
As presented in (Miranda et al., 2009), hybridRepairFilter builds an ensemble of four classifiers on the dataset: SVM, Neural Network, CART and KNN (combining k=1,3,5). Based on their predictions and on a majority or consensus voting scheme, a subset of instances is labeled as noise. These instances are removed if noiseAction equals "remove"; their class is changed to the one most voted by the ensemble if noiseAction equals "repair"; and if noiseAction is set to "hybrid", the vote of KNN decides whether each instance is removed or repaired.
The whole procedure is repeated while the accuracy (over the original dataset) of the ensemble trained with the processed dataset increases.
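The voting step described above can be sketched in base R. This is only an illustration, not the package's internal code: the helper names (flagNoise, mostVoted, decideAction) are hypothetical, "majority" is assumed to mean that strictly more than half of the classifiers disagree with the current label, and the "hybrid" rule is assumed to repair when the KNN vote matches the most voted class and to remove otherwise.

```r
# Hypothetical helpers illustrating the voting logic; not part of any package.
# `preds`  : character matrix, one row per instance, one column per classifier.
# `labels` : current class labels of the instances.

flagNoise <- function(preds, labels, consensus = FALSE) {
  # Number of classifiers whose prediction differs from the current label
  wrong <- rowSums(preds != as.character(labels))
  # Assumption: consensus = all disagree; majority = strictly more than half
  if (consensus) wrong == ncol(preds) else wrong > ncol(preds) / 2
}

mostVoted <- function(preds) {
  # Most frequent prediction per row (ties go to the first class alphabetically)
  apply(preds, 1, function(v) names(which.max(table(v))))
}

decideAction <- function(preds, labels, noiseAction = "remove",
                         consensus = FALSE, knnCol = "KNN") {
  labels <- as.character(labels)
  noisy  <- flagNoise(preds, labels, consensus)
  winner <- mostVoted(preds)
  action <- rep("keep", nrow(preds))
  if (noiseAction == "hybrid") {
    # Assumed rule: repair if KNN agrees with the most voted class, else remove
    agree <- preds[, knnCol] == winner
    action[noisy & agree]  <- "repair"
    action[noisy & !agree] <- "remove"
  } else {
    action[noisy] <- noiseAction
  }
  data.frame(action = action,
             newLabel = ifelse(action == "repair", winner, labels),
             stringsAsFactors = FALSE)
}

# Toy predictions for four instances that all carry the label "a"
preds <- matrix(c("a", "a", "a", "a",
                  "b", "b", "b", "a",
                  "b", "b", "b", "b",
                  "a", "b", "a", "a"),
                ncol = 4, byrow = TRUE,
                dimnames = list(NULL, c("SVM", "NNet", "CART", "KNN")))
labels <- c("a", "a", "a", "a")

res <- decideAction(preds, labels, noiseAction = "hybrid")
resConsensus <- decideAction(preds, labels, noiseAction = "hybrid",
                             consensus = TRUE)
print(res)
```

In this toy input the second instance is flagged under majority voting but not under consensus voting, which is the practical difference between the two settings of consensus: consensus is the more conservative choice.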
An object of class filter, which is a list with seven components:
cleanData
is a data frame containing the filtered dataset.
remIdx
is a vector of integers indicating the indexes for removed instances (i.e. their row number with respect to the original data frame).
repIdx
is a vector of integers indicating the indexes for repaired/relabelled instances (i.e. their row number with respect to the original data frame).
repLab
is a factor containing the new labels for repaired instances.
parameters
is a list containing the argument values.
call
contains the original call to the filter.
extraInf
is a character that includes additional interesting information not covered by previous items.
Miranda A. L., Garcia L. P. F., Carvalho A. C., Lorena A. C. (2009): Use of classification algorithms in noise detection and elimination. In Hybrid Artificial Intelligence Systems (pp. 417-424). Springer Berlin Heidelberg.
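A minimal sketch of how the call recorded in the output below might be run and inspected. It assumes the NoiseFiltersR package (which provides hybridRepairFilter) is installed, and that its summary method accepts explicit = TRUE to list the affected indexes. The exact instances flagged may differ between runs, since some of the ensemble's classifiers are fitted stochastically.

```r
# Assumes the NoiseFiltersR package is installed
library(NoiseFiltersR)

data(iris)

# Default interface: the class is taken from the last column (Species)
out <- hybridRepairFilter(iris, noiseAction = "hybrid")

# Summary of the filtering, including explicit indexes and new labels
summary(out, explicit = TRUE)

# Components of the returned 'filter' object
head(out$cleanData)   # filtered dataset
out$remIdx            # row numbers of removed instances
out$repIdx            # row numbers of repaired instances
out$repLab            # new labels assigned to the repaired instances
```

The formula interface should be equivalent for this dataset, e.g. hybridRepairFilter(Species ~ ., data = iris, noiseAction = "hybrid").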
Filter hybridRepairFilter applied to dataset
Call:
hybridRepairFilter(x = iris, noiseAction = "hybrid")
Parameters:
consensus: FALSE
noiseAction: hybrid
Results:
Number of removed instances: 0 (0 %)
Number of repaired instances: 5 (3.333333 %)
Additional information:
The number of iterations was 2
Explicit indexes for removed instances:
Explicit indexes for repaired instances:
71 107 120 134 135
New labels for repaired instances:
virginica versicolor versicolor versicolor versicolor