Ensemble-based filter for removing label noise from a dataset as a preprocessing step for classification. For more information, see the 'Details' and 'References' sections.
formula | A formula describing the classification variable and the attributes to be used.
data, x | Data frame containing the training dataset to be filtered.
... | Optional parameters to be passed to other methods.
consensus | Logical. If FALSE, a majority voting scheme is used for 'preliminary filtering' and 'noise free filtering' (see the 'Details' and 'References' sections). If TRUE, a consensus voting scheme is applied.
p | Real number between 0 and 1. It sets the minimum proportion of original instances which must be tagged as noisy in order to go for another iteration.
s | Positive integer setting the stop criterion together with p (see 'Details').
k | Parameter for the k-nearest neighbors algorithm used for the 'noise score' stage (see 'Details' and 'References').
threshold | Real number between -1 and 1. It sets the noise score value above which an instance is removed.
classColumn | Positive integer indicating the column which contains the (factor of) classes. By default, the last column is considered.
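To illustrate the consensus argument, the following base R sketch shows how the two voting schemes are commonly defined: under majority voting an instance is flagged when more than half of the classifiers misclassify it, and under consensus voting only when all of them do. The helper flag_noisy and the vote matrix are hypothetical and not part of the package; the actual filter may combine its classifiers differently.

```r
# Hypothetical helper (not part of the package): decide which instances are
# flagged as noisy from per-classifier misclassification votes.
# votes: logical matrix, one row per instance and one column per classifier;
#        TRUE means that classifier misclassified the instance.
flag_noisy <- function(votes, consensus = FALSE) {
  n_wrong <- rowSums(votes)
  if (consensus) {
    n_wrong == ncol(votes)       # all classifiers must disagree with the label
  } else {
    n_wrong > ncol(votes) / 2    # more than half of the classifiers disagree
  }
}

# Made-up votes for three instances and the three classifiers named in 'Details'.
votes <- matrix(
  c(TRUE,  TRUE,  TRUE,
    TRUE,  TRUE,  FALSE,
    FALSE, TRUE,  FALSE),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("C4.5", "3NN", "LOG"))
)

majority_flags  <- flag_noisy(votes, consensus = FALSE)
consensus_flags <- flag_noisy(votes, consensus = TRUE)
```

In this toy input, consensus voting flags a subset of what majority voting flags, which is why it is the more conservative of the two schemes.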
The full description of the method can be looked up in the provided reference.
A 'preliminary filtering' is carried out with a fusion of classifiers (FC), including C4.5, 3NN, and logistic regression. Then,
potentially noisy instances are identified in a 'noise free filtering' process building the FC on the (preliminary) filtered
instances. Finally, a 'noise score' is computed on these potentially noisy instances, removing those exceeding the threshold value.
The process stops after s iterations in which too few noisy instances are removed, that is,
fewer than the proportion p of the original instances.
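The interplay of p and s in the stop criterion can be sketched in base R as follows. This is a minimal illustration, not the package's code: the helper stop_iteration is hypothetical, and both the strict comparison and the choice to reset the counter after a productive iteration are assumptions that the text above does not settle.

```r
# Hypothetical helper (not part of the package): given how many instances were
# removed in each iteration, return the iteration at which filtering would stop.
stop_iteration <- function(removed_per_iter, n_original, p, s) {
  low_count <- 0
  for (i in seq_along(removed_per_iter)) {
    if (removed_per_iter[i] < p * n_original) {
      low_count <- low_count + 1   # assumption: strict "<" marks a poor iteration
    } else {
      low_count <- 0               # assumption: poor iterations must be consecutive
    }
    if (low_count >= s) {
      return(i)
    }
  }
  NA_integer_                      # criterion never met within the given iterations
}

# Made-up removal counts for a dataset of 200 instances; with p = 0.01 an
# iteration must remove at least 2 instances to count as productive.
removed <- c(10, 4, 1, 0, 1, 0)
stop_at <- stop_iteration(removed, n_original = 200, p = 0.01, s = 3)
```

With these illustrative values, iterations 3, 4 and 5 each remove fewer than 2 instances, so the sketch stops at iteration 5.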
An object of class filter, which is a list with seven components:
cleanData is a data frame containing the filtered dataset.
remIdx is a vector of integers indicating the indexes for
removed instances (i.e. their row number with respect to the original data frame).
repIdx is a vector of integers indicating the indexes for
repaired/relabelled instances (i.e. their row number with respect to the original data frame).
repLab is a factor containing the new labels for repaired instances.
parameters is a list containing the argument values.
call contains the original call to the filter.
extraInf is a character that includes additional interesting
information not covered by previous items.
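The relationship between cleanData and remIdx can be checked as in the sketch below. To keep it runnable without the package, it builds a mock object with the seven components listed above; the removed row numbers are made up, and the NULL and empty values are placeholder assumptions rather than the filter's real output.

```r
data(iris)

# Mock object mimicking the documented structure (illustrative values only).
rem <- c(71L, 84L, 107L)            # made-up row numbers, not real filter output
out <- structure(
  list(
    cleanData  = iris[-rem, ],      # original data without the removed rows
    remIdx     = rem,
    repIdx     = NULL,              # assumption: nothing is relabelled here
    repLab     = NULL,
    parameters = list(),            # placeholder for the argument values
    call       = NULL,              # placeholder for the original call
    extraInf   = NULL
  ),
  class = "filter"
)

# remIdx refers to rows of the original data frame, so dropping those rows
# from the original data should reproduce cleanData.
kept      <- setdiff(seq_len(nrow(iris)), out$remIdx)
same_rows <- isTRUE(all.equal(out$cleanData, iris[kept, ]))
```

The same check should apply to an object returned by the filter itself, since remIdx is documented as row numbers with respect to the original data frame.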
By means of a message, the number of noisy instances removed in each iteration is displayed in the console.
Sáez J. A., Galar M., Luengo J., Herrera F. (2016): INFFC: An iterative class noise filter based on the fusion of classifiers with noise sensitivity control. Information Fusion, 27, 19-32.