Similarity-based filter for removing or repairing label noise in a dataset as a preprocessing step for classification. For more information, see the 'Details' and 'References' sections.
## S3 method for class 'formula'
ModeFilter(formula, data, ...)
## Default S3 method:
ModeFilter(x, type = "classical", noiseAction = "repair",
epsilon = 0.05, maxIter = 100, alpha = 1, beta = 1,
classColumn = ncol(x), ...)

formula 
A formula describing the classification variable and the attributes to be used. 
data, x 
Data frame containing the training dataset to be filtered. 
... 
Optional parameters to be passed to other methods. 
type 
Character indicating the scheme to be used. It can be 'classical', 'iterative' or 'weighted'. 
noiseAction 
Character indicating what to do with noisy instances. It can be either 'remove' or 'repair'. 
epsilon 
If the 'iterative' type is used, the loop stops when the proportion of modified instances is less than or equal to this threshold. 
maxIter 
Maximum number of iterations in 'iterative' type. 
alpha 
Parameter used in the computation of the similarity between two instances. 
beta 
It regulates the influence of the similarity metric in the estimation of a new label for an instance. 
classColumn 
Positive integer indicating the column that contains the class labels (a factor). By default, the last column is used. 
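As a rough illustration of how these arguments fit together, the call below uses the default method on a small subset of iris. It assumes ModeFilter is provided by the NoiseFiltersR package and that the package is installed; the subset and the parameter values are arbitrary choices for this sketch, not recommendations.

```r
# Assumption: ModeFilter comes from the NoiseFiltersR package.
if (requireNamespace("NoiseFiltersR", quietly = TRUE)) {
  data(iris)
  # Small subset (20 rows per species) to keep the run short
  small <- iris[c(1:20, 51:70, 101:120), ]
  rownames(small) <- NULL
  out <- NoiseFiltersR::ModeFilter(
    small,
    type = "iterative",      # repeat the classical scheme
    noiseAction = "repair",  # relabel noisy instances instead of dropping them
    epsilon = 0.05,          # stopping threshold on the proportion of modified instances
    maxIter = 20,            # cap on the number of iterations
    alpha = 1, beta = 1,     # similarity and label-influence parameters (defaults)
    classColumn = 5          # Species is the fifth column
  )
  print(out)
}
```

The formula method would express the same request as ModeFilter(Species ~ ., data = small, ...), in which case classColumn is not needed.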
ModeFilter estimates the most appropriate class for each instance based on the similarity metric and the provided label. This can be addressed in three different ways (argument 'type'):
In the classical approach, all labels are tried for all instances, and the one maximizing a metric based on similarity is chosen.
In the iterative approach, the same scheme is repeated until the proportion of modified instances is less than or equal to epsilon or the maximum number of iterations maxIter is reached.
The weighted approach extends the classical one by assigning a weight to each instance, which quantifies the reliability of its label. This weight is used in the computation of the metric to be maximized.
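The classical and iterative schemes described above can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the helper names mode_filter_pass and mode_filter_iterative are hypothetical, the similarity is assumed to be exp(-alpha * squared Euclidean distance), and a neighbour carrying a different label is assumed to contribute its similarity discounted by exp(-beta). The weighted scheme is not covered.

```r
# Hypothetical helper: one pass of a classical, mode-filter-style relabelling.
# ASSUMPTION: Gaussian similarity and an exp(-beta) discount for disagreeing labels.
mode_filter_pass <- function(X, y, alpha = 1, beta = 1) {
  X <- as.matrix(X)
  y <- factor(y)
  # Pairwise similarity s_ij = exp(-alpha * ||x_i - x_j||^2); s_ii = 1,
  # so the provided label of each instance also contributes to its score.
  S <- exp(-alpha * as.matrix(dist(X))^2)
  # agree[j, c] is 1 if instance j carries label c, and exp(-beta) otherwise
  onehot <- outer(as.integer(y), seq_along(levels(y)), "==")
  agree <- ifelse(onehot, 1, exp(-beta))
  scores <- S %*% agree  # n x k matrix: score of each candidate label per instance
  # Try all labels for all instances and keep the one with the highest score
  factor(levels(y)[max.col(scores, ties.method = "first")], levels = levels(y))
}

# Hypothetical helper: repeat the pass until few labels change or maxIter is hit.
mode_filter_iterative <- function(X, y, epsilon = 0.05, maxIter = 100, ...) {
  y <- factor(y)
  for (iter in seq_len(maxIter)) {
    y_new <- mode_filter_pass(X, y, ...)
    changed <- mean(y_new != y)  # proportion of modified instances
    y <- y_new
    if (changed <= epsilon) break
  }
  y
}

# Toy data: two well-separated clusters; the 4th label is deliberately wrong.
X <- rbind(cbind(c(0, 0.1, 0.2, 0.1), c(0, 0.1, 0, 0.2)),
           cbind(c(5, 5.1, 5.2, 5.1), c(5, 5.1, 5, 5.2)))
y <- factor(c("a", "a", "a", "b", "b", "b", "b", "b"))

repaired <- mode_filter_iterative(X, y, epsilon = 0.05, maxIter = 100)
which(repaired != y)  # index of the relabelled instance
```

In this toy data the fourth instance lies in the first cluster but carries the label of the second, so the sketch relabels it; the second pass changes nothing and the loop stops.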
An object of class filter, which is a list with seven components:
cleanData
is a data frame containing the filtered dataset.
remIdx
is a vector of integers indicating the indexes of removed instances (i.e. their row numbers with respect to the original data frame).
repIdx
is a vector of integers indicating the indexes of repaired/relabelled instances (i.e. their row numbers with respect to the original data frame).
repLab
is a factor containing the new labels for repaired instances.
parameters
is a list containing the argument values.
call
contains the original call to the filter.
extraInf
is a character string with additional information not covered by the previous items.
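The components listed above can be inspected directly on the returned object. The sketch below assumes ModeFilter is provided by the NoiseFiltersR package and that it is installed. With noiseAction = "remove", cleanData would be expected to equal the original data minus the rows listed in remIdx. The call may take a while on larger datasets.

```r
# Assumption: ModeFilter comes from the NoiseFiltersR package.
if (requireNamespace("NoiseFiltersR", quietly = TRUE)) {
  data(iris)
  out <- NoiseFiltersR::ModeFilter(Species ~ ., data = iris,
                                   type = "classical", noiseAction = "remove")
  print(out)            # summary of the filtering
  head(out$cleanData)   # filtered dataset
  out$remIdx            # row numbers of removed instances
  out$parameters        # argument values used in the call
  # Check that cleanData is the original data without the removed rows
  same <- identical(out$cleanData,
                    iris[setdiff(seq_len(nrow(iris)), out$remIdx), ])
  print(same)
}
```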
Du W., Urahama K. (2010, November): Error-correcting semi-supervised pattern recognition with mode filter on graphs. In Aware Computing (ISAC), 2010 2nd International Symposium on (pp. 6-11). IEEE.