ModeFilter: Mode Filter
In NoiseFiltersR: Label Noise Filters for Data Preprocessing in Classification

Similarity-based filter for removing or repairing label noise in a dataset as a preprocessing step for classification. For more information, see the 'Details' and 'References' sections.

## S3 method for class 'formula'
ModeFilter(formula, data, ...)

## Default S3 method:
ModeFilter(x, type = "classical", noiseAction = "repair",
  epsilon = 0.05, maxIter = 100, alpha = 1, beta = 1,
  classColumn = ncol(x), ...)

`formula`	A formula describing the classification variable and the attributes to be used.
`data, x`	Data frame containing the training dataset to be filtered.
`...`	Optional parameters to be passed to other methods.
`type`	Character indicating the scheme to be used. It can be 'classical', 'iterative' or 'weighted'.
`noiseAction`	Character indicating what to do with noisy instances. It can be either 'remove' or 'repair'.
`epsilon`	If the 'iterative' type is used, the loop stops when the proportion of modified instances is less than or equal to this threshold.
`maxIter`	Maximum number of iterations in 'iterative' type.
`alpha`	Parameter used in the computation of the similarity between two instances.
`beta`	It regulates the influence of the similarity metric in the estimation of a new label for an instance.
`classColumn`	Positive integer indicating the column which contains the (factor of) classes. By default, the last column is considered.
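
As an illustration of how these arguments combine, the following sketch calls the default method on a data frame whose class is not in the last column. It assumes the NoiseFiltersR package is installed (the call is skipped otherwise); the column reordering and the parameter values are arbitrary choices made for this example, not recommendations.

```r
# Sketch: default method with an explicit classColumn (values chosen arbitrarily).
data(iris)

# Move the class to the first column so that classColumn must be given explicitly.
x <- iris[, c(5, 1:4)]

# Only call the filter when the package is available.
if (requireNamespace("NoiseFiltersR", quietly = TRUE)) {
  out <- NoiseFiltersR::ModeFilter(x, type = "iterative", noiseAction = "repair",
                                   epsilon = 0.05, maxIter = 20,
                                   alpha = 1, beta = 1, classColumn = 1)
  print(out)
}
```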

ModeFilter estimates the most appropriate class for each instance based on the similarity metric and the provided label. This can be addressed in three different ways (argument 'type'):

In the classical approach, every label is tried for every instance, and the one maximizing a similarity-based metric is chosen. In the iterative approach, the same scheme is repeated until the proportion of modified instances is less than or equal to epsilon or the maximum number of iterations maxIter is reached. The weighted approach extends the classical one by assigning each instance a weight that quantifies the reliability of its label. These weights are used in the computation of the metric to be maximized.
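
The classical and iterative schemes described above can be sketched in base R as follows. This is an illustration only, not the package's implementation: the Gaussian similarity, the exact form of the score, and the function names `mode_filter_sketch` and `iterative_sketch` are all assumptions made for this example. The weighted scheme is not covered.

```r
# Illustrative sketch only; NOT the NoiseFiltersR implementation.
# Assumed similarity: s_ij = exp(-alpha * ||x_i - x_j||^2).
# Assumed score for instance i and class c:
#   beta * (sum of similarities to other instances labelled c)
#   + 1 if c is the label currently given to instance i.
mode_filter_sketch <- function(X, y, alpha = 1, beta = 1) {
  X <- as.matrix(X)
  y <- as.factor(y)
  S <- exp(-alpha * as.matrix(dist(X))^2)  # pairwise similarities
  diag(S) <- 0                             # an instance does not vote for itself
  # n x k indicator matrix of the current labels
  onehot <- outer(as.integer(y), seq_along(levels(y)), "==") * 1
  scores <- beta * (S %*% onehot) + onehot # n x k score matrix
  best <- max.col(scores, ties.method = "first")
  factor(levels(y)[best], levels = levels(y))
}

# Repeat the classical step until few enough labels change or maxIter is reached.
iterative_sketch <- function(X, y, epsilon = 0.05, maxIter = 100, ...) {
  y <- as.factor(y)
  for (iter in seq_len(maxIter)) {
    y_new <- mode_filter_sketch(X, y, ...)
    changed <- mean(y_new != y)            # proportion of modified instances
    y <- y_new
    if (changed <= epsilon) break
  }
  y
}

# Toy data: two well-separated clusters, third label deliberately flipped.
X <- rbind(c(0, 0), c(0.1, 0), c(0, 0.1), c(5, 5), c(5.1, 5), c(5, 5.1))
y <- factor(c("a", "a", "b", "b", "b", "b"))
relabelled <- mode_filter_sketch(X, y)
print(relabelled)
```

In this toy case the flipped third label is outvoted by its two close neighbours and is set back to "a". Under the assumed score, a larger `beta` gives the neighbours more influence relative to the provided label.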

An object of class filter, which is a list with seven components:

cleanData is a data frame containing the filtered dataset.
remIdx is a vector of integers indicating the indexes for removed instances (i.e. their row number with respect to the original data frame).
repIdx is a vector of integers indicating the indexes for repaired/relabelled instances (i.e. their row number with respect to the original data frame).
repLab is a factor containing the new labels for repaired instances.
parameters is a list containing the argument values.
call contains the original call to the filter.
extraInf is a character that includes additional interesting information not covered by previous items.
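
The index components can be used to inspect what the filter changed. The sketch below uses hand-written, hypothetical values in place of `out$repIdx`, `out$repLab` and `out$remIdx` (they are not the output of any real run) to show how the indexes refer back to rows of the original data frame.

```r
data(iris)

# Hypothetical values standing in for out$repIdx and out$repLab.
repIdx <- c(71L, 84L)
repLab <- factor(c("virginica", "virginica"), levels = levels(iris$Species))

# Side-by-side view of the original and the new label for each repaired row.
changes <- data.frame(row = repIdx,
                      original = iris$Species[repIdx],
                      repaired = repLab)
print(changes)

# Hypothetical value standing in for out$remIdx: rows dropped from the original.
remIdx <- c(71L, 84L)
kept <- iris[setdiff(seq_len(nrow(iris)), remIdx), ]
print(nrow(kept))
```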

Du W., Urahama K. (2010, November): Error-correcting semi-supervised pattern recognition with mode filter on graphs. In Aware Computing (ISAC), 2010 2nd International Symposium on (pp. 6-11). IEEE.

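# The next example is not run because it can be rather slow in some cases
## Not run: 
   library(NoiseFiltersR)
   data(iris)
   # Remove the instances whose label is flagged as noisy
   out <- ModeFilter(Species~., data = iris, type = "classical", noiseAction = "remove")
   print(out)
   # Check that cleanData is the original data frame minus the removed rows
   identical(out$cleanData, iris[setdiff(seq_len(nrow(iris)), out$remIdx),])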

## End(Not run)

Call:
ModeFilter(formula = Species ~ ., data = iris, type = "classical", 
    noiseAction = "remove")

Parameters:
type: classical
noiseAction: remove
maxIter: 100
alpha: 1
beta: 1

Results:
Number of removed instances: 6 (4 %)
Number of repaired instances: 0 (0 %)
[1] TRUE