ENN: Edited Nearest Neighbors
In NoiseFiltersR: Label Noise Filters for Data Preprocessing in Classification

Description Usage Arguments Details Value References Examples

Similarity-based filter for removing label noise from a dataset as a preprocessing step for classification. For more information, see the 'Details' and 'References' sections.

## S3 method for class 'formula'
ENN(formula, data, ...)

## Default S3 method:
ENN(x, k = 3, classColumn = ncol(x), ...)

`formula`	A formula describing the classification variable and the attributes to be used.
`data, x`	Data frame containing the training dataset to be filtered.
`...`	Optional parameters to be passed to other methods.
`k`	Number of nearest neighbors to be used.
`classColumn`	Positive integer indicating the column that contains the class labels (a factor). By default, the last column is used.

ENN finds the k nearest neighbors of each instance and removes the instance if the majority class among those neighbors differs from its own class.
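The editing rule described above can be sketched in a few lines of base R. This is only an illustration, not the package's implementation: the helper name `enn_sketch` is hypothetical, and the use of Euclidean distance on numeric attributes and the tie-breaking rule (first factor level wins) are assumptions that the text above does not specify.

```r
# Hypothetical helper illustrating the ENN rule; NOT the NoiseFiltersR code.
# Assumes all attributes are numeric and uses Euclidean distance.
enn_sketch <- function(x, k = 3, classColumn = ncol(x)) {
  labels <- x[[classColumn]]
  # Pairwise distances between instances, computed on the attributes only
  d <- as.matrix(dist(x[, -classColumn, drop = FALSE]))
  diag(d) <- Inf  # an instance must not count as its own neighbor
  remove <- vapply(seq_len(nrow(x)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]             # indexes of the k nearest neighbors
    votes <- table(labels[nn])                  # class counts in the neighborhood
    majority <- names(votes)[which.max(votes)]  # assumption: ties go to the first level
    majority != as.character(labels[i])         # TRUE if the instance should be removed
  }, logical(1))
  list(cleanData = x[!remove, , drop = FALSE], remIdx = which(remove))
}

# Toy data: the fourth point sits inside the "a" cluster but is labelled "b"
toy <- data.frame(
  v = c(0, 0.1, 0.2, 0.3, 5, 5.1, 5.2, 5.3),
  class = factor(c("a", "a", "a", "b", "b", "b", "b", "b"))
)
res <- enn_sketch(toy, k = 3)
res$remIdx  # 4
```

In the toy data only the fourth instance is flagged, because its three nearest neighbors all belong to class "a". The result may differ from what `ENN` returns on real data if the package uses a different distance or tie-breaking rule.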

An object of class filter, which is a list with seven components:

cleanData is a data frame containing the filtered dataset.
remIdx is a vector of integers indicating the indexes for removed instances (i.e. their row number with respect to the original data frame).
repIdx is a vector of integers indicating the indexes for repaired/relabelled instances (i.e. their row number with respect to the original data frame).
repLab is a factor containing the new labels for repaired instances.
parameters is a list containing the argument values.
call contains the original call to the filter.
extraInf is a character string with additional information not covered by the previous items.

Wilson D. L. (1972): Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, (3), 408-421.

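library(NoiseFiltersR)
data(iris)
# Filter iris with a 5-nearest-neighbor vote; Species is the class variable
out <- ENN(Species~., data = iris, k = 5)
summary(out)
# cleanData should equal the original data minus the removed rows
identical(out$cleanData, iris[setdiff(seq_len(nrow(iris)), out$remIdx),])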

Filter ENN applied to dataset iris 

Call:
ENN(formula = Species ~ ., data = iris, k = 5)

Parameters:
k: 5

Results:
Number of removed instances: 8 (5.333333 %)
Number of repaired instances: 0 (0 %)
[1] TRUE