ENN: Edited Nearest Neighbors

Description

Similarity-based filter for removing label noise from a dataset as a preprocessing step of classification. For more information, see 'Details' and 'References' sections.

Usage

## S3 method for class 'formula'
ENN(formula, data, ...)

## Default S3 method:
ENN(x, k = 3, classColumn = ncol(x), ...)

Arguments

formula

A formula describing the classification variable and the attributes to be used.

data, x

Data frame containing the training dataset to be filtered.

...

Optional parameters to be passed to other methods.

k

Number of nearest neighbors to be used.

classColumn

Positive integer indicating the column that contains the classes (as a factor). By default, the last column is used.

Details

For each instance, ENN finds its k nearest neighbors and removes the instance if the majority class among those neighbors differs from the instance's own class.
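
The rule above can be illustrated with a minimal base R sketch. This is not the NoiseFiltersR implementation: the function name enn_sketch is hypothetical, and the sketch assumes numeric attributes, Euclidean distance, and that ties in the vote go to the first factor level. The package's own distance and tie handling may differ.

```r
# Illustrative ENN sketch in base R; NOT the NoiseFiltersR implementation.
# Assumptions: numeric attributes, Euclidean distance, factor class column.
enn_sketch <- function(x, k = 3, classColumn = ncol(x)) {
  labels <- x[[classColumn]]
  attrs <- as.matrix(x[, -classColumn, drop = FALSE])
  d <- as.matrix(dist(attrs))  # pairwise Euclidean distances
  diag(d) <- Inf               # an instance is not its own neighbor
  remove <- vapply(seq_len(nrow(x)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]             # indices of the k nearest neighbors
    votes <- table(labels[nn])                  # class counts in the neighborhood
    majority <- names(votes)[which.max(votes)]  # ties go to the first level
    majority != as.character(labels[i])         # TRUE means the instance is removed
  }, logical(1))
  list(cleanData = x[!remove, , drop = FALSE], remIdx = which(remove))
}

# Toy data: one "B" instance sits inside a cluster of "A" instances.
toy <- data.frame(a = c(0, 0.1, 0.2, 0.15, 5, 5.1, 5.2),
                  cls = factor(c("A", "A", "A", "B", "B", "B", "B")))
res <- enn_sketch(toy, k = 3)
res$remIdx
```

In the toy data, the fourth row is labeled "B" but its three nearest neighbors are all "A", so it is the instance this sketch flags for removal.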

Value

An object of class filter, which is a list with seven components:

References

Wilson D. L. (1972): Asymptotic properties of nearest neighbor rules using edited data. Systems, Man and Cybernetics, IEEE Transactions on, (3), 408-421.

Examples

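library(NoiseFiltersR)
data(iris)
# Filter iris with ENN using the 5 nearest neighbors
out <- ENN(Species ~ ., data = iris, k = 5)
summary(out)
# The clean data equals the original rows minus the removed indices
identical(out$cleanData, iris[setdiff(1:nrow(iris), out$remIdx), ])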

Example output

Filter ENN applied to dataset iris 

Call:
ENN(formula = Species ~ ., data = iris, k = 5)

Parameters:
k: 5

Results:
Number of removed instances: 8 (5.333333 %)
Number of repaired instances: 0 (0 %)
[1] TRUE

NoiseFiltersR documentation built on May 2, 2019, 2:03 a.m.