Ensemble-based filter for removing label noise from a dataset as a preprocessing step of classification. For more information, see 'Details' and 'References' sections.
## S3 method for class 'formula'
edgeBoostFilter(formula, data, ...)
## Default S3 method:
edgeBoostFilter(x, m = 15, percent = 0.05,
  threshold = 0, classColumn = ncol(x), ...)
formula | A formula describing the classification variable and the attributes to be used.
data, x | Data frame containing the training dataset to be filtered.
... | Optional parameters to be passed to other methods.
m | Number of boosting iterations.
percent | Real number between 0 and 1. It sets the fraction of instances to be removed (as long as their edge value exceeds the parameter threshold).
threshold | Real number between 0 and 1. It sets the minimum edge value required by an instance in order to be removed.
classColumn | Positive integer indicating the column which contains the (factor of) classes. By default, the last column is considered.
The full description of the method can be found in the provided references.
An AdaBoost scheme (Freund & Schapire) is applied with a default C4.5 tree as weak classifier. After m iterations, the instances with the largest edge values (Wheway, Freund & Schapire), subject to the constraints percent and threshold, are considered noisy and thus removed.
Note that setting an extreme value (i.e. percent=1 or threshold=0) effectively disables the corresponding removal constraint.
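The interaction between percent and threshold can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helpers edgeValues and selectNoisy are hypothetical, the edge value is assumed to be the weighted share of boosting rounds that misclassify an instance, and rounding the removal count up with ceiling() is likewise an assumption.

```r
# Hypothetical helpers for illustration; they are not part of the package API.

# Assumed edge value: weighted share of boosting rounds that misclassify
# each instance.
#   miss:  logical matrix, one row per instance, one column per round
#   alpha: positive weight of the classifier built in each round
edgeValues <- function(miss, alpha) {
  drop((miss + 0) %*% alpha) / sum(alpha)
}

# Keep at most a fraction `percent` of the instances, taking the largest
# edge values first, and only those whose edge value exceeds `threshold`.
selectNoisy <- function(edge, percent = 0.05, threshold = 0) {
  nRemove <- ceiling(percent * length(edge))   # assumed rounding rule
  top <- head(order(edge, decreasing = TRUE), nRemove)
  sort(top[edge[top] > threshold])
}

# Toy input: 3 instances, 2 boosting rounds.
miss <- matrix(c(TRUE, FALSE, FALSE,
                 FALSE, TRUE, TRUE), nrow = 3)
edge <- edgeValues(miss, alpha = c(1, 3))
noisy <- selectNoisy(edge, percent = 0.5, threshold = 0.5)
```

In this sketch, raising threshold can only shrink the selected set, while percent caps its size, which matches the way the two arguments are described above.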
An object of class filter, which is a list with seven components:
cleanData is a data frame containing the filtered dataset.
remIdx is a vector of integers indicating the indexes for removed instances (i.e. their row number with respect to the original data frame).
repIdx is a vector of integers indicating the indexes for repaired/relabelled instances (i.e. their row number with respect to the original data frame).
repLab is a factor containing the new labels for repaired instances.
parameters is a list containing the argument values.
call contains the original call to the filter.
extraInf is a character that includes additional interesting information not covered by previous items.
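The relationship between cleanData and remIdx can be illustrated without running the filter. The sketch below builds a plain list as a stand-in for the returned object; the row numbers in remIdx are made up for illustration and are not actual filter output.

```r
# Stand-in for a filter result on iris; the remIdx values are invented.
remIdx <- c(71L, 84L, 107L)
out <- list(
  cleanData = iris[setdiff(seq_len(nrow(iris)), remIdx), ],
  remIdx    = remIdx,
  repIdx    = integer(0)   # a pure removal filter repairs nothing
)

# Rows flagged as noisy, recovered from the original data frame.
removed <- iris[out$remIdx, ]

# cleanData is the original data minus the rows listed in remIdx.
rebuilt <- iris[setdiff(seq_len(nrow(iris)), out$remIdx), ]
sameData <- identical(out$cleanData, rebuilt)

# Share of removed instances, as a percentage of the original rows.
pctRemoved <- 100 * length(out$remIdx) / nrow(iris)
```

Because remIdx refers to row numbers of the original data frame, it should be applied to the original data, not to cleanData.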
Freund Y., Schapire R. E. (1997): A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.
Wheway V. (2001): Using boosting to detect noisy data. In Advances in Artificial Intelligence. PRICAI 2000 Workshop Reader (pp. 123-130). Springer Berlin Heidelberg.
Call:
edgeBoostFilter(formula = Species ~ ., data = iris, m = 10, percent = 0.05,
threshold = 0)
Parameters:
m: 10
percent: 0.05
threshold: 0
Results:
Number of removed instances: 8 (5.333333 %)
Number of repaired instances: 0 (0 %)
[1] TRUE