Description Usage Arguments Details Value Note References Examples
Ensemble-based filter for removing label noise from a dataset as a preprocessing step of classification. For more information, see 'Details' and 'References' sections.
## S3 method for class 'formula'
ORBoostFilter(formula, data, ...)
## Default S3 method:
ORBoostFilter(x, N = 20, d = 11, Naux = max(20, N),
  useDecisionStump = FALSE, classColumn = ncol(x), ...)
formula
A formula describing the classification variable and the attributes to be used.
data, x
Data frame containing the training dataset to be filtered.
...
Optional parameters to be passed to other methods.
N
Number of boosting iterations.
d
Threshold for removing noisy instances. The authors recommend setting it between 3 and 20. If it is set to
Naux
Number of boosting iterations for AdaBoost when computing the optimal threshold 'd'.
useDecisionStump
If
classColumn
Positive integer indicating the column which contains the (factor of) classes. By default, the last column is considered.
The full description of the ORBoostFilter method can be found in Karmaker & Kwek (2005).
In general terms, a weak classifier is built in each iteration, and misclassified instances have their weight increased for the next round. Instances are removed when their weight exceeds the threshold d, i.e. they have been misclassified in consecutive rounds.
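The reweight-and-remove idea can be sketched on toy data in base R. This is a minimal illustration, not the package's implementation: the one-dimensional stump `fit_stump`, the AdaBoost-style weight update, and the weight cap `cap` (standing in for whatever limit the filter derives from d) are all assumptions made for the example.

```r
# Toy data: one numeric attribute, two classes coded as -1 / +1.
x <- 1:20
y <- rep(c(-1, 1), each = 10)
y[3] <- 1                         # inject one mislabelled instance

n <- length(y)
w <- rep(1 / n, n)                # uniform initial weights
active <- rep(TRUE, n)            # instances still in the training set
cap <- 5 / n                      # hypothetical weight cap (assumption)

# Hypothetical weak learner: cut point with the lowest weighted error,
# predicting +1 for values above the cut and -1 otherwise.
fit_stump <- function(x, y, w) {
  cuts <- sort(unique(x))
  errs <- sapply(cuts, function(ct) sum(w[ifelse(x > ct, 1, -1) != y]))
  cuts[which.min(errs)]
}

for (it in 1:10) {
  cut  <- fit_stump(x[active], y[active], w[active])
  pred <- ifelse(x > cut, 1, -1)
  miss <- active & pred != y
  err  <- sum(w[miss]) / sum(w[active])
  if (err == 0 || err >= 0.5) break

  # AdaBoost-style update: misclassified instances gain weight.
  alpha <- 0.5 * log((1 - err) / err)
  w[active] <- w[active] * exp(-alpha * y[active] * pred[active])
  w[active] <- w[active] / sum(w[active])

  # Drop instances whose weight has grown past the cap, then renormalise.
  noisy <- active & w > cap
  active[noisy] <- FALSE
  w[noisy] <- 0
  w[active] <- w[active] / sum(w[active])
}

removed <- which(!active)
removed   # 3
```

On this toy set the mislabelled instance is the only one the stump gets wrong, so its weight jumps past the cap after the first round and it is dropped; the loop then stops because the remaining data are separated without error.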
An object of class filter, which is a list with seven components:
cleanData is a data frame containing the filtered dataset.
remIdx is a vector of integers indicating the indexes for removed instances (i.e. their row number with respect to the original data frame).
repIdx is a vector of integers indicating the indexes for repaired/relabelled instances (i.e. their row number with respect to the original data frame).
repLab is a factor containing the new labels for repaired instances.
parameters is a list containing the argument values.
call contains the original call to the filter.
extraInf is a character that includes additional interesting information not covered by previous items.
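To show how the index components relate to the original data frame, the following sketch builds a small stand-in list by hand. The object `res` and the row numbers in `removed_rows` are hypothetical and are not output of the filter; only the component names come from the description above.

```r
# Hand-built stand-in for part of a filter result (hypothetical values).
data(iris)
removed_rows <- c(71L, 84L)            # pretend these rows were flagged as noisy
res <- list(
  cleanData = iris[-removed_rows, ],   # original data minus the removed rows
  remIdx    = removed_rows,            # row numbers in the original data frame
  repIdx    = integer(0),              # nothing relabelled in this mock
  repLab    = factor(character(0), levels = levels(iris$Species))
)

# Removed instances, looked up in the original data frame.
iris[res$remIdx, ]

# cleanData can be rebuilt from the original data and remIdx.
kept <- setdiff(seq_len(nrow(iris)), res$remIdx)
all.equal(res$cleanData, iris[kept, ])
```

Because remIdx refers to rows of the original data frame, it should be applied to the unfiltered data rather than to cleanData.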
By means of a message, the number of noisy instances removed in each iteration is displayed in the console.
Karmaker A., Kwek S. (2005, November): A boosting approach to remove class label noise. In Hybrid Intelligent Systems, 2005. HIS'05. Fifth International Conference on (pp. 6-pp). IEEE.
Freund Y., Schapire R. E. (1997): A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.
Iteration 1: 0 noisy instances removed.
Iteration 2: 6 noisy instances removed.
Iteration 3: 0 noisy instances removed.
Iteration 4: 0 noisy instances removed.
Iteration 5: 0 noisy instances removed.
Iteration 6: 0 noisy instances removed.
Iteration 7: 0 noisy instances removed.
Iteration 8: 0 noisy instances removed.
Iteration 9: 0 noisy instances removed.
Iteration 10: 0 noisy instances removed.
Filter ORBoostFilter applied to dataset iris
Call:
ORBoostFilter(formula = Species ~ ., data = iris, N = 10)
Parameters:
N: 10
d: 11
Naux: 20
useDecisionStump: FALSE
Results:
Number of removed instances: 6 (4 %)
Number of repaired instances: 0 (0 %)
[1] TRUE
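The console output above is consistent with a session along the following lines. The filter call is taken from the "Call:" line of the printed results; the package name NoiseFiltersR, the use of `print()`, and the final consistency check are assumptions, and the number of removed instances may differ between runs or package versions.

```r
# Sketch only: runs the filter if the NoiseFiltersR package is installed.
if (requireNamespace("NoiseFiltersR", quietly = TRUE)) {
  data(iris)
  out <- NoiseFiltersR::ORBoostFilter(Species ~ ., data = iris, N = 10)
  print(out)

  # Assumed check: cleanData equals the original data without the removed rows.
  kept <- setdiff(seq_len(nrow(iris)), out$remIdx)
  print(identical(out$cleanData, iris[kept, ]))
}
```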