Ensemble-based filter for removing label noise from a dataset as a preprocessing step of classification. For more information, see 'Details' and 'References' sections.
formula: A formula describing the classification variable and the attributes to be used.
data, x: Data frame containing the training dataset to be filtered.
...: Optional parameters to be passed to other methods.
nfolds: Number of folds for the cross voting scheme.
consensus: If set to
m: Number of classifiers that make up the ensemble. It must be between 1 and 9.
classColumn: Positive integer indicating the column that contains the (factor of) classes. By default, the last column is used.
dynamicCF (Garcia et al., 2012) follows the same approach as EF, but the ensemble of classifiers is not fixed beforehand. Namely, dynamicCF trains nine well-known classifiers on the dataset to be filtered and selects for the ensemble the m classifiers with the best predictions. Then an nfolds-fold cross voting scheme is applied, with a consensus or majority strategy depending on the parameter consensus.
The nine (standard) classifiers handled by dynamicCF are SVM, 3-KNN, 5-KNN, 9-KNN, CART, C4.5, Random Forest, Naive Bayes and Multilayer Perceptron Neural Network.
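The selection and voting steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by dynamicCF: the prediction matrix is made up and stands in for the cross-validated predictions of four hypothetical classifiers (clf1 to clf4), and "best predictions" is assumed here to mean highest agreement with the given labels.

```r
# Illustrative sketch only; NOT the NoiseFiltersR implementation.
# 'truth' holds the (possibly noisy) labels of six instances.
truth <- c("a", "a", "b", "b", "a", "b")

# Made-up predictions: one row per instance, one column per
# hypothetical classifier (clf1..clf4 are placeholder names).
preds <- cbind(
  clf1 = c("a", "a", "b", "b", "b", "b"),
  clf2 = c("a", "b", "b", "a", "b", "b"),
  clf3 = c("a", "a", "b", "a", "b", "b"),
  clf4 = c("b", "b", "a", "a", "b", "a")
)

# Step 1 (assumption: "best" = highest agreement with the labels):
# keep the m classifiers that agree most often with 'truth'.
m <- 3
accuracy <- colMeans(preds == truth)
selected <- names(sort(accuracy, decreasing = TRUE))[seq_len(m)]

# Step 2: for each instance, count how many selected classifiers
# predict a label different from the given one.
wrong <- rowSums(preds[, selected, drop = FALSE] != truth)

# Majority voting flags an instance when more than half of the
# ensemble misclassifies it; consensus voting requires all of them.
remMajority  <- which(wrong > m / 2)
remConsensus <- which(wrong == m)

print(selected)
print(remMajority)
print(remConsensus)
```

With these toy values, majority voting flags a superset of the instances flagged by consensus voting, which is why the consensus strategy is the more conservative of the two.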
An object of class filter, which is a list with seven components:
cleanData is a data frame containing the filtered dataset.
remIdx is a vector of integers indicating the indexes for removed instances (i.e. their row number with respect to the original data frame).
repIdx is a vector of integers indicating the indexes for repaired/relabelled instances (i.e. their row number with respect to the original data frame).
repLab is a factor containing the new labels for repaired instances.
parameters is a list containing the argument values.
call contains the original call to the filter.
extraInf is a character that includes additional interesting information not covered by previous items.
Garcia L. P. F., Lorena A. C., Carvalho A. C. (2012, October): A study on class noise detection and elimination. In Brazilian Symposium on Neural Networks (SBRN), pp. 13-18, IEEE.
# Next example is not run in order to save time
## Not run:
data(iris)
trainData <- iris[c(1:20,51:70,101:120),]
# We fix a seed since there exists a random partition for the ensemble
set.seed(1)
out <- dynamicCF(Species~Petal.Length + Sepal.Length, data = trainData, nfolds = 5, m = 3)
summary(out, explicit = TRUE)
identical(out$cleanData, trainData[setdiff(1:nrow(trainData),out$remIdx),])
## End(Not run)
Filter dynamicCF applied to dataset trainData
Call:
dynamicCF(formula = Species ~ Petal.Length + Sepal.Length, data = trainData,
nfolds = 5, m = 3)
Parameters:
nfolds: 5
consensus: FALSE
m: 3
Results:
Number of removed instances: 1 (1.666667 %)
Number of repaired instances: 0 (0 %)
Additional information:
3 selected classifiers: RandomForest MultilayerPerceptron SVM
Explicit indexes for removed instances:
47
[1] TRUE