View source: R/Classif_2_Classes.R
This function implements a 'Stacking' ensemble learning strategy. Users can provide heterogeneous features (other than genomic features) which will be taken into account during classification model building. A 'two-classes' classification task is addressed.
DaMiR.EnsembleLearning2cl(
  data,
  classes,
  variables,
  fSample.tr = 0.7,
  fSample.tr.w = 0.7,
  iter = 100,
  cl_type = c("RF", "kNN", "SVM", "LDA", "LR", "NB", "NN", "PLS")
)
data
A transposed data frame of normalized expression data. Rows and columns should be, respectively, observations and features
classes
A class vector with
variables
An optional data frame containing other variables (but without a 'class' column). Each column represents a different covariate to be considered in the model
fSample.tr
Fraction of samples to be used as training set; default is 0.7
fSample.tr.w
Fraction of samples of the training set to be used during weight estimation; default is 0.7
iter
Number of iterations to assess classification accuracy; default is 100
cl_type
Character vector of weak classifiers that will compose the meta-learners. "RF", "kNN", "SVM", "LDA", "LR", "NB", "NN", "PLS" are allowed. Default, as given in the usage signature, is c("RF", "kNN", "SVM", "LDA", "LR", "NB", "NN", "PLS")
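As a rough illustration of the input layout described in the arguments, the following base R sketch builds toy objects with the expected shapes. The object names (expr, cls, covars), the toy values, and the assumption that the class vector holds one label per row of the data are all hypothetical and are not taken from the package.

```r
set.seed(1)

# Toy expression data in the described layout:
# rows are observations (samples), columns are features.
expr <- as.data.frame(matrix(
  rnorm(20 * 5), nrow = 20,
  dimnames = list(paste0("sample", 1:20), paste0("gene", 1:5))
))

# Two-class label vector (assumption: one label per row of 'expr').
cls <- factor(rep(c("case", "control"), each = 10))

# Optional covariates: one column per covariate, no 'class' column.
covars <- data.frame(
  age = round(runif(20, min = 30, max = 70)),
  sex = factor(sample(c("F", "M"), 20, replace = TRUE)),
  row.names = rownames(expr)
)

# Basic sanity checks on the shapes before any model building.
stopifnot(
  nrow(expr) == length(cls),
  nlevels(cls) == 2,
  nrow(covars) == nrow(expr),
  !("class" %in% colnames(covars))
)
```

Checks of this kind are cheap and catch the most common mistake with expression data, which is passing a features-by-samples matrix without transposing it first.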
To assess the robustness of a set of predictors, a specific 'Stacking' strategy has been implemented. First, a training set (TR1) and a test set (TS1) are generated by 'bootstrap' sampling. Then, by sampling again from TR1, a second pair of training (TR2) and test (TS2) sets is obtained. TR2 is used to train the Random Forest (RF), Naive Bayes (NB), Support Vector Machines (SVM), k-Nearest Neighbour (kNN), Linear Discriminant Analysis (LDA) and Logistic Regression (LR) classifiers, whereas TS2 is used to test their accuracy and to calculate the weights. The decision rule of the 'Stacking' classifier is a linear combination of the predictions (Pr) of each classifier, each multiplied by its weight (w); for each sample k, the prediction is computed by:
Pr_{k, Ensemble} = w_{RF} * Pr_{k, RF} + w_{NB} * Pr_{k, NB} + w_{SVM} * Pr_{k, SVM} + w_{kNN} * Pr_{k, kNN} + w_{LDA} * Pr_{k, LDA} + w_{LR} * Pr_{k, LR}
The performance of the 'Stacking' classifier is evaluated on TS1. This process is repeated 'iter' times (default 100).
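The nested splitting and weighted combination described above can be sketched in base R for a single iteration. This is only an illustration under stated assumptions, not the package's implementation: the two threshold-based learners (fit_midpoint, predict_midpoint) are hypothetical stand-ins for the real classifiers, sampling is done without replacement for simplicity, the weights are assumed to be accuracies normalised to sum to 1, and the 0.5 decision threshold is also an assumption.

```r
set.seed(1)

# Toy two-class data: 60 observations, 2 features.
n <- 60
toy <- data.frame(
  x1 = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 2)),
  x2 = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 2)),
  class = factor(rep(c("A", "B"), each = n / 2))
)

# Outer split into TR1 / TS1 (fraction 0.7, mirroring fSample.tr).
idx_tr1 <- sample(n, size = round(0.7 * n))
TR1 <- toy[idx_tr1, ]
TS1 <- toy[-idx_tr1, ]

# Inner split of TR1 into TR2 / TS2 (fraction 0.7, mirroring fSample.tr.w).
idx_tr2 <- sample(nrow(TR1), size = round(0.7 * nrow(TR1)))
TR2 <- TR1[idx_tr2, ]
TS2 <- TR1[-idx_tr2, ]

# Hypothetical stand-in learner: threshold one feature at the
# midpoint between the two class means.
fit_midpoint <- function(train, feature) {
  means <- tapply(train[[feature]], train$class, mean)
  list(feature = feature, cut = mean(means), high = names(which.max(means)))
}

# Returns 1 when the predicted class is "B", 0 when it is "A".
predict_midpoint <- function(model, newdata) {
  is_high <- newdata[[model$feature]] > model$cut
  low <- setdiff(c("A", "B"), model$high)
  as.numeric(ifelse(is_high, model$high, low) == "B")
}

# Train the base learners on TR2.
models <- list(m1 = fit_midpoint(TR2, "x1"), m2 = fit_midpoint(TR2, "x2"))

# Accuracy of each base learner on TS2.
truth_ts2 <- as.numeric(TS2$class == "B")
acc <- sapply(models, function(m) mean(predict_midpoint(m, TS2) == truth_ts2))

# Assumption: weights are the TS2 accuracies normalised to sum to 1.
w <- acc / sum(acc)

# Ensemble prediction on TS1: weighted sum of the base predictions.
pred_ts1 <- sapply(models, function(m) predict_midpoint(m, TS1))
score <- as.vector(pred_ts1 %*% w)

# Assumption: scores above 0.5 are assigned to class "B".
ensemble_class <- ifelse(score > 0.5, "B", "A")
ensemble_acc <- mean(ensemble_class == TS1$class)
```

Wrapping the steps from the outer split onwards in a loop and storing acc, w and ensemble_acc at each pass would give per-iteration accuracy and weight tables similar in spirit to those described in the returned value.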
A list containing:
A matrix of accuracies of each classifier in each iteration.
A matrix of weights used for each classifier in each iteration.
A list of all models generated in each iteration.
A violin plot of model accuracy obtained for each iteration.
Mattia Chiesa, Luca Piacentini
# use example data:
data(selected_features)
data(df)
set.seed(1)
# only for the example:
# speed up the process setting a low 'iter' argument value;
# for real data set use default 'iter' value (i.e. 100) or higher:
# Classification_res <- DaMiR.EnsembleLearning(selected_features,
# classes=df$class, fSample.tr=0.6, fSample.tr.w=0.6, iter=3,
# cl_type=c("RF","kNN"))