DaMiR.EnsembleLearning2cl: Build a Binary Classifier using 'Stacking' Learning strategy.
In BioinfoMonzino/DaMiRseq: Data Mining for RNA-seq data: normalization, feature selection and classification

Description

This function implements a 'Stacking' ensemble learning strategy for a two-class classification task. Besides genomic features, users can provide heterogeneous features (see the `variables` argument), which are taken into account when building the classification model.

Usage

DaMiR.EnsembleLearning2cl(
  data,
  classes,
  variables,
  fSample.tr = 0.7,
  fSample.tr.w = 0.7,
  iter = 100,
  cl_type = c("RF", "kNN", "SVM", "LDA", "LR", "NB", "NN", "PLS")
)

Arguments

`data`	A transposed data frame of normalized expression data. Rows and columns should be observations and features, respectively.
`classes`	A class vector with `nrow(data)` elements, each holding the class label of the corresponding observation. Exactly two different class labels are allowed.
`variables`	An optional data frame of additional covariates (without the 'class' column). Each column represents a different covariate to be considered in the model.
`fSample.tr`	Fraction of samples to be used as the training set (TR1); default is 0.7.
`fSample.tr.w`	Fraction of the training-set samples to be used for weight estimation; default is 0.7.
`iter`	Number of iterations used to assess classification accuracy; default is 100.
`cl_type`	Character vector of the weak classifiers composing the meta-learner. Allowed values are "RF", "kNN", "SVM", "LDA", "LR", "NB", "NN" and "PLS". Default is c("RF", "LR", "kNN", "LDA", "NB", "SVM").
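Expression data are often stored as a genes-by-samples matrix, so they must be transposed before being passed as `data`. The following is a minimal base-R sketch of preparing and sanity-checking the inputs; `expr`, `sample_class` and `covariates` are hypothetical names filled with made-up toy values, not objects provided by DaMiRseq.

```r
# Hypothetical toy data: 5 genes (rows) x 8 samples (columns)
set.seed(42)
expr <- matrix(rnorm(40), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:8)))

# 'data' needs observations in rows and features in columns,
# so the genes-by-samples matrix is transposed
data <- as.data.frame(t(expr))

# One class label per observation; exactly two levels for a binary task
sample_class <- factor(rep(c("A", "B"), each = 4))

# Optional covariates: one column per variable, no 'class' column
covariates <- data.frame(
  age = c(50, 61, 47, 55, 63, 58, 49, 52),
  sex = factor(c("M", "F", "F", "M", "M", "F", "M", "F"))
)

# Sanity checks mirroring the argument requirements listed above
stopifnot(
  nrow(data) == length(sample_class),
  nlevels(sample_class) == 2,
  nrow(covariates) == nrow(data),
  !("class" %in% colnames(covariates))
)
```

With DaMiRseq installed, these objects could then be passed as `DaMiR.EnsembleLearning2cl(data, classes = sample_class, variables = covariates, iter = 3)`; that call is not run here.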

Details

To assess the robustness of a set of predictors, a specific 'Stacking' strategy has been implemented. First, a training set (TR1) and a test set (TS1) are generated by 'bootstrap' sampling. Then, by sampling again from TR1, a second training set (TR2) and test set (TS2) are obtained. TR2 is used to train Random Forest (RF), Naive Bayes (NB), Support Vector Machines (SVM), k-Nearest Neighbour (kNN), Linear Discriminant Analysis (LDA) and Logistic Regression (LR) classifiers, whereas TS2 is used to assess their accuracy and to compute their weights. The decision rule of the 'Stacking' classifier is a linear combination of the predictions (Pr) of the individual classifiers, each multiplied by its weight (w); for each sample k, the prediction is computed as:

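$$Pr_{k,\mathrm{Ensemble}} = w_{\mathrm{RF}} \cdot Pr_{k,\mathrm{RF}} + w_{\mathrm{NB}} \cdot Pr_{k,\mathrm{NB}} + w_{\mathrm{SVM}} \cdot Pr_{k,\mathrm{SVM}} + w_{\mathrm{kNN}} \cdot Pr_{k,\mathrm{kNN}} + w_{\mathrm{LDA}} \cdot Pr_{k,\mathrm{LDA}} + w_{\mathrm{LR}} \cdot Pr_{k,\mathrm{LR}}$$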
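The weighted combination above can be sketched in base R as a matrix-vector product. This is an illustration under stated assumptions, not the package's internal code: predictions are assumed to be coded 1 (class "B") or 0 (class "A"), the weights are assumed to be proportional to each classifier's accuracy on TS2 and normalised to sum to 1, and the ensemble is assumed to assign class "B" when the weighted sum exceeds 0.5. All accuracies and predictions are made-up toy values.

```r
# Toy accuracies of each weak classifier on TS2 (hypothetical values)
acc_ts2 <- c(RF = 0.90, NB = 0.70, SVM = 0.80, kNN = 0.60, LDA = 0.75, LR = 0.85)

# Assumption: weights proportional to TS2 accuracy, normalised to sum to 1
w <- acc_ts2 / sum(acc_ts2)

# Toy 0/1 predictions (1 = class "B"): 3 samples (rows) x 6 classifiers (columns)
pred <- rbind(
  k1 = c(1, 1, 1, 0, 1, 1),
  k2 = c(0, 0, 1, 0, 0, 0),
  k3 = c(1, 0, 0, 1, 0, 1)
)
colnames(pred) <- names(w)

# Pr_{k,Ensemble} = sum over classifiers j of w_j * Pr_{k,j}, for all samples k at once
pr_ens <- drop(pred %*% w)

# Assumed decision threshold of 0.5 on the weighted vote
ens_class <- ifelse(pr_ens > 0.5, "B", "A")
```

With these toy values, samples k1 and k3 are assigned to class "B" and sample k2 to class "A"; k3 passes the threshold only narrowly (2.35 / 4.6, about 0.51), which shows how the weights, not a simple majority, drive the decision.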

The performance of the 'Stacking' classifier is then evaluated on TS1. The whole process is repeated `iter` times (100 by default).
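The nested resampling scheme (TR1/TS1, then TR2/TS2 drawn from TR1, repeated `iter` times) can be sketched in base R as below. The assumptions are ours, not the package's: each split draws a random fraction of indices without replacement (one possible reading of the 'bootstrap' sampling mentioned above; the package may stratify by class or sample differently), sizes are obtained with `round()`, and `split_once()` and `nested_splits()` are hypothetical helpers that only return indices, leaving out the training of the weak classifiers.

```r
# Hypothetical helper: split the indices 'idx' into a training part with
# round(frac * length(idx)) elements and a test part with the remaining ones
split_once <- function(idx, frac) {
  n_tr <- round(frac * length(idx))
  # draw positions rather than values, so a length-1 'idx' is handled safely;
  # sampling is without replacement (assumption)
  tr <- idx[sample.int(length(idx), n_tr)]
  list(train = tr, test = setdiff(idx, tr))
}

# Hypothetical helper: one TR1/TS1/TR2/TS2 set of indices per iteration
nested_splits <- function(n_obs, fSample.tr = 0.7, fSample.tr.w = 0.7, iter = 100) {
  lapply(seq_len(iter), function(i) {
    outer <- split_once(seq_len(n_obs), fSample.tr)   # TR1 / TS1
    inner <- split_once(outer$train, fSample.tr.w)    # TR2 / TS2, drawn from TR1
    # Not implemented here: fit weak classifiers on TR2, turn their TS2
    # accuracies into weights, evaluate the weighted ensemble on TS1
    list(TR1 = outer$train, TS1 = outer$test, TR2 = inner$train, TS2 = inner$test)
  })
}

set.seed(1)
splits <- nested_splits(n_obs = 40, iter = 3)
```

Under the rounding assumed here, 40 observations with the default fractions give 28 TR1 and 12 TS1 samples in every iteration, and the 28 TR1 samples are further split into 20 for TR2 and 8 for TS2.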

Value

A list containing:

A matrix of accuracies of each classifier in each iteration.
A matrix of weights used for each classifier in each iteration.
A list of all models generated in each iteration.
A violin plot of model accuracy obtained for each iteration.

Author(s)

Mattia Chiesa, Luca Piacentini

Examples

# use example data:
library(DaMiRseq)
data(selected_features)
data(df)
set.seed(1)
# only for the example:
# speed up the process by setting a low 'iter' value;
# for real data sets use the default 'iter' value (i.e. 100) or higher:
# Classification_res <- DaMiR.EnsembleLearning(selected_features,
#   classes = df$class, fSample.tr = 0.6, fSample.tr.w = 0.6, iter = 3,
#   cl_type = c("RF", "kNN"))

BioinfoMonzino/DaMiRseq documentation built on Aug. 22, 2021, 3:11 p.m.
