DaMiR.EnsembleLearning2cl: Build a Binary Classifier using 'Stacking' Learning strategy.

View source: R/Classif_2_Classes.R

Description

This function implements a 'Stacking' ensemble learning strategy. Users can provide heterogeneous features (other than genomic features), which will be taken into account when the classification model is built. The function addresses a two-class classification task.

Usage

DaMiR.EnsembleLearning2cl(
  data,
  classes,
  variables,
  fSample.tr = 0.7,
  fSample.tr.w = 0.7,
  iter = 100,
  cl_type = c("RF", "kNN", "SVM", "LDA", "LR", "NB", "NN", "PLS")
)

Arguments

data

A transposed data frame of normalized expression data. Rows and columns should be observations and features, respectively

classes

A class vector with nrow(data) elements. Each element is the class label of the corresponding observation. Two different class labels are allowed

variables

An optional data frame containing other variables (but without 'class' column). Each column represents a different covariate to be considered in the model

fSample.tr

Fraction of samples to be used as training set; default is 0.7

fSample.tr.w

Fraction of samples of training set to be used during weight estimation; default is 0.7

iter

Number of iterations to assess classification accuracy; default is 100

cl_type

List of weak classifiers that will compose the meta-learners. "RF", "kNN", "SVM", "LDA", "LR", "NB", "NN", "PLS" are allowed. Default is c("RF", "LR", "kNN", "LDA", "NB", "SVM")

Details

To assess the robustness of a set of predictors, a specific 'Stacking' strategy has been implemented. First, a training set (TR1) and a test set (TS1) are generated by 'bootstrap' sampling. Then, by sampling again from the TR1 subset, another pair of training (TR2) and test (TS2) sets is obtained. TR2 is used to train Random Forest (RF), Naive Bayes (NB), Support Vector Machines (SVM), k-Nearest Neighbour (kNN), Linear Discriminant Analysis (LDA) and Logistic Regression (LR) classifiers, whereas TS2 is used to test their accuracy and to calculate weights. The decision rule of the 'Stacking' classifier is a linear combination of the predictions (Pr) of each classifier, each multiplied by the corresponding weight (w); for each sample k, the prediction is computed by:

Pr_{k, Ensemble} = w_{RF} * Pr_{k, RF} + w_{NB} * Pr_{k, NB} + w_{SVM} * Pr_{k, SVM} + w_{kNN} * Pr_{k, kNN} + w_{LDA} * Pr_{k, LDA} + w_{LR} * Pr_{k, LR}
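
The weighted combination above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the accuracies and predictions are made-up numbers, and both the weighting rule (accuracies rescaled to sum to 1) and the 0.5 cut-off are assumptions chosen only to make the formula concrete.

```r
# Hypothetical accuracies of each weak classifier on TS2 (made-up numbers).
acc <- c(RF = 0.90, NB = 0.80, SVM = 0.85, kNN = 0.75, LDA = 0.80, LR = 0.90)

# Assumption: weights are the accuracies rescaled so that they sum to 1.
w <- acc / sum(acc)

# Hypothetical predictions for 3 samples, coded 0/1 for the two classes;
# rows are samples k, columns are classifiers (same order as 'w').
pr <- matrix(
  c(1, 1, 1, 0, 1, 1,
    0, 0, 1, 0, 0, 0,
    1, 0, 1, 1, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("sample", 1:3), names(acc))
)

# Pr_{k, Ensemble}: sum over classifiers of w_c * Pr_{k, c}.
pr_ensemble <- as.vector(pr %*% w)
names(pr_ensemble) <- rownames(pr)

# Assumption: a 0.5 cut-off turns the weighted score into a class label.
label_ensemble <- ifelse(pr_ensemble > 0.5, 1, 0)
```

With weights that sum to 1 and 0/1 predictions, each ensemble score lies between 0 and 1 and can be read as the weighted share of classifiers voting for the class coded 1.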

Performance of 'Stacking' classifier is evaluated by using TS1. This process is repeated several times (default 100 times).
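
The nested sampling scheme described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the number of observations `n` is hypothetical, sampling is done without replacement for simplicity (the text above mentions 'bootstrap' sampling), and `fSample.tr.w` is assumed to be the fraction of TR1 assigned to TR2.

```r
set.seed(1)

n <- 40               # hypothetical number of observations
fSample.tr <- 0.7     # fraction of all samples assigned to TR1
fSample.tr.w <- 0.7   # assumption: fraction of TR1 assigned to TR2
iter <- 3             # few iterations, only for illustration

splits <- lapply(seq_len(iter), function(i) {
  idx <- seq_len(n)
  # Outer split: TR1 builds the ensemble, TS1 evaluates it.
  tr1 <- sample(idx, size = round(fSample.tr * n))
  ts1 <- setdiff(idx, tr1)
  # Inner split: TR2 trains the weak classifiers, TS2 estimates the weights.
  tr2 <- sample(tr1, size = round(fSample.tr.w * length(tr1)))
  ts2 <- setdiff(tr1, tr2)
  list(TR1 = tr1, TS1 = ts1, TR2 = tr2, TS2 = ts2)
})

# Size of each subset in every iteration.
split_sizes <- sapply(splits, lengths)
```

Each column of `split_sizes` corresponds to one iteration. In every iteration TS1 is kept apart from TR1, so the performance measured on TS1 is not influenced by the samples used to train the weak classifiers or to estimate their weights.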

Value

A list containing:

Author(s)

Mattia Chiesa, Luca Piacentini

Examples

# use example data:
data(selected_features)
data(df)
set.seed(1)
# only for the example:
# speed up the process by setting a low 'iter' argument value;
# for real data sets, use the default 'iter' value (i.e. 100) or higher:
# Classification_res <- DaMiR.EnsembleLearning(selected_features,
#   classes=df$class, fSample.tr=0.6, fSample.tr.w=0.6, iter=3,
#   cl_type=c("RF","kNN"))

BioinfoMonzino/DaMiRseq documentation built on Aug. 22, 2021, 3:11 p.m.