
View source: R/MiBiClassGBODT.R

This function performs binary classification of specimens based on microarray gene (transcript) expression data, using gradient boosting over decision trees. Several generalized boosted regression models are fitted during cross-validation; for each model, measures of classification quality and feature importance are returned.

```
MiBiClassGBODT(Matrix, specimens, n.crossval = 5, ntrees = 10000,
               shrinkage = 0.1, intdepth = 2, n.terminal = 10, bag.frac = 0.5)
```

| Argument | Description |
|---|---|
| `Matrix` | numeric matrix of expression data where each row corresponds to a probe (gene, transcript) and each column corresponds to a specimen (patient). |
| `specimens` | factor vector with two levels specifying specimens in the columns of the `Matrix`. |
| `n.crossval` | integer specifying the number of cross-validation folds. |
| `ntrees` | integer specifying the total number of decision trees (boosting iterations). |
| `shrinkage` | numeric specifying the learning rate. Scales the step size in the gradient descent procedure. |
| `intdepth` | integer specifying the maximum depth of each tree. |
| `n.terminal` | integer specifying the actual minimum number of observations in the terminal nodes of the trees. |
| `bag.frac` | the fraction of the training set observations randomly selected to propose the next tree in the expansion. |

`Matrix` must contain specimens from two classification groups only. To sample the expression matrix, use `MiDataSample`.

The order of the elements in `specimens` and the columns of `Matrix` must be the same. The levels of `specimens` are the two classification groups. To sample specimens, use `MiSpecimenSample`.

The number of cross-validation folds defines the number of models to be fitted. For example, if `n.crossval` = 5, all specimens are divided into 5 folds, each of which is used once for model testing, so 5 models are fitted. See `createFolds` for details.
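The fold logic described above can be sketched in base R. This is only an illustration, not the package's implementation: it uses a plain random split, whereas `createFolds` samples within the levels of a factor. The names `n_specimens`, `k`, and `folds` are hypothetical.

```r
set.seed(1)         # reproducible shuffling
n_specimens <- 20   # hypothetical number of specimens
k <- 5              # plays the role of n.crossval

# Shuffle specimen indices and deal them into k folds of near-equal size
fold_id <- rep(seq_len(k), length.out = n_specimens)
folds <- split(sample(n_specimens), fold_id)

# Each fold is held out once for testing; the remaining folds form the
# training set, so k models are fitted in total
for (i in seq_len(k)) {
  test_idx  <- folds[[i]]
  train_idx <- setdiff(seq_len(n_specimens), test_idx)
  cat("model", i, ": train =", length(train_idx),
      ", test =", length(test_idx), "\n")
}
```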

While boosting, basis functions are added iteratively in a greedy fashion so that each additional basis function further reduces the selected loss function. The Gaussian distribution (squared error) is used. `ntrees`, `shrinkage`, and `intdepth` are parameters for model tuning. `bag.frac` introduces randomness into the model fit: if `bag.frac` < 1, running the same model twice will result in similar but different fits. The number of specimens in the training sample must be large enough to provide the minimum number of observations in the terminal nodes, i.e. (1-1/`n.crossval`)*`bag.frac` > `n.terminal`. See `gbm` for details.
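As a rough sanity check before fitting, the sample-size requirement can be worked through numerically. The sketch below assumes the condition is meant to be read relative to the total number of specimens (that is, the left-hand side is multiplied by the specimen count); this reading, and the variable `n_specimens`, are assumptions rather than something the package documents.

```r
n_specimens <- 30   # hypothetical total number of specimens
n.crossval  <- 5
bag.frac    <- 0.5
n.terminal  <- 10

# Specimens left for training once one fold is held out for testing
n_train <- n_specimens * (1 - 1 / n.crossval)

# Specimens drawn at random to grow each tree
n_bagged <- n_train * bag.frac

# Assumed reading of the condition: the bagged subsample must exceed n.terminal
enough <- n_bagged > n.terminal
cat("train:", n_train, " bagged:", n_bagged, " enough:", enough, "\n")
```

If `enough` is `FALSE`, lowering `n.crossval` or `n.terminal`, or raising `bag.frac`, moves the settings in the right direction.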

A list of 2:

`QC` - matrix containing quality measures for each fitted model and their summary.
Accur - accuracy (percentage of correct predictions),
AUC - area under the ROC curve (see `roc`),
MCC - Matthews correlation coefficient, formula `((TP*TN)-(FP*FN))/sqrt((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN))`,
F1sc - F1 score, formula `2*Pres*Rec/(Pres+Rec)`, where Pres is precision and Rec is recall.

If all the data points from one class are misclassified into the other, MCC and F1 score may be NaN.
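The formulas above can be checked on a small confusion matrix. The helper `qc_measures` below is a hypothetical illustration, not a function of the package; it returns accuracy as a proportion (multiply by 100 for a percentage) and shows how the NaN case arises from a 0/0 division.

```r
# Hypothetical helper: quality measures from confusion-matrix counts
qc_measures <- function(TP, TN, FP, FN) {
  accur <- (TP + TN) / (TP + TN + FP + FN)
  mcc   <- ((TP * TN) - (FP * FN)) /
    sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  pres  <- TP / (TP + FP)   # precision
  rec   <- TP / (TP + FN)   # recall
  f1    <- 2 * pres * rec / (pres + rec)
  c(Accur = accur, MCC = mcc, F1sc = f1)
}

# Ordinary case: both classes are predicted at least once
ordinary <- qc_measures(TP = 8, TN = 7, FP = 2, FN = 3)

# Degenerate case: every positive specimen is predicted as negative,
# so TP + FP = 0 and both MCC and F1 become 0/0 = NaN
degenerate <- qc_measures(TP = 0, TN = 10, FP = 0, FN = 10)

print(round(ordinary, 3))
print(degenerate)
```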

`Importance` - list of data frames containing, for each fitted model: `var` - probe ID, and `rel.inf` - its feature importance for classification (relative influence).

Feature importance (relative influence) graphs are also produced.
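Because `Importance` holds one data frame per fitted model, it can be convenient to combine them into a single ranking. The sketch below shows one possible way, averaging `rel.inf` per probe across models; the toy list `imp` and the probe IDs are made up to mimic the structure described above, and the averaging step is not part of the package.

```r
# Toy stand-in for the Importance element: one data frame per fitted model
imp <- list(
  data.frame(var = c("p1", "p2", "p3"), rel.inf = c(60, 30, 10)),
  data.frame(var = c("p2", "p1", "p3"), rel.inf = c(50, 40, 10))
)

# Stack the per-model tables, then average relative influence per probe
all_imp  <- do.call(rbind, imp)
mean_imp <- aggregate(rel.inf ~ var, data = all_imp, FUN = mean)

# Rank probes from most to least influential on average
mean_imp <- mean_imp[order(-mean_imp$rel.inf), ]
print(mean_imp)
```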

Elena N. Filatova

`createFolds`, `gbm`, `MiSpecimenSample`, `MiDataSample`, `roc`

```
# get gene expression and specimen data
data("IMexpression"); data("IMspecimen")
# sample expression matrix and specimen data for binary classification,
# only "NORM" and "EBV" specimens are left
SampleMatrix <- MiDataSample(IMexpression, IMspecimen$diagnosis, "norm", "ebv")
SampleSpecimen <- MiSpecimenSample(IMspecimen$diagnosis, "norm", "ebv")
# fitting, low tuning for faster running
BoostRes <- MiBiClassGBODT(SampleMatrix, SampleSpecimen, n.crossval = 3,
                           ntrees = 10, shrinkage = 1, intdepth = 2)
BoostRes[[1]] # QC values for n.crossval = 3 models and their summary
length(BoostRes[[2]]) # n.crossval = 3 data frames of probe feature importance for classification
head(BoostRes[[2]][[1]])
```
