gbmCMA: Tree-based Gradient Boosting


Description

Roughly speaking, boosting combines 'weak learners' into a stronger ensemble by weighting them. This method calls the function gbm.fit from the package gbm. The 'weak learners' are simple trees that need only very few splits (default: 1).
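The idea can be illustrated with a minimal base-R sketch of gradient boosting with one-split trees (stumps) under the Bernoulli loss. This is not the code used by gbm.fit (which, among other things, subsamples the data); the function names fit_stump, boost_stumps and predict_boost, the toy data and the shrinkage value of 0.1 are assumptions made for illustration only.

```r
# Fit a one-split regression tree (stump) to residuals r by least squares.
fit_stump <- function(X, r) {
  best <- list(sse = Inf)
  for (j in seq_len(ncol(X))) {
    xs <- sort(unique(X[, j]))
    if (length(xs) < 2) next
    thresholds <- (head(xs, -1) + tail(xs, -1)) / 2  # midpoints between values
    for (thr in thresholds) {
      left <- X[, j] <= thr
      sse <- sum((r[left] - mean(r[left]))^2) +
        sum((r[!left] - mean(r[!left]))^2)
      if (sse < best$sse) best <- list(sse = sse, var = j, thr = thr)
    }
  }
  best
}

# Boosting loop: each stump is fitted to the current gradient (y - p) and
# added to the score with a small weight (the shrinkage / learning rate).
boost_stumps <- function(X, y, n_trees = 100, shrinkage = 0.1) {
  f0 <- qlogis(mean(y))                 # initial score: log-odds of class 1
  score <- rep(f0, length(y))
  stumps <- vector("list", n_trees)
  for (m in seq_len(n_trees)) {
    p <- plogis(score)
    r <- y - p                          # negative gradient of Bernoulli loss
    w <- p * (1 - p)
    s <- fit_stump(X, r)
    left <- X[, s$var] <= s$thr
    # One Newton step per leaf; the small constant guards against division by 0.
    s$left_val  <- sum(r[left])  / (sum(w[left])  + 1e-8)
    s$right_val <- sum(r[!left]) / (sum(w[!left]) + 1e-8)
    score <- score + shrinkage * ifelse(left, s$left_val, s$right_val)
    stumps[[m]] <- s
  }
  list(f0 = f0, stumps = stumps, shrinkage = shrinkage)
}

# Sum the weighted stump outputs and threshold the score at 0.
predict_boost <- function(model, X) {
  score <- rep(model$f0, nrow(X))
  for (s in model$stumps) {
    score <- score + model$shrinkage *
      ifelse(X[, s$var] <= s$thr, s$left_val, s$right_val)
  }
  as.integer(score > 0)
}

# Toy data (hypothetical): two variables, class 1 if their sum is positive.
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
y <- as.numeric(X[, 1] + X[, 2] > 0)
model <- boost_stumps(X, y, n_trees = 100, shrinkage = 0.1)
accuracy <- mean(predict_boost(model, X) == y)  # accuracy on the learning data
print(accuracy)
```

A single stump can only split on one variable; the weighted sum of many stumps approximates the diagonal class boundary of the toy data.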

For S4 method information, see gbmCMA-methods.

Usage

gbmCMA(X, y, f, learnind, models=FALSE,...)

Arguments

X

Gene expression data. Can be one of the following:

  • A matrix. Rows correspond to observations, columns to variables.

  • A data.frame, when f is not missing (see below).

  • An object of class ExpressionSet.

y

Class labels. Can be one of the following:

  • A numeric vector.

  • A factor.

  • A character if X is an ExpressionSet that specifies the phenotype variable.

  • missing, if X is a data.frame and a proper formula f is provided.

WARNING: The class labels will be re-coded to range from 0 to K-1, where K is the total number of different classes in the learning set.
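A small base-R sketch of what such a re-coding looks like. The labels are hypothetical, and the alphabetical level order produced by factor() is an assumption; the order used internally by CMA may differ.

```r
# Hypothetical class labels in their original coding.
y <- c("AML", "ALL", "ALL", "AML", "ALL")

# factor() sorts the levels alphabetically; subtracting 1 maps 1..K to 0..K-1.
y_factor  <- factor(y)
y_recoded <- as.numeric(y_factor) - 1

# Lookup table from original label to re-coded value.
code_table <- setNames(seq_along(levels(y_factor)) - 1, levels(y_factor))

print(y_recoded)
print(code_table)
```

Keeping such a lookup table makes it easy to translate predicted classes back to the original labels.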

f

A two-sided formula, if X is a data.frame. The left part corresponds to class labels, the right to variables.

learnind

An index vector specifying the observations that belong to the learning set. May be missing; in that case, the learning set consists of all observations and predictions are made on the learning set.
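The following base-R sketch shows how an index vector of this kind partitions the observations. The toy matrix and the names X_learn, X_test and testind are illustrative assumptions, not objects created by gbmCMA.

```r
set.seed(111)
n <- 12                                   # hypothetical number of observations
X <- matrix(rnorm(n * 3), nrow = n)       # toy data: rows are observations

# Draw about two thirds of the row indices for the learning set.
ratio <- 2/3
learnind <- sample(n, size = floor(ratio * n))

X_learn <- X[learnind, , drop = FALSE]    # rows used to fit the classifier
X_test  <- X[-learnind, , drop = FALSE]   # remaining rows, used for prediction
testind <- setdiff(seq_len(n), learnind)  # indices of the test observations

print(sort(learnind))
print(testind)
```

If learnind is omitted, there is no such partition: the same observations serve for fitting and for prediction, so the resulting error estimate is optimistic.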

models

A logical value indicating whether the model object shall be returned.

...

Further arguments passed to the function gbm.fit from the package of the same name. Worth mentioning are:

n.trees

Number of trees to fit (size of the ensemble), defaults to 100. This parameter should be optimized using tune.

shrinkage

The learning rate (default is 0.001). Usually fixed to a very low value.

distribution

Loss function to be used. Default is "bernoulli", i.e. LogitBoost; a (less robust) alternative is "adaboost".

interaction.depth

Number of splits used by the 'weak learner' (single decision tree). Default is 1.
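To see why the "adaboost" loss is described as less robust than "bernoulli", the sketch below compares the two losses in their standard margin form. The margin formulation and the function names are assumptions for illustration; gbm may scale the losses differently internally.

```r
# Margin m = (2 * y - 1) * f for labels y in {0, 1} and score f;
# m < 0 means the observation is misclassified.
logistic_loss    <- function(m) log(1 + exp(-m))  # Bernoulli deviance, up to scaling
exponential_loss <- function(m) exp(-m)           # AdaBoost loss

margins <- c(-5, -2, 0, 2, 5)
loss_table <- data.frame(
  margin      = margins,
  logistic    = round(logistic_loss(margins), 3),
  exponential = round(exponential_loss(margins), 3)
)
print(loss_table)
```

For strongly misclassified observations (large negative margin) the exponential loss grows exponentially while the logistic loss grows roughly linearly, so mislabeled samples or outliers influence an "adaboost" fit much more.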

Value

An object of class cloutput.

Note

Up to now, this method can only be applied to binary classification.

Author(s)

Martin Slawski ms@cs.uni-sb.de

Anne-Laure Boulesteix boulesteix@ibe.med.uni-muenchen.de

References

Ridgeway, G. (1999). The state of boosting. Computing Science and Statistics, 31:172-181.

Friedman, J. (2001). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 29(5):1189-1232.

See Also

compBoostCMA, dldaCMA, ElasticNetCMA, fdaCMA, flexdaCMA, knnCMA, ldaCMA, LassoCMA, nnetCMA, pknnCMA, plrCMA, pls_ldaCMA, pls_lrCMA, pls_rfCMA, pnnCMA, qdaCMA, rfCMA, scdaCMA, shrinkldaCMA, svmCMA

Examples

### load the CMA package and the Golub AML/ALL data
library(CMA)
data(golub)
### extract class labels
golubY <- golub[,1]
### extract gene expression
golubX <- as.matrix(golub[,-1])
### select learning set
ratio <- 2/3
set.seed(111)
learnind <- sample(length(golubY), size=floor(ratio*length(golubY)))
### run tree-based gradient boosting (no tuning)
gbmresult <- gbmCMA(X=golubX, y=golubY, learnind=learnind, n.trees = 500)
show(gbmresult)
ftable(gbmresult)
plot(gbmresult)

CMA documentation built on Nov. 8, 2020, 5:02 p.m.