gbmCMA: Tree-based Gradient Boosting
In CMA: Synthesis of microarray-based classification


Roughly speaking, boosting combines 'weak learners' in a weighted manner into a stronger ensemble. This method calls the function gbm.fit from the package gbm. The 'weak learners' are simple trees that need only very few splits (default: 1).

For S4 method information, see gbmCMA-methods.
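The idea described above can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions, not the algorithm as implemented in gbm: the helper names `fit_stump`, `predict_stump`, `boost_stumps` and `predict_boost` are hypothetical, the data are simulated, and each leaf simply predicts the mean residual.

```r
# Minimal gradient boosting with one-split trees (stumps) for 0/1 labels.
# Illustration only; helper names and data are hypothetical.

# Fit a stump to the residuals r by least squares over all variables and cuts.
fit_stump <- function(X, r) {
  best <- list(sse = Inf)
  for (j in seq_len(ncol(X))) {
    vals <- sort(unique(X[, j]))
    if (length(vals) < 2) next
    # candidate thresholds: midpoints between consecutive distinct values
    cuts <- (head(vals, -1) + tail(vals, -1)) / 2
    for (cut in cuts) {
      left <- X[, j] <= cut
      sse <- sum((r[left] - mean(r[left]))^2) +
        sum((r[!left] - mean(r[!left]))^2)
      if (sse < best$sse) {
        best <- list(sse = sse, var = j, cut = cut,
                     left = mean(r[left]), right = mean(r[!left]))
      }
    }
  }
  best
}

predict_stump <- function(stump, X) {
  ifelse(X[, stump$var] <= stump$cut, stump$left, stump$right)
}

boost_stumps <- function(X, y, n.trees = 100, shrinkage = 0.1) {
  p0 <- mean(y)
  f0 <- log(p0 / (1 - p0))              # initial score: log-odds of class 1
  score <- rep(f0, nrow(X))
  stumps <- vector("list", n.trees)
  for (m in seq_len(n.trees)) {
    prob <- 1 / (1 + exp(-score))
    resid <- y - prob                   # negative gradient of the Bernoulli loss
    stumps[[m]] <- fit_stump(X, resid)
    # each weak learner contributes only a small, shrunken step
    score <- score + shrinkage * predict_stump(stumps[[m]], X)
  }
  list(f0 = f0, stumps = stumps, shrinkage = shrinkage)
}

predict_boost <- function(model, X) {
  score <- rep(model$f0, nrow(X))
  for (s in model$stumps) {
    score <- score + model$shrinkage * predict_stump(s, X)
  }
  1 / (1 + exp(-score))                 # predicted probability of class 1
}

# Simulated two-class data (hypothetical)
set.seed(1)
X <- matrix(rnorm(200), ncol = 4)
y <- as.numeric(X[, 1] + X[, 2] > 0)
model <- boost_stumps(X, y, n.trees = 50, shrinkage = 0.5)
train_acc <- mean((predict_boost(model, X) > 0.5) == y)
```

The two tuning knobs in this sketch, `n.trees` and `shrinkage`, play the same role as the gbm.fit arguments of the same name: a smaller step size generally calls for a larger ensemble.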

gbmCMA(X, y, f, learnind, models=FALSE, ...)

`X`	Gene expression data. Can be one of the following: A `matrix`. Rows correspond to observations, columns to variables. A `data.frame`, when `f` is not missing (see below). An object of class `ExpressionSet`.
`y`	Class labels. Can be one of the following: A `numeric` vector. A `factor`. A `character` if `X` is an `ExpressionSet` that specifies the phenotype variable. `missing`, if `X` is a `data.frame` and a proper formula `f` is provided. WARNING: The class labels will be re-coded to range from `0` to `K-1`, where `K` is the total number of different classes in the learning set.
`f`	A two-sided formula, if `X` is a `data.frame`. The left part corresponds to class labels, the right to variables.
`learnind`	An index vector specifying the observations that belong to the learning set. May be `missing`; in that case, the learning set consists of all observations and predictions are made on the learning set.
`models`	A logical value indicating whether the model object shall be returned.
`...`	Further arguments passed to the function `gbm.fit` from the package `gbm`. Worth mentioning are: `n.trees`: number of trees to fit (size of the ensemble), defaults to 100. This parameter should be optimized using `tune`. `shrinkage`: the learning rate (default is 0.001). Usually fixed to a very low value. `distribution`: loss function to be used. Default is `"bernoulli"`, i.e. LogitBoost; a (less robust) alternative is `"adaboost"`. `interaction.depth`: number of splits used by the 'weak learner' (single decision tree). Default is `1`.
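Two of the arguments above, the re-coding of `y` and the role of `learnind`, can be illustrated in base R. The snippet below is a sketch on hypothetical toy data and is not the internal code of gbmCMA; in particular, treating the observations outside `learnind` as the test set is an assumption drawn from the description of `learnind`.

```r
# Hypothetical toy data: 6 observations, 3 variables
set.seed(42)
X <- matrix(rnorm(18), nrow = 6, dimnames = list(NULL, c("g1", "g2", "g3")))
y <- c("AML", "ALL", "ALL", "AML", "ALL", "AML")

# Re-code labels to 0, ..., K-1 (factor levels are sorted alphabetically,
# so "ALL" becomes 0 and "AML" becomes 1)
y_coded <- as.numeric(factor(y)) - 1

# learnind picks the learning set; the remaining observations are
# assumed here to form the test set
learnind <- c(1, 2, 4, 5)
X_learn <- X[learnind, , drop = FALSE]
y_learn <- y_coded[learnind]
X_test <- X[-learnind, , drop = FALSE]
y_test <- y_coded[-learnind]
```

Because of the re-coding, predicted classes refer to the coded values rather than to the original label strings, so it helps to keep the mapping `levels(factor(y))` at hand.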

An object of class `cloutput`.

Up to now, this method can only be applied to binary classification.

Martin Slawski ms@cs.uni-sb.de

Anne-Laure Boulesteix boulesteix@ibe.med.uni-muenchen.de

Ridgeway, G. (1999). The state of boosting. Computing Science and Statistics, 31:172-181.

Friedman, J. (2001). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 29(5):1189-1232.

compBoostCMA, dldaCMA, ElasticNetCMA, fdaCMA, flexdaCMA, knnCMA, ldaCMA, LassoCMA, nnetCMA, pknnCMA, plrCMA, pls_ldaCMA, pls_lrCMA, pls_rfCMA, pnnCMA, qdaCMA, rfCMA, scdaCMA, shrinkldaCMA, svmCMA

### load the CMA package and the Golub AML/ALL data
library(CMA)
data(golub)
### extract class labels
golubY <- golub[,1]
### extract gene expression
golubX <- as.matrix(golub[,-1])
### select learningset
ratio <- 2/3
set.seed(111)
learnind <- sample(length(golubY), size=floor(ratio*length(golubY)))
### run tree-based gradient boosting (no tuning)
gbmresult <- gbmCMA(X=golubX, y=golubY, learnind=learnind, n.trees = 500)
show(gbmresult)
ftable(gbmresult)
plot(gbmresult)
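As a variation on the example above, further gbm.fit arguments can be supplied through `...`. The following sketch repeats the data preparation so that it runs on its own, and passes a larger `shrinkage` together with `n.trees`; the value 0.01 is an arbitrary choice for illustration, not a recommendation. The `requireNamespace` guard is an addition that skips the code when CMA or gbm is not installed.

```r
### run only if the packages CMA and gbm are available
if (requireNamespace("CMA", quietly = TRUE) &&
    requireNamespace("gbm", quietly = TRUE)) {
  library(CMA)
  data(golub)
  golubY <- golub[, 1]
  golubX <- as.matrix(golub[, -1])
  set.seed(111)
  learnind <- sample(length(golubY), size = floor(2/3 * length(golubY)))
  ### shrinkage is handed on to gbm.fit via '...' (illustrative value)
  gbmresult2 <- gbmCMA(X = golubX, y = golubY, learnind = learnind,
                       n.trees = 500, shrinkage = 0.01)
  show(gbmresult2)
}
```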