Generalized Boosted Regression Modeling
Description
Fits generalized boosted regression models.
Usage
gbm(formula = formula(data),
distribution = "bernoulli",
data = list(),
weights,
var.monotone = NULL,
n.trees = 100,
interaction.depth = 1,
n.minobsinnode = 10,
shrinkage = 0.001,
bag.fraction = 0.5,
train.fraction = 1.0,
cv.folds=0,
keep.data = TRUE,
verbose = "CV",
class.stratify.cv=NULL,
n.cores = NULL)
gbm.fit(x, y,
offset = NULL,
misc = NULL,
distribution = "bernoulli",
w = NULL,
var.monotone = NULL,
n.trees = 100,
interaction.depth = 1,
n.minobsinnode = 10,
shrinkage = 0.001,
bag.fraction = 0.5,
nTrain = NULL,
train.fraction = NULL,
keep.data = TRUE,
verbose = TRUE,
var.names = NULL,
response.name = "y",
group = NULL)
gbm.more(object,
n.new.trees = 100,
data = NULL,
weights = NULL,
offset = NULL,
verbose = NULL)

Arguments
formula 
a symbolic description of the model to be fit. The formula may include an offset term (e.g. y ~ offset(n) + x). If keep.data = FALSE in the initial call to gbm then it is the user's responsibility to resupply the offset to gbm.more.
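As a sketch of the offset syntax (the variable names here are hypothetical), an exposure offset for a Poisson model can be written directly in the formula:

```r
# hypothetical variable names; offset(log(exposure)) enters the model
# with a fixed coefficient of 1 rather than being fit
f <- deaths ~ offset(log(exposure)) + age + region
# gbm(f, distribution = "poisson", data = d)  # assuming gbm is installed
```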
distribution 
either a character string specifying the name of the distribution to use or a list with a component name specifying the distribution and any additional parameters needed. Currently available options are "gaussian" (squared error), "laplace" (absolute loss), "tdist" (t-distribution loss), "bernoulli" (logistic regression for 0-1 outcomes), "huberized" (huberized hinge loss for 0-1 outcomes), "multinomial" (classification when there are more than 2 classes), "adaboost" (the AdaBoost exponential loss for 0-1 outcomes), "poisson" (count outcomes), "coxph" (right censored observations), "quantile", or "pairwise" (ranking measure using the LambdaMart algorithm). If quantile regression is specified, distribution must be a list of the form list(name = "quantile", alpha = 0.25), where alpha is the quantile to estimate. If "tdist" is specified, the default degrees of freedom is 4 and this can be controlled by specifying distribution = list(name = "tdist", df = DF). If "pairwise" regression is specified, distribution must be a list of the form list(name = "pairwise", group = ..., metric = ..., max.rank = ...) (metric and max.rank are optional), where group is a character vector of the column names of data that jointly indicate the group an instance belongs to.
Note that splitting of instances into training and validation sets
follows group boundaries and therefore only approximates the specified
train.fraction ratio.
Weights can be used in conjunction with pairwise metrics; however, it is assumed that they are constant for instances from the same group. For details and background on the algorithm, see e.g. Burges (2010).
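For the list-valued forms described above, a minimal sketch (component names as documented; the particular values and the "query" grouping column are illustrative):

```r
# quantile regression for the 25th percentile
dist_quantile <- list(name = "quantile", alpha = 0.25)
# t-distribution loss with 6 degrees of freedom instead of the default 4
dist_tdist <- list(name = "tdist", df = 6)
# pairwise ranking; "query" is a hypothetical grouping column in the data
dist_pairwise <- list(name = "pairwise", group = "query", metric = "ndcg")
```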
data 
an optional data frame containing the variables in the model. By default the variables are taken from environment(formula), typically the environment from which gbm is called. If keep.data = TRUE in the initial call to gbm then gbm stores a copy with the object.
weights 
an optional vector of weights to be used in the fitting process. Must be positive but do not need to be normalized. If keep.data = FALSE in the initial call to gbm then it is the user's responsibility to resupply the weights to gbm.more.
var.monotone 
an optional vector, the same length as the number of predictors, indicating which variables have a monotone increasing (+1), decreasing (-1), or arbitrary (0) relationship with the outcome.
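For example, with three predictors, a monotonicity constraint might be sketched as follows (the predictor names are hypothetical):

```r
# force the fit to be decreasing in the 1st predictor (e.g. price),
# increasing in the 2nd (e.g. quality), unconstrained in the 3rd
mono <- c(-1, +1, 0)
# gbm(y ~ price + quality + region, var.monotone = mono, ...)
```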
n.trees 
the total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion. 
cv.folds 
Number of cross-validation folds to perform. If cv.folds > 1 then gbm, in addition to the usual fit, will perform a cross-validation and calculate an estimate of generalization error, returned in cv.error.
interaction.depth 
The maximum depth of variable interactions. 1 implies an additive model, 2 implies a model with up to 2-way interactions, etc.
n.minobsinnode 
minimum number of observations in the trees' terminal nodes. Note that this is the actual number of observations, not the total weight.
shrinkage 
a shrinkage parameter applied to each tree in the expansion. Also known as the learning rate or step-size reduction.
bag.fraction 
the fraction of the training set observations randomly selected to propose the next tree in the expansion. This introduces randomness into the model fit. If bag.fraction < 1 then running the same model twice will result in similar but different fits. gbm uses the R random number generator, so set.seed can ensure that the model can be reconstructed.
train.fraction 
The first train.fraction * nrows(data) observations are used to fit the gbm and the remainder are used for computing out-of-sample estimates of the loss function.
nTrain 
An integer representing the number of cases on which to
train. This is the preferred way of specification for gbm.fit; the option train.fraction in gbm.fit is deprecated and only maintained for backward compatibility. These two parameters are mutually exclusive. If both are unspecified, all data is used for training.
keep.data 
a logical variable indicating whether to keep the data and an index of the data stored with the object. Keeping the data and index makes subsequent calls to gbm.more faster at the cost of storing an extra copy of the dataset.
object 
a gbm object created from an initial call to gbm.
n.new.trees 
the number of additional trees to add to object.
verbose 
If TRUE, gbm will print out progress and performance indicators. If this option is left unspecified for gbm.more, then it uses verbose from object.
class.stratify.cv 
whether or not the cross-validation should be stratified by class. Defaults to TRUE for distribution = "multinomial" and is only implemented for "multinomial" and "bernoulli". The purpose of stratifying the cross-validation is to help avoid situations in which training sets do not contain all classes.
x, y 
For gbm.fit: x is a data frame or data matrix containing the predictor variables and y is the vector of outcomes. The number of rows in x must be the same as the length of y.
offset 
a vector of values for the offset 
misc 
For gbm.fit: an R object that is simply passed on to the gbm engine; it can be used to supply additional data required by the specific distribution.
w 
For gbm.fit: w is a vector of weights of the same length as y.
var.names 
For gbm.fit: a vector of strings of length equal to the number of columns of x containing the names of the predictor variables.
response.name 
For gbm.fit: a character string label for the response variable.
group 
used when distribution = "pairwise"; identifies the group each instance belongs to (typically a query in Information Retrieval applications).
n.cores 
The number of CPU cores to use. The cross-validation loop
will attempt to send different CV folds off to different cores. If n.cores is not specified by the user, it is guessed using the detectCores function in the parallel package. Note that the documentation for detectCores makes clear that it is not failsafe and could return a spurious number of available cores.
Details
See the gbm vignette for technical details.
This package implements the generalized boosted modeling framework. Boosting is the process of iteratively adding basis functions in a greedy fashion so that each additional basis function further reduces the selected loss function. This implementation closely follows Friedman's Gradient Boosting Machine (Friedman, 2001).
In addition to many of the features documented in the Gradient Boosting Machine, gbm
offers additional features including the out-of-bag estimator for the optimal number of iterations, the ability to store and manipulate the resulting gbm
object, and a variety of other loss functions that had not previously had associated boosting algorithms, including the Cox partial likelihood for censored data, the Poisson likelihood for count outcomes, and a gradient boosting implementation to minimize the AdaBoost exponential loss function.
gbm.fit provides the link between R and the C++ gbm engine. gbm is a front end to gbm.fit that uses the familiar R modeling formulas. However, model.frame is very slow if there are many predictor variables. For power users with many variables, use gbm.fit. For general practice, gbm is preferable.
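A minimal sketch of the gbm.fit interface (the fit itself is guarded so the snippet runs whether or not the gbm package is installed; data and settings are illustrative):

```r
# x is a data frame of predictors; y is the outcome vector, with
# length(y) equal to nrow(x), as gbm.fit requires
N <- 200
x <- data.frame(x1 = runif(N), x2 = runif(N))
y <- x$x1 + rnorm(N, 0, 0.1)

if (requireNamespace("gbm", quietly = TRUE)) {
  # bypasses the formula/model.frame machinery entirely
  fit <- gbm::gbm.fit(x, y, distribution = "gaussian",
                      n.trees = 50, shrinkage = 0.05, verbose = FALSE)
}
```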
Value
gbm, gbm.fit, and gbm.more return a gbm.object.
Author(s)
Greg Ridgeway gregridgeway@gmail.com
Quantile regression code developed by Brian Kriegler bk@stat.ucla.edu
t-distribution and multinomial code developed by Harry Southworth and Daniel Edwards
Pairwise code developed by Stefan Schroedl schroedl@a9.com
References
Y. Freund and R.E. Schapire (1997) “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of Computer and System Sciences, 55(1):119-139.
G. Ridgeway (1999). “The state of boosting,” Computing Science and Statistics 31:172-181.
J.H. Friedman, T. Hastie, R. Tibshirani (2000). “Additive Logistic Regression: a Statistical View of Boosting,” Annals of Statistics 28(2):337-374.
J.H. Friedman (2001). “Greedy Function Approximation: A Gradient Boosting Machine,” Annals of Statistics 29(5):1189-1232.
J.H. Friedman (2002). “Stochastic Gradient Boosting,” Computational Statistics and Data Analysis 38(4):367-378.
B. Kriegler (2007). Cost-Sensitive Stochastic Gradient Boosting Within a Quantitative Regression Framework. PhD dissertation, UCLA Statistics.
C. Burges (2010). “From RankNet to LambdaRank to LambdaMART: An Overview,” Microsoft Research Technical Report MSR-TR-2010-82.
The MART website.
See Also
gbm.object, gbm.perf, plot.gbm, predict.gbm, summary.gbm, pretty.gbm.tree.
Examples
# A least squares regression example
# create some data
N <- 1000
X1 <- runif(N)
X2 <- 2*runif(N)
X3 <- ordered(sample(letters[1:4],N,replace=TRUE),levels=letters[4:1])
X4 <- factor(sample(letters[1:6],N,replace=TRUE))
X5 <- factor(sample(letters[1:3],N,replace=TRUE))
X6 <- 3*runif(N)
mu <- c(-1,0,1,2)[as.numeric(X3)]
SNR <- 10 # signal-to-noise ratio
Y <- X1**1.5 + 2 * (X2**.5) + mu
sigma <- sqrt(var(Y)/SNR)
Y <- Y + rnorm(N,0,sigma)
# introduce some missing values
X1[sample(1:N,size=500)] <- NA
X4[sample(1:N,size=300)] <- NA
data <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6)
# fit initial model
gbm1 <-
gbm(Y~X1+X2+X3+X4+X5+X6, # formula
data=data, # dataset
var.monotone=c(0,0,0,0,0,0), # -1: monotone decrease,
# +1: monotone increase,
# 0: no monotone restrictions
distribution="gaussian", # see the help for other choices
n.trees=1000, # number of trees
shrinkage=0.05, # shrinkage or learning rate,
# 0.001 to 0.1 usually work
interaction.depth=3, # 1: additive model, 2: two-way interactions, etc.
bag.fraction = 0.5, # subsampling fraction, 0.5 is probably best
train.fraction = 0.5, # fraction of data for training,
# first train.fraction*N used for training
n.minobsinnode = 10, # minimum total weight needed in each node
cv.folds = 3, # do 3-fold cross-validation
keep.data=TRUE, # keep a copy of the dataset with the object
verbose=FALSE, # don't print out progress
n.cores=1) # use only a single core (detecting #cores is
# error-prone, so avoided here)
# check performance using an out-of-bag estimator
# OOB underestimates the optimal number of iterations
best.iter <- gbm.perf(gbm1,method="OOB")
print(best.iter)
# check performance using a 50% held-out test set
best.iter <- gbm.perf(gbm1,method="test")
print(best.iter)
# check performance using 3-fold cross-validation
best.iter <- gbm.perf(gbm1,method="cv")
print(best.iter)
# plot the performance
# plot variable influence
summary(gbm1,n.trees=1) # based on the first tree
summary(gbm1,n.trees=best.iter) # based on the estimated best number of trees
# compactly print the first and last trees for curiosity
print(pretty.gbm.tree(gbm1,1))
print(pretty.gbm.tree(gbm1,gbm1$n.trees))
# make some new data
N <- 1000
X1 <- runif(N)
X2 <- 2*runif(N)
X3 <- ordered(sample(letters[1:4],N,replace=TRUE))
X4 <- factor(sample(letters[1:6],N,replace=TRUE))
X5 <- factor(sample(letters[1:3],N,replace=TRUE))
X6 <- 3*runif(N)
mu <- c(-1,0,1,2)[as.numeric(X3)]
Y <- X1**1.5 + 2 * (X2**.5) + mu + rnorm(N,0,sigma)
data2 <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6)
# predict on the new data using "best" number of trees
# f.predict generally will be on the canonical scale (logit,log,etc.)
f.predict <- predict(gbm1,data2,best.iter)
# least squares error
print(sum((data2$Y-f.predict)^2))
# create marginal plots
# plot variable X1,X2,X3 after "best" iterations
par(mfrow=c(1,3))
plot(gbm1,1,best.iter)
plot(gbm1,2,best.iter)
plot(gbm1,3,best.iter)
par(mfrow=c(1,1))
# contour plot of variables 1 and 2 after "best" iterations
plot(gbm1,1:2,best.iter)
# lattice plot of variables 2 and 3
plot(gbm1,2:3,best.iter)
# lattice plot of variables 3 and 4
plot(gbm1,3:4,best.iter)
# 3way plots
plot(gbm1,c(1,2,6),best.iter,cont=20) # contour resolution of 20
plot(gbm1,1:3,best.iter)
plot(gbm1,2:4,best.iter)
plot(gbm1,3:5,best.iter)
# do another 100 iterations
gbm2 <- gbm.more(gbm1,100,
verbose=FALSE) # stop printing detailed progress
