This function serves two purposes. First, it computes the cross-validation error across a grid of gbm metaparameters input by the user, allowing the model to be easily tuned for a given problem. This process can be executed in parallel on Linux-based machines. Second, it alleviates the burden of selecting a maximum number of trees for a given set of metaparameters by allowing the algorithm to run until the best number of trees has been selected according to the cross-validation error, in contrast to the standard approach to gbm, in which a maximum n.trees must also be tuned. This lets users avoid two problems associated with an inappropriate choice of n.trees: (1) failing to specify enough trees and therefore using a sub-optimal model, and (2) specifying far more trees than necessary, making gbm run far longer than needed.
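The idea of growing the model until the cross-validation error stops improving can be sketched in base R. This is only an illustration under stated assumptions: `toy_cv_error` is a made-up error curve, and the stopping rule (stop once the minimum no longer falls in the most recently added block of trees) is an assumption, not necessarily the rule gbm.cverr applies internally. The names `nt.start` and `nt.inc` mirror the arguments documented on this page.

```r
# Hypothetical stand-in for a cross-validation error curve: it decreases,
# reaches a minimum at 730 trees, then rises again. Not part of gbm.cverr.
toy_cv_error <- function(n.trees) (n.trees - 730)^2 / 1e5 + 1

nt.start <- 100  # initial number of trees
nt.inc   <- 100  # trees added at each step

# Error for every model size from 1 to nt.start trees
cv.err <- toy_cv_error(seq_len(nt.start))

# Assumed rule: keep adding trees while the minimum lies in the newest
# increment, since the error may still be decreasing there.
while (which.min(cv.err) > length(cv.err) - nt.inc) {
  cv.err <- toy_cv_error(seq_len(length(cv.err) + nt.inc))
}

best.n.trees <- which.min(cv.err)
best.n.trees
```

With this toy curve the loop stops after 900 trees have been evaluated, and no maximum n.trees had to be chosen in advance.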
x |
An n x p matrix or data frame of predictors. |
y |
An n x 1 matrix or vector corresponding to the observed outcome. |
distribution |
The distribution to use when fitting each |
cv.folds |
Number of cross-validation folds to perform. |
fit.best |
Logical variable indicating whether or not the best set of
metaparameters (estimated according to cross-validation error) will be
utilized to fit and return a |
nt.start |
Initial number of trees used to model y. |
nt.inc |
Number of trees incrementally added until the cross-validation
error is minimized or until |
verbose |
If TRUE, then |
w |
A vector of weights of the same length as y. NOTE: to evaluate the effect of different weight vectors, a list can be passed to w in which each element follows the structure described above. |
var.monotone |
An optional vector, the same length as the number of predictors, indicating which variables have a monotone increasing (+1), decreasing (-1), or arbitrary (0) relationship with the outcome. NOTE: to evaluate the effect of different monotonicity constraints, a list can be passed to var.monotone in which each element follows the structure described above. |
interaction.depth |
The maximum depth of variable interactions: 1 builds an additive model, 2 builds a model with up to two-way interactions, etc. NOTE: Multiple values can be passed in a vector to evaluate the cross-validation error using multiple interaction depths. |
n.minobsinnode |
The minimum number of observations (not total weights) in the terminal nodes of the trees. NOTE: Multiple values can be passed in a vector to evaluate the cross-validation error using multiple minimum node sizes. |
shrinkage |
A shrinkage parameter applied to each tree in the expansion. Also known as the learning rate or step-size reduction. NOTE: Multiple values can be passed in a vector to evaluate the cross-validation error using multiple shrinkage penalties. |
bag.fraction |
The fraction of independent training observations (or patients) randomly selected to propose the next tree in the expansion. Depending on the obs.id vector, multiple training data rows may belong to a single 'patient'. This introduces randomness into the model fit. NOTE: Multiple values can be passed in a vector to evaluate the cross-validation error using multiple bag fractions. |
n.cores |
Number of cores that will be used to estimate cross-validation folds in parallel. Only available on Linux-based machines. |
max.time |
Maximum number of seconds that the model will continue adding trees for a given set of metaparameters. This optional argument allows users to find the best possible solution in scenarios characterized by limited computational resources. |
seed |
Seed that will guarantee |
The main output of gbm.cverr is a data frame with rows corresponding to sets of metaparameters and columns corresponding to (1) the values defining each set of metaparameters, (2) the minimum cross-validation error corresponding to each row, and (3) the number of trees that yielded the minimum cross-validation error in each row. These results are intended to allow users to make informed decisions about the metaparameters passed to gbm when fitting the model that will be interpreted and/or used for prediction in the future.
Note that the metaparameter values passed to w, var.monotone, interaction.depth, n.minobsinnode, shrinkage, and bag.fraction will be fully crossed and evaluated.
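To see what "fully crossed" means for the size of the search, the grid can be reproduced with base R's `expand.grid`. This is a sketch of the combinatorics only, using the values from the example on this page; whether gbm.cverr builds its grid with `expand.grid` internally is an assumption.

```r
# Metaparameter values taken from the example on this page
grid <- expand.grid(
  interaction.depth = c(1, 5),
  n.minobsinnode    = c(5, 50),
  shrinkage         = 0.01
)

# Every combination appears exactly once: 2 x 2 x 1 = 4 rows
grid
nrow(grid)
```

Each additional vector of candidate values multiplies the number of rows, so the number of models to cross-validate grows quickly as more metaparameters are varied.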
An object with 2-5 elements and a summary function. The elements of gbm.cverr.res are,
gbm.fit |
If |
w |
List of the optional weight vectors provided by the user. Will not
be returned if |
var.montone |
List of the optional monotonicity parameters provided
by the user. Will not be returned if |
cv.err |
A list with length corresponding to the number of metaparameter
combinations that were evaluated by |
res |
A data frame with ten columns and as many rows as there were
unique combinations of metaparameters. This data frame is the basis of the
summary function for |
Calling summary(gbm.cverr.res) produces a data frame with rows corresponding to sets of metaparameters and columns that denote, for each row,
min.cv.error |
Minimum cross-validation error resulting from the given set of metaparameters. |
w.index |
The index of the (optional) list of weight vectors
corresponding to the given set of metaparameters. This will be omitted if
a list of weights was not provided to |
var.monotone.index |
The index of the (optional) list of monotonicity
vectors corresponding to the given set of metaparameters. This will be
omitted if a list of monotonicity vectors was not provided to |
interaction.depth |
The interaction depth corresponding to the given set of metaparameters. |
n.minobsinnode |
Minimum number of observations in the terminal nodes of the trees for the given set of metaparameters. |
shrinkage |
The shrinkage parameter corresponding to the given set of metaparameters. |
bag.fraction |
The fraction of independent training observations randomly selected to propose the next tree corresponding to the given set of metaparameters. |
n.trees |
The optimum number of trees to utilize given the set of
metaparameters denoted in the row. Note that entries in this column will be
marked with '>=' if the boosting procedure was terminated due to time running
out for this set of metaparameters, determined by the user-specified
|
In the summary object and output, sets of metaparameters (rows) are ordered from best (top row) to worst (last row) in terms of the resulting cross-validation error.
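A small base R sketch of how such a table might be read follows. The data frame `toy` is a made-up stand-in for the summary output with only some of its columns, all of its values are invented for illustration, and the exact text of the '>=' marker (assumed here to be a prefix on the n.trees entry) is an assumption.

```r
# Toy stand-in for summary(gbm.cverr.res); every value below is made up.
toy <- data.frame(
  min.cv.error      = c(0.92, 0.85, 0.88),
  interaction.depth = c(1, 5, 5),
  n.minobsinnode    = c(5, 5, 50),
  shrinkage         = 0.01,
  n.trees           = c("400", ">=900", "700"),
  stringsAsFactors  = FALSE
)

# Order rows from best (lowest error) to worst, as the summary does
toy <- toy[order(toy$min.cv.error), ]

# The top row then holds the best set of metaparameters
best <- toy[1, ]

# Flag rows whose search stopped because time ran out
# (assumes the marker is a ">=" prefix on the n.trees entry)
timed.out <- grepl("^>=", toy$n.trees)
```

A flagged row means the reported number of trees is a lower bound, so its cross-validation error might still improve with a larger time budget.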
Daniel B. McArtor (dmcartor@nd.edu)
data(wellbeing)
y <- wellbeing[, 25]
x <- wellbeing[, 1:20]
mm <- gbm.cverr(x = x, y = y,
                distribution = 'gaussian',
                cv.folds = 2,
                nt.start = 100,
                nt.inc = 100,
                max.time = 1,
                seed = 12345,
                interaction.depth = c(1, 5),
                shrinkage = 0.01,
                n.minobsinnode = c(5, 50),
                verbose = TRUE)
summary(mm)
# Investigate gbm results based on the best set of metaparameters
mm$gbm.fit
summary(mm$gbm.fit)