Fit a stacking model given a measure of performance for each component model on a set of training data, and a set of covariates to use in forming component model weights
An implementation of model stacking using xgboost.
xgbstack(formula, data, booster = "gbtree", subsample = 1,
  colsample_bytree = 1, colsample_bylevel = 1, max_depth = 6,
  min_child_weight = -10^10, eta = 0.3, gamma = 0, lambda = 0,
  alpha = 0, nrounds = 10, cv_params = NULL, cv_folds = NULL,
  cv_nfolds = 10L, cv_refit = "ttest", update = NULL, nthread = NULL,
  verbose = 0)
formula |
a formula describing the model fit. The left-hand side should give the columns in data that hold the component model scores, separated by +. The right-hand side should specify the explanatory variables on which the weights will depend. |
data |
a data frame with variables in formula |
booster |
which form of boosting to use; see xgboost documentation |
subsample |
fraction of data to use in bagging. not supported yet. |
colsample_bytree |
fraction of explanatory variables to randomly select in growing each regression tree. see xgboost documentation |
colsample_bylevel |
fraction of explanatory variables to randomly select in growing each level of the regression tree. see xgboost documentation |
max_depth |
maximum depth of regression trees. see xgboost documentation |
min_child_weight |
not recommended for use. see xgboost documentation |
eta |
learning rate. see xgboost documentation |
gamma |
penalty on the number of regression tree leaves. see xgboost documentation |
lambda |
L2 regularization of contribution to model weights in each round. see xgboost documentation |
alpha |
L1 regularization of contribution to model weights in each round. see xgboost documentation |
nrounds |
number of boosting rounds. see xgboost documentation |
cv_params |
optional named list of parameter values for which to evaluate the loss via cross-validation. Each component is a vector of parameter values, and its name must be one of "booster", "subsample", "colsample_bytree", "colsample_bylevel", "max_depth", "min_child_weight", "eta", "gamma", "lambda", "alpha", "nrounds" |
cv_folds |
list specifying the observation groups to use in cross-validation. Each list component is a numeric vector of observation indices. |
cv_nfolds |
integer specifying the number of cross-validation folds to use. If cv_folds is provided, cv_nfolds is ignored. If cv_folds is not provided, the data are randomly partitioned into cv_nfolds groups |
cv_refit |
character describing which of the models specified by the values in cv_params to refit using the full data set. Either "best", "ttest", or "none". |
update |
an object of class xgbstack to update |
nthread |
number of threads to use |
verbose |
how much output to generate along the way. 0 for no logging, 1 for some logging |
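The formula, cv_folds, and cv_params arguments described above can be set up along the following lines. This is a minimal sketch, not an example from the package authors: the data frame `train`, its column names, and the grid values are all hypothetical, and the final call runs only if a function named `xgbstack` is already available in the session.

```r
# Hypothetical training data: one score column per component model,
# plus two covariates that the model weights may depend on.
set.seed(1)
n <- 100
train <- data.frame(
  model_a_score = rnorm(n, mean = -1.0),
  model_b_score = rnorm(n, mean = -1.2),
  model_c_score = rnorm(n, mean = -0.8),
  week = sample(1:52, n, replace = TRUE),
  x = runif(n)
)

# Left-hand side: score columns separated by +.
# Right-hand side: explanatory variables for the weights.
stack_formula <- model_a_score + model_b_score + model_c_score ~ week + x

# cv_folds: a list in which each component is a numeric vector of
# observation indices. Here the rows are randomly split into 5 groups.
n_groups <- 5L
fold_id <- sample(rep(seq_len(n_groups), length.out = n))
cv_folds <- unname(split(seq_len(n), fold_id))

# cv_params: a named list; each component is a vector of candidate values.
cv_params <- list(
  max_depth = c(2, 4, 6),
  eta = c(0.1, 0.3),
  nrounds = c(10, 50)
)

# Assumption: the package providing xgbstack has been loaded beforehand.
if (exists("xgbstack", mode = "function")) {
  fit <- xgbstack(stack_formula, data = train,
    cv_params = cv_params, cv_folds = cv_folds,
    cv_refit = "ttest", verbose = 0)
}
```

Because cv_folds is supplied explicitly here, cv_nfolds would be ignored. Leaving cv_folds as NULL and setting cv_nfolds instead lets the function do the random partitioning itself.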
an estimated xgbstack object, which contains a gradient tree boosted fit mapping observed variables to component model weights
Evan Ray <elray@umass.edu>