xgbstack: Fit a stacking model given a measure of performance for each...

Description

Fit a stacking model given a measure of performance for each component model on a set of training data and a set of covariates to use in forming the component model weights.

An implementation of model stacking using xgboost.

Usage

xgbstack(formula, data, booster = "gbtree", subsample = 1,
  colsample_bytree = 1, colsample_bylevel = 1, max_depth = 6,
  min_child_weight = -10^10, eta = 0.3, gamma = 0, lambda = 0,
  alpha = 0, nrounds = 10, cv_params = NULL, cv_folds = NULL,
  cv_nfolds = 10L, cv_refit = "ttest", update = NULL, nthread = NULL,
  verbose = 0)
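
A minimal sketch of a call, assuming the package has been installed from the reichlab/xgbstack repository. The data frame, its column names (`score_a`, `score_b`, `week`), and the simulated values are hypothetical; only the argument names come from the usage above. The call is guarded so the snippet still runs when the package is not available.

```r
set.seed(42)

# Hypothetical training data: one performance score per component model
# (score_a, score_b) and one covariate (week) for the weights to depend on.
train <- data.frame(
  week    = rep(1:20, times = 5),
  score_a = rnorm(100, mean = -1.0, sd = 0.3),
  score_b = rnorm(100, mean = -1.2, sd = 0.3)
)

# Left-hand side: model score columns joined by +; right-hand side: covariates.
stack_formula <- score_a + score_b ~ week

# Only attempt the fit if xgbstack is installed; otherwise fit stays NULL.
fit <- NULL
if (requireNamespace("xgbstack", quietly = TRUE)) {
  fit <- xgbstack::xgbstack(
    stack_formula,
    data      = train,
    max_depth = 2,
    eta       = 0.1,
    nrounds   = 20,
    verbose   = 0
  )
}
```

Arguments that are not passed keep the defaults listed in the usage.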

Arguments

formula

A formula describing the model fit. The left-hand side should give the columns in data that hold the component model scores, separated by +. The right-hand side should specify the explanatory variables on which the weights depend.
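
When there are many component models, the formula can be assembled from character vectors. This sketch uses base R only; the column names (`score_a`, `score_b`, `score_c`, `week`, `region`) are hypothetical.

```r
# Hypothetical column names: one score column per component model,
# plus the covariates the weights should depend on.
score_cols <- c("score_a", "score_b", "score_c")
covariates <- c("week", "region")

# Build "score_a + score_b + score_c ~ week + region" as a formula object.
stack_formula <- as.formula(
  paste(paste(score_cols, collapse = " + "), "~",
        paste(covariates, collapse = " + "))
)

# Recover the variable names on each side to check the formula.
lhs_vars <- all.vars(stack_formula[[2]])
rhs_vars <- all.vars(stack_formula[[3]])
```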

data

A data frame containing the variables in formula.

booster

Which form of boosting to use. See the xgboost documentation.

subsample

Fraction of the data to use in bagging. Not supported yet.

colsample_bytree

Fraction of explanatory variables to select at random when growing each regression tree. See the xgboost documentation.

colsample_bylevel

Fraction of explanatory variables to select at random when growing each level of a regression tree. See the xgboost documentation.

max_depth

Maximum depth of the regression trees. See the xgboost documentation.

min_child_weight

Not recommended for use. See the xgboost documentation.

eta

Learning rate. See the xgboost documentation.

gamma

Penalty on the number of regression tree leaves. See the xgboost documentation.

lambda

L2 regularization of the contribution to model weights in each round. See the xgboost documentation.

alpha

L1 regularization of the contribution to model weights in each round. See the xgboost documentation.

nrounds

Number of boosting rounds. See the xgboost documentation.

cv_params

Optional named list of parameter values for which to evaluate loss via cross-validation. Each component is a vector of parameter values, and its name must be one of "booster", "subsample", "colsample_bytree", "colsample_bylevel", "max_depth", "min_child_weight", "eta", "gamma", "lambda", "alpha", "nrounds".
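
A sketch of what such a list might look like, with a check that every name is one of the permitted parameters. The candidate values are illustrative. The source does not say how values from different components are combined; the count below assumes every combination is evaluated.

```r
# Parameter names accepted in cv_params, as listed above.
allowed <- c("booster", "subsample", "colsample_bytree", "colsample_bylevel",
             "max_depth", "min_child_weight", "eta", "gamma", "lambda",
             "alpha", "nrounds")

# Illustrative candidate values for three of the parameters.
cv_params <- list(
  max_depth = c(1L, 2L, 4L),
  eta       = c(0.05, 0.1, 0.3),
  nrounds   = c(10L, 50L)
)

# Every component must be named after one of the permitted parameters.
stopifnot(!is.null(names(cv_params)), all(names(cv_params) %in% allowed))

# Number of candidate settings IF all combinations are evaluated
# (an assumption; the documentation does not state this).
n_candidates <- prod(lengths(cv_params))
```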

cv_folds

List specifying the observation groups to use in cross-validation. Each list component is a numeric vector of observation indices.
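
Such a list can be built with base R when observations should be held out in blocks. In this sketch the grouping variable `season` and its labels are hypothetical; each resulting list component holds the row indices of one group.

```r
# Hypothetical grouping variable: one season label per training observation.
season <- rep(c("2011/2012", "2012/2013", "2013/2014"), each = 4)

# split() returns one integer vector of row indices per distinct season;
# unname() drops the season labels so the result is a plain list.
cv_folds <- unname(split(seq_along(season), season))
```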

cv_nfolds

Integer specifying the number of cross-validation folds to use. If cv_folds is provided, cv_nfolds is ignored. If cv_folds is not provided, the data are randomly partitioned into cv_nfolds groups.
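
For intuition, a random partition into cv_nfolds groups could look like the following base R sketch. It is not necessarily how the package partitions the data internally; the number of observations `n_obs` is hypothetical.

```r
set.seed(1)
n_obs     <- 25L  # hypothetical number of training observations
cv_nfolds <- 10L

# Assign fold labels as evenly as possible, then shuffle them across rows.
fold_id <- sample(rep(seq_len(cv_nfolds), length.out = n_obs))

# One vector of observation indices per fold.
random_folds <- unname(split(seq_len(n_obs), fold_id))
```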

cv_refit

Character string describing which of the models specified by the values in cv_params to refit using the full data set. One of "best", "ttest", or "none".

update

An object of class xgbstack to update.

nthread

Number of threads to use.

verbose

How much output to generate along the way: 0 for no logging, 1 for some logging.

Value

An estimated xgbstack object, which contains a gradient tree boosted fit mapping observed variables to component model weights.
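
This page does not say how the boosted fit is converted into weights. One common way to obtain non-negative weights that sum to one, shown here purely as an assumption and not as the package's confirmed method, is to apply a softmax to one real-valued output per component model. The function `softmax_rows` and the input values are hypothetical.

```r
# Convert a matrix of real-valued outputs (rows = observations,
# columns = component models) into weights that sum to 1 in each row.
softmax_rows <- function(scores) {
  # Subtract each row's maximum for numerical stability.
  shifted <- scores - apply(scores, 1, max)
  exp_scores <- exp(shifted)
  exp_scores / rowSums(exp_scores)
}

# Hypothetical outputs for three observations and two component models.
raw <- rbind(c(0, 0), c(2, 0), c(-1, 1))
weights <- softmax_rows(raw)
```

Under this assumption, a row with equal outputs gives equal weights, and a larger output for one model shifts weight toward it.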

Author(s)

Evan Ray <elray@umass.edu>


reichlab/xgbstack documentation built on May 27, 2019, 4:54 a.m.