fit_stacked_model: Fit a stacking model that assigns weights to component models...


View source: R/stacking-utils.R

Description

Fit a stacking model that assigns weights to component models. The weights are a function of observed covariates (those named in explanatory_variables) and are obtained via gradient tree boosting.
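To make "weights as a function of covariates" concrete, here is a minimal sketch in base R. It assumes the boosted trees produce one real-valued score per component model for each observation and that the scores are turned into weights with a softmax. That mapping is an assumption for illustration, not something this page confirms. The function softmax_rows, the model names and all numbers are hypothetical.

```r
# Hypothetical illustration: covariate-dependent stacking weights.
# `scores` stands in for the per-model outputs of a boosted-tree fit
# (one row per observation, one column per component model).
softmax_rows <- function(scores) {
  # subtract each row's maximum for numerical stability
  shifted <- scores - apply(scores, 1, max)
  exp_scores <- exp(shifted)
  exp_scores / rowSums(exp_scores)
}

scores <- matrix(c(0, 1, 2,
                   1, 1, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, c("kde", "kcde", "sarima")))
weights <- softmax_rows(scores)

# made-up component predictions for the same two observations
component_preds <- matrix(c(0.10, 0.20, 0.30,
                            0.25, 0.25, 0.25),
                          nrow = 2, byrow = TRUE)

# the stacked prediction is the weighted combination of the components
stacked_pred <- rowSums(weights * component_preds)
```

Each row of weights is non-negative and sums to 1, so the stacked prediction is a convex combination of the component predictions, and the combination can differ from one observation to the next.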

Usage

fit_stacked_model(prediction_target, component_model_names,
  explanatory_variables = c("analysis_time_season_week",
  "kcde_model_confidence", "sarima_model_confidence", "weighted_ili"),
  loso_preds_path, seasons_to_leave_out, booster = "gbtree", subsample = 1,
  colsample_bytree = 1, colsample_bylevel = 1, max_depth = 10,
  min_child_weight = -10^10, eta = 0.3, gamma = 0, lambda = 0,
  alpha = 0, nrounds = 10, cv_params = NULL, cv_folds = NULL,
  cv_nfolds = 10L, cv_refit = "ttest", update = NULL, nthread = NULL,
  verbose = 0)
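The loso_preds_path argument described under Arguments expects one predictions file per component model. As a rough sketch, the file names could be assembled and checked before calling fit_stacked_model. The helper loso_pred_files is hypothetical and not part of the package, the pattern "<model>-<region>-loso-predictions.rds" is inferred from the single example file name on this page, and the directory and model names are placeholders.

```r
# Hypothetical helper, not part of the package: build the expected file
# names for a set of component models, assuming the pattern
# "<model>-<region>-loso-predictions.rds".
loso_pred_files <- function(loso_preds_path, component_model_names,
                            region = "National") {
  file.path(
    loso_preds_path,
    paste0(component_model_names, "-", region, "-loso-predictions.rds")
  )
}

# placeholder directory and model names
files <- loso_pred_files("path/to/loso-preds", c("kde", "kcde", "sarima"))

# list any expected files that are not on disk yet
missing_files <- files[!file.exists(files)]
```

Checking for missing files up front gives a clearer error than a failed read partway through fitting.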

Arguments

prediction_target

string; one of "onset", "peak_week", "peak_inc", "ph1_inc", ..., "ph4_inc"

component_model_names

character vector with names of component models

explanatory_variables

character vector with names of explanatory variables to include for weights; a non-empty subset of "analysis_time_season_week", "kcde_model_confidence", "sarima_model_confidence", "weighted_ili"

loso_preds_path

path to directory with leave-one-season-out predictions from each component model. Predictions should be saved in files named like "kde-National-loso-predictions.rds"

seasons_to_leave_out

optional character vector of seasons to leave out of stacking estimation

booster

type of booster to use. see xgboost documentation

subsample

fraction of data to use in bagging. not supported yet.

colsample_bytree

fraction of explanatory variables to randomly select in growing each regression tree. see xgboost documentation

colsample_bylevel

fraction of explanatory variables to randomly select in growing each level of the regression tree. see xgboost documentation

max_depth

maximum depth of regression trees. see xgboost documentation

min_child_weight

not recommended for use. see xgboost documentation

eta

learning rate. see xgboost documentation

gamma

penalty on the number of regression tree leaves. see xgboost documentation

lambda

L2 regularization of contribution to model weights in each round. see xgboost documentation

alpha

L1 regularization of contribution to model weights in each round. see xgboost documentation

nrounds

number of boosting rounds. see xgboost documentation

cv_params

optional named list of parameter values for which to evaluate loss via cross-validation. Each component is a vector of candidate values, and its name is one of "booster", "subsample", "colsample_bytree", "colsample_bylevel", "max_depth", "min_child_weight", "eta", "gamma", "lambda", "alpha", "nrounds"

cv_folds

list specifying observation groups to use in cross-validation. each list component is a numeric vector of observation indices.

cv_nfolds

integer specifying the number of cross-validation folds to use. if cv_folds was provided, cv_nfolds is ignored. if cv_folds was not provided, the data will be randomly partitioned into cv_nfolds groups

cv_refit

character describing which of the models specified by the values in cv_params to refit using the full data set. Either "best", "ttest", or "none".

update

an object of class xgbstack to update

nthread

how many threads to use. see xgboost documentation

verbose

how much output to generate along the way. 0 for no logging, 1 for some logging
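The cross-validation arguments can be prepared in base R before the call. The sketch below builds a cv_params list and an explicit cv_folds list. It assumes that every combination of the candidate values is evaluated and that the fold indices refer to rows of the assembled stacking data. Neither point is confirmed on this page. The observation count n_obs and the candidate values are made up.

```r
# candidate values to compare by cross-validation (illustrative only)
cv_params <- list(
  max_depth = c(2L, 5L, 10L),
  eta = c(0.1, 0.3),
  nrounds = c(10L, 50L)
)

# names must come from the set listed for cv_params
allowed_names <- c("booster", "subsample", "colsample_bytree",
                   "colsample_bylevel", "max_depth", "min_child_weight",
                   "eta", "gamma", "lambda", "alpha", "nrounds")
stopifnot(all(names(cv_params) %in% allowed_names))

# assuming a full grid is evaluated, this is the set of combinations
param_grid <- expand.grid(cv_params, stringsAsFactors = FALSE)

# explicit folds: a random partition of the observation indices,
# mirroring what the cv_nfolds description says happens by default
set.seed(1)
n_obs <- 95L      # hypothetical number of observations
cv_nfolds <- 10L
fold_id <- sample(rep(seq_len(cv_nfolds), length.out = n_obs))
cv_folds <- unname(split(seq_len(n_obs), fold_id))
```

Supplying cv_folds explicitly keeps the partition fixed across runs. It also allows grouped folds, for example keeping all observations from one season together, if random assignment is not appropriate.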

Value

a model stacking fit


reichlab/2017-2018-cdc-flu-contest documentation built on Sept. 25, 2018, 3:24 a.m.