qreg_lightgbm: Multiple Quantile Regression Using Gradient Boosted Decision...

View source: R/MQR_lightgbm.R

qreg_lightgbm    R Documentation

Multiple Quantile Regression Using Gradient Boosted Decision Trees (lightgbm implementation)

Description

This function fits multiple boosted quantile regression trees using lightgbm with facilities for cross-validation.

Usage

qreg_lightgbm(
  data,
  formula,
  categoric_features = NULL,
  quantiles = c(0.25, 0.5, 0.75),
  cv_folds = NULL,
  cores = 1,
  pckgs = NULL,
  sort = TRUE,
  sort_limits = NULL,
  only_mqr = FALSE,
  exclude_train = NULL,
  lightgbm_params = NULL,
  ...
)

Arguments

data

A data.frame containing target and explanatory variables.

formula

A formula object with the response on the left of an ~ operator, and the terms, separated by + operators, on the right. NOTE: any manipulation of terms within the formula (e.g. squaring or interactions) will fail; only individual, linear terms may be specified.

categoric_features

Either a character vector of feature names, or an integer vector of indices, for any categoric terms (NULL if no categoric features are included).

quantiles

The quantiles to fit models for.

cv_folds

Control for cross-validation with various options, either:

  • the column name of the fold index supplied in data. Observations and inputs in the index labelled "Test" will serve as test data and are held out from model training.

  • an integer giving the number of cross validation folds to generate. Folds are constructed as block chunks. Default behaviour is 5 folds.

  • a vector of length nrow(data) containing character or numeric fold labels.

  • NULL indicates that no cross validation should be performed and the returned model is trained on all data.
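As a rough illustration of what "block chunks" means for the integer option, the following base R sketch assigns consecutive rows to contiguous folds and then builds a label vector of the kind accepted by the vector option. The helper name make_block_folds and the "fold1", "fold2", ... labels are assumptions made for this example; the package's own fold construction may differ in detail.

```r
# Hypothetical helper (not part of ProbCast): assign n rows to k contiguous folds.
make_block_folds <- function(n, k = 5) {
  # ceiling(i * k / n) maps rows 1..n onto fold ids 1..k in consecutive blocks
  paste0("fold", ceiling(seq_len(n) * k / n))
}

folds <- make_block_folds(n = 10, k = 5)
print(folds)

# A label vector that also reserves the last two rows as "Test" data
fold_labels <- c(make_block_folds(n = 8, k = 4), "Test", "Test")
print(table(fold_labels))
```

A vector such as fold_labels has one entry per row of data, so it could be supplied directly or stored as a column and referred to by name.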

cores

the number of available cores. Defaults to one, i.e. no parallelisation, although in this case the user must still specify pckgs if applicable.

pckgs

specify additional packages required for each worker (e.g. c("data.table") if data is stored as such).

sort

Sort quantiles using SortQuantiles()?

sort_limits

Limits argument to be passed to SortQuantiles(). Constrains quantiles to upper and lower limits given by list(U=upperlim,L=lowerlim).
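The idea behind sorting and limiting can be sketched in base R as follows. This is a stand-in written for illustration, not the code of SortQuantiles(): the function name sort_and_clip is hypothetical, and it assumes a numeric matrix with one column per quantile (at least two columns) and no missing values.

```r
# Hypothetical stand-in for the idea behind SortQuantiles(); not the package's code.
sort_and_clip <- function(q, limits = NULL) {
  q <- as.matrix(q)
  # Sort each row so predicted quantiles are non-decreasing in probability level
  sorted <- t(apply(unname(q), 1, sort))
  if (!is.null(limits)) {
    # Constrain values to lie between the lower (L) and upper (U) limits
    sorted <- pmin(pmax(sorted, limits$L), limits$U)
  }
  dimnames(sorted) <- dimnames(q)
  sorted
}

q <- rbind(c(0.4, 0.3, 0.9),
           c(-0.1, 0.5, 1.2))
colnames(q) <- c("q25", "q50", "q75")

res <- sort_and_clip(q, limits = list(U = 1, L = 0))
print(res)
```

In the first row the crossed quantiles 0.4 and 0.3 are swapped; in the second row the values outside [0, 1] are pulled back to the limits.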

only_mqr

return only the out-of-sample predictions?

exclude_train

control for exclusion of rows in data for the model training only, with various options, either:

  • the column name of the binary/boolean exclude flag supplied in data.

  • a vector of binary/boolean exclusion flags of length nrow(data)

  • NULL indicates no exclusion

This option is useful when out-of-sample predictions are required for rows that need to be excluded from model training.

lightgbm_params

Additional arguments passed to lightgbm(): objective='quantile' and the probability level are automatically included so do not need to be specified here.
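For context, LightGBM's quantile objective minimises the quantile (pinball) loss at the chosen probability level (LightGBM's alpha parameter). The base R function below is a minimal sketch of that loss, written for illustration only; the name pinball_loss is an assumption of this example and is not part of the package.

```r
# Hypothetical helper: average pinball (quantile) loss at probability level tau.
pinball_loss <- function(y, q_hat, tau) {
  u <- y - q_hat
  # Under-prediction (u >= 0) is weighted by tau, over-prediction by (1 - tau)
  mean(ifelse(u >= 0, tau * u, (tau - 1) * u))
}

y <- c(1, 2, 3, 10)
loss_median <- pinball_loss(y, q_hat = 3, tau = 0.5)
loss_upper  <- pinball_loss(y, q_hat = 3, tau = 0.9)
print(c(loss_median, loss_upper))
```

At tau = 0.9 the single large under-prediction dominates the loss, which is what pushes a fitted upper quantile above most of the observations.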

...

Additional arguments - not currently used.

Details

The returned predictive quantiles are those produced out-of-sample for each cross-validation fold (using models trained on the remaining folds but not "Test" data). Predictive quantiles corresponding to "Test" data are produced using models trained on all non-test data.
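The fold logic described above can be sketched in base R. To keep the example self-contained, the "model" here is simply the empirical quantiles of the training response, which stands in for the lightgbm fit; the function names fit_stub and cv_quantiles, the "q25"-style column names, and the exclude argument (mirroring exclude_train) are all assumptions of this sketch rather than the package's implementation.

```r
# Stand-in "model": empirical quantiles of the training response (NOT lightgbm).
fit_stub <- function(y, probs) quantile(y, probs = probs, names = FALSE)

cv_quantiles <- function(y, folds, probs, exclude = rep(FALSE, length(y))) {
  out <- matrix(NA_real_, nrow = length(y), ncol = length(probs),
                dimnames = list(NULL, paste0("q", 100 * probs)))
  for (f in unique(folds)) {
    if (f == "Test") {
      # "Test" rows: train on all non-test rows
      train <- folds != "Test" & !exclude
    } else {
      # CV fold: train on the remaining folds, never on "Test" rows
      train <- folds != f & folds != "Test" & !exclude
    }
    pred <- fit_stub(y[train], probs)
    # Excluded rows are left out of training only; they still receive predictions
    out[folds == f, ] <- rep(pred, each = sum(folds == f))
  }
  out
}

y     <- c(1, 2, 3, 4, 5, 6, 7, 8)
folds <- c("A", "A", "B", "B", "C", "C", "Test", "Test")
res   <- cv_quantiles(y, folds, probs = c(0.25, 0.5, 0.75))
print(res)
```

Every row receives predictions from a model that did not see it during training, which is what makes the returned quantiles out-of-sample.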

The returned models are in a named list corresponding to the model for each fold and can be extracted for further prediction or evaluation. See predict.qreg_lightgbm().

Value

By default, a named list containing fitted models as a list of qreg_lightgbm objects, and out-of-sample cross validation forecasts as a MultiQR object. The output list depends on cv_folds.

Alternatively, returns only the out-of-sample cross validation forecasts as a MultiQR object when only_mqr is TRUE.

Author(s)

Gordon McFadzean, gordon.mcfadzean@tneigroup.com; Rosemary Tawn, rosemary.tawn@tneigroup.com


jbrowell/ProbCast documentation built on July 20, 2024, 1:53 p.m.