qreg_gam: Multiple Quantile Regression Using Generalised Additive...

View source: R/MQR_qreg_gam.R

qreg_gam R Documentation

Multiple Quantile Regression Using Generalised Additive Models and Linear Quantile Regression

Description

This function fits multiple conditional linear quantile regression models to the residuals of a generalised additive model (GAM) fitted with mgcv, with facilities for cross-validation. Quantile regression may be performed using a user-specified formula or the design matrix of the fitted GAM.

Usage

qreg_gam(
  data,
  formula,
  formula_qr = NULL,
  model_res2 = F,
  formula_res2 = NULL,
  quantiles = c(0.25, 0.5, 0.75),
  cv_folds = NULL,
  use_bam = T,
  exclude_train = NULL,
  sort = T,
  sort_limits = NULL,
  ...
)
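
A minimal sketch of a call, assuming the ProbCast package is installed. The synthetic data, the column names x and y, and the chosen quantiles are illustrative assumptions and have not been checked against the package; only the argument names come from the usage above.

```r
# Synthetic data: one covariate, a noisy nonlinear response, and a fold column.
set.seed(1)
n <- 200
example_data <- data.frame(x = runif(n, 0, 10))
example_data$y <- sin(example_data$x) + rnorm(n, sd = 0.3)

# Three labelled folds plus a held-out "Test" block (50 rows each).
example_data$kfold <- rep(c("fold1", "fold2", "fold3", "Test"), each = n / 4)

# Only attempt the fit if ProbCast is available.
if (requireNamespace("ProbCast", quietly = TRUE)) {
  fit <- ProbCast::qreg_gam(
    data = example_data,
    formula = y ~ s(x),             # GAM formula, passed on to mgcv
    quantiles = c(0.1, 0.5, 0.9),
    cv_folds = "kfold",             # use the fold column supplied in data
    use_bam = FALSE                 # gam() is sufficient for a small data set
  )
  # Inspect the top-level structure of the returned list.
  str(fit, max.level = 1)
}
```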

Arguments

data

A data.frame containing target and explanatory variables. May optionally contain a column called "kfold" with numbered/labelled folds and "Test" for test data.

formula_qr

Formula for the linear quantile regression model fitted to the GAM residuals. The term gam_pred, the prediction from the GAM specified by formula, may be included in this formula. If NULL, the "terms" of the GAM are used as features in the linear quantile regression.

model_res2

If TRUE, also model the squared residuals of the GAM using a GAM. Defaults to FALSE.

formula_res2

Formula for GAM to predict squared residuals.

quantiles

The quantiles to fit models for.

cv_folds

Control for cross-validation with various options, either:

  • the column name of the fold index supplied in data. Observations and inputs labelled "Test" in the index will serve as test data and be held out from model training.

  • an integer giving the number of cross validation folds to generate. Folds are constructed as block chunks. Default behaviour is 5 folds.

  • NULL indicates that no cross validation should be performed and the returned model is trained on all data.
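
As an illustration of what "block chunks" can mean, the sketch below assigns rows to contiguous, roughly equal-sized folds in row order. make_block_folds is a hypothetical helper written for this example; the package's own fold construction may differ in detail.

```r
# Hypothetical helper: label n rows with k contiguous blocks in row order.
make_block_folds <- function(n, k = 5) {
  paste0("fold", ceiling(seq_len(n) * k / n))
}

folds <- make_block_folds(n = 23, k = 5)

# Each fold is one unbroken run of rows, rather than a random sample.
table(folds)
```

Contiguous blocks are a common choice for time series, where randomly assigned folds would leak information between neighbouring observations.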

use_bam

If TRUE (default), the GAM is fitted using bam() instead of gam(). bam() is better suited to large datasets, but not all gam() model options are available with bam(). Alternative smooths, such as cubic regression splines, allow faster estimation than the default smooth bs="tp". See the bam() documentation for further details.

exclude_train

A column name in data indicating whether a row should be excluded from model training, e.g. because it contains bad data (will be coerced to logical). Alternatively, an integer or logical vector with length equal to the number of rows in data indicating the same. Rows labelled TRUE are excluded from model training.

sort

Logical. If TRUE, quantiles are sorted using SortQuantiles().

sort_limits

Limits argument to be passed to SortQuantiles(). Constrains quantiles to upper and lower limits given by list(U=upperlim,L=lowerlim).
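
The effect of sorting and limiting can be sketched in base R as follows. sort_and_limit is a hypothetical stand-in written for this example, not the package's SortQuantiles(); it assumes one row per observation, one column per quantile in increasing order of probability, and at least two quantile columns.

```r
# Hypothetical helper: sort each row so that quantiles do not cross,
# then clamp values to optional limits given as list(U = ..., L = ...).
sort_and_limit <- function(q, limits = NULL) {
  q <- as.matrix(q)
  sorted <- t(apply(q, 1, sort))   # sort within each row
  dimnames(sorted) <- dimnames(q)  # restore the original row/column names
  if (!is.null(limits$L)) sorted[sorted < limits$L] <- limits$L
  if (!is.null(limits$U)) sorted[sorted > limits$U] <- limits$U
  sorted
}

# Row 1 has crossing quantiles; row 2 falls outside the interval [0, 1].
q <- matrix(c( 0.2, 0.5, 0.4,
              -0.1, 0.3, 1.2),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("q25", "q50", "q75")))

q_fixed <- sort_and_limit(q, limits = list(U = 1, L = 0))
q_fixed
```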

...

Additional arguments passed to gam() (or bam()).

formula

A formula object with the response on the left of an ~ operator, and the terms, separated by + operators, on the right passed to gam() or bam() from mgcv.

Details

The returned predictive quantiles and GAM predictions are those produced out-of-sample for each cross-validation fold (using models trained on the remaining folds but not "Test" data). Predictive quantiles corresponding to "Test" data are produced using models trained on all non-test data.
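
The out-of-sample scheme described above can be sketched with a simple stand-in model. Here lm() replaces the GAM and quantile regression purely for brevity, and the data and column names are illustrative assumptions; this is not the package's implementation.

```r
set.seed(42)
n <- 120
cv_data <- data.frame(x = runif(n))
cv_data$y <- 2 * cv_data$x + rnorm(n, sd = 0.1)
cv_data$kfold <- rep(c("fold1", "fold2", "fold3", "Test"), each = n / 4)

cv_data$pred <- NA_real_
for (fold in unique(cv_data$kfold)) {
  held_out <- cv_data$kfold == fold
  # Never train on "Test" rows; additionally hold out the current fold.
  # When fold is "Test", this leaves all non-test rows for training.
  train <- !held_out & cv_data$kfold != "Test"
  model <- lm(y ~ x, data = cv_data[train, ])
  cv_data$pred[held_out] <- predict(model, newdata = cv_data[held_out, ])
}

# Every row now has a prediction from a model that never saw that row.
head(cv_data)
```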

Value

Returns a list containing predictive quantiles (in a MultiQR object), GAM models, and deterministic predictions from GAMs.

Author(s)

Jethro Browell, jethro.browell@glasgow.ac.uk


jbrowell/ProbCast documentation built on July 20, 2024, 1:53 p.m.