NGeDSgam    R Documentation

NGeDSgam: Local Scoring Algorithm with GeD Splines in Backfitting

Description

Implements the Local Scoring Algorithm (Hastie and Tibshirani, 1986), applying normal GeD splines (i.e., the NGeDS function) to fit the targets within the backfitting iterations.

Usage

NGeDSgam(
  formula,
  data,
  weights = NULL,
  normalize_data = FALSE,
  family = "gaussian",
  min_iterations,
  max_iterations,
  phi_gam_exit = 0.995,
  q_gam = 2,
  beta = 0.5,
  phi = 0.99,
  internal_knots = 500,
  q = 2,
  higher_order = TRUE
)

Arguments

formula

a description of the structure of the model to be fitted, including the dependent and independent variables. Unlike NGeDS and GGeDS, the formula specified allows for multiple additive GeD spline regression components (as well as linear components) to be included (e.g., Y ~ f(X1) + f(X2) + X3). See formula for further details.

data

a data frame containing the variables referenced in the formula.

weights

an optional vector of ‘prior weights’ to be put on the observations during the fitting process. It should be NULL or a numeric vector of the same length as the response variable defined in the formula.

normalize_data

a logical that defines whether the data should be normalized (standardized) before fitting the baseline linear model, i.e., before running the local-scoring algorithm. Normalizing the data involves scaling the predictor variables to have a mean of 0 and a standard deviation of 1. This process alters the scale and interpretation of the knots and coefficients estimated. Default is equal to FALSE.

family

a character string or a family object indicating the response variable distribution and link function to be used. Default is "gaussian".

min_iterations

optional parameter to manually set the minimum number of local-scoring iterations to be run. If not specified, it defaults to 0L.

max_iterations

optional parameter to manually set the maximum number of local-scoring iterations to be run. If not specified, it defaults to 100L. This setting serves as a fallback when the stopping rule, based on consecutive deviances and tuned by phi_gam_exit and q_gam, does not trigger an earlier termination (see Dimitrova et al. (2024)). Users can therefore increase/decrease the number of local-scoring iterations by increasing/decreasing the value of phi_gam_exit and/or q_gam, or by specifying max_iterations directly.

phi_gam_exit

convergence threshold for local-scoring and backfitting. Both algorithms stop once the ratio of deviances from consecutive iterations (see Details) reaches this threshold, so values closer to 1 allow more iterations. Default is 0.995.

q_gam

numeric parameter that allows fine-tuning of the stopping rule of local-scoring/backfitting, by default equal to 2L.

beta

numeric parameter in the interval [0,1] tuning the knot placement in stage A of GeDS. Default is equal to 0.5. See details in NGeDS.

phi

numeric parameter in the interval [0,1] specifying the threshold for the stopping rule (model selector) in stage A of GeDS. Default is equal to 0.99. See details in NGeDS.

internal_knots

the maximum number of internal knots that can be added by the GeDS smoothers in each backfitting iteration, effectively setting the value of max.intknots in NGeDS. Default is 500L.

q

numeric parameter that allows fine-tuning of the stopping rule of stage A of GeDS, by default equal to 2L. See details in NGeDS.

higher_order

a logical that defines whether to compute the higher order fits (quadratic and cubic) after the local-scoring algorithm is run. Default is TRUE.
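
To make the effect of normalize_data concrete, the following base-R sketch standardizes the predictors of airquality with scale() and maps a value on the standardized axis back to the original one. It only illustrates the transformation described for that argument; how NGeDSgam stores or reverses the scaling internally is not shown here, and the names X, Z, knot_std and knot_orig are illustrative assumptions.

```r
# Standardize predictors to mean 0 and standard deviation 1 (base R only).
dat <- na.omit(airquality)
X <- dat[, c("Solar.R", "Wind", "Temp")]
Z <- scale(X)                        # centered and scaled copy of X

centers <- attr(Z, "scaled:center")  # column means of X
scales  <- attr(Z, "scaled:scale")   # column standard deviations of X

# A hypothetical knot located at 0.5 on the standardized Temp axis ...
knot_std <- 0.5
# ... corresponds to this temperature on the original scale.
knot_orig <- knot_std * scales[["Temp"]] + centers[["Temp"]]
```

This is why knots and coefficients estimated from normalized data cannot be read directly in the units of the original predictors.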

Details

The NGeDSgam function employs the local scoring algorithm to fit a Generalized Additive Model (GAM). This algorithm iteratively fits weighted additive models by backfitting. Normal linear GeD splines, as well as linear learners, are supported as function smoothers within the backfitting algorithm. The local-scoring algorithm ultimately produces a linear fit. Higher order fits (quadratic and cubic) are then computed by calculating the Schoenberg’s variation diminishing spline (VDS) approximation of the linear fit.
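
As an illustration of the last step, the sketch below builds Schoenberg's variation diminishing spline of a piecewise linear function: the B-spline coefficients are simply the function values at the knot averages (Greville sites). It relies only on splineDesign() from the splines package that ships with R. The helper vds(), the stand-in function f_lin and the knot locations are assumptions made for this example and are not the routines or knot choices used inside NGeDSgam.

```r
library(splines)  # ships with R; provides splineDesign()

# Piecewise linear function standing in for a linear fit (illustrative).
f_lin <- function(x) {
  approx(x = c(0, 0.3, 0.6, 1), y = c(0, 1, 0.2, 0.8), xout = x)$y
}

# Hypothetical helper: VDS approximation of f of order `ord` on [a, b].
vds <- function(f, internal_knots, a, b, ord) {
  # Full knot sequence with boundary knots repeated `ord` times.
  t <- c(rep(a, ord), internal_knots, rep(b, ord))
  p <- length(t) - ord                 # number of B-spline basis functions
  # Greville sites: averages of ord - 1 consecutive knots.
  greville <- sapply(seq_len(p), function(i) mean(t[(i + 1):(i + ord - 1)]))
  coefs <- f(greville)                 # VDS coefficients are f at these sites
  function(x) {
    B <- splineDesign(knots = t, x = x, ord = ord)
    as.vector(B %*% coefs)
  }
}

x <- seq(0, 1, length.out = 201)
quad <- vds(f_lin, internal_knots = c(0.2, 0.45, 0.8), a = 0, b = 1, ord = 3)
y_quad <- quad(x)                      # quadratic (order 3) approximation
```

With ord = 2 and knots at the breakpoints of f_lin the construction returns f_lin itself, and for any order it reproduces straight lines exactly, which is the shape-preserving behaviour that makes the approximation a natural bridge from a linear fit to smoother ones.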

On the one hand, NGeDSgam includes all the parameters of NGeDS, which in this case tune the smoother fit at each backfitting iteration. On the other hand, NGeDSgam includes some additional parameters specific to the local-scoring procedure. The main ones are described below.

The family chosen determines the link function, adjusted dependent variable and weights to be used in the local-scoring algorithm. The number of local-scoring and backfitting iterations is controlled by a Ratio of Deviances stopping rule similar to the one presented for GGeDS. In the same way phi and q tune the stopping rule of GGeDS, phi_gam_exit and q_gam tune the stopping rule of NGeDSgam. The user can also manually control the number of local-scoring iterations through min_iterations and max_iterations.
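
The interplay between family, adjusted dependent variable, weights and stopping rule can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: smooth.spline() stands in for the GeD spline smoother, the inner backfitting loop runs a fixed number of passes instead of using its own stopping rule, and the ratio-of-deviances check (with phi_exit and q_exit playing the role of phi_gam_exit and q_gam) is one plausible reading of the rule described above. All data are simulated.

```r
set.seed(1)
n  <- 300
x1 <- runif(n); x2 <- runif(n)
y  <- rpois(n, lambda = exp(1 + sin(2 * pi * x1) + x2^2))

fam <- poisson()                  # supplies link, variance and deviance
phi_exit <- 0.995; q_exit <- 2    # assumed analogues of phi_gam_exit, q_gam
max_iter <- 100

# Stand-in smoother (NOT a GeD spline): weighted smoothing spline.
smooth_fit <- function(x, r, w) {
  predict(smooth.spline(x, r, w = w, df = 5), x)$y
}

alpha <- fam$linkfun(mean(y))     # intercept on the link scale
f1 <- f2 <- rep(0, n)
dev <- numeric(0)

for (k in seq_len(max_iter)) {
  eta <- alpha + f1 + f2
  mu  <- fam$linkinv(eta)
  dmu <- fam$mu.eta(eta)
  z   <- eta + (y - mu) / dmu     # adjusted dependent variable
  w   <- dmu^2 / fam$variance(mu) # local-scoring weights

  # Weighted backfitting on the adjusted dependent variable.
  alpha <- weighted.mean(z, w)
  for (b in 1:10) {
    f1 <- smooth_fit(x1, z - alpha - f2, w); f1 <- f1 - weighted.mean(f1, w)
    f2 <- smooth_fit(x2, z - alpha - f1, w); f2 <- f2 - weighted.mean(f2, w)
  }

  mu_new <- fam$linkinv(alpha + f1 + f2)
  dev[k] <- sum(fam$dev.resids(y, mu_new, rep(1, n)))

  # Ratio of deviances q_exit iterations apart: stop once it reaches phi_exit.
  if (k > q_exit && dev[k] / dev[k - q_exit] >= phi_exit) break
}
```

Under this reading, raising phi_exit towards 1 or increasing q_exit makes the rule harder to satisfy and therefore tends to allow more local-scoring iterations, while max_iter caps the loop if the rule never triggers.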

Value

GeDSgam-Class object, i.e. a list of items that summarizes the main details of the fitted GAM-GeDS model. See GeDSgam-Class for details. Some S3 methods are available in order to make these objects tractable, such as coef, knots, print and predict.

References

Hastie, T. and Tibshirani, R. (1986). Generalized Additive Models. Statistical Science, 1 (3), 297–310.
DOI: 10.1214/ss/1177013604

Kaishev, V.K., Dimitrova, D.S., Haberman, S. and Verrall, R.J. (2016). Geometrically designed, variable knot regression splines. Computational Statistics, 31, 1079–1105.
DOI: 10.1007/s00180-015-0621-7

Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023). Geometrically designed variable knot splines in generalized (non-)linear models. Applied Mathematics and Computation, 436.
DOI: 10.1016/j.amc.2022.127493

Dimitrova, D. S., Guillen, E. S. and Kaishev, V. K. (2024). GeDS: An R Package for Regression, Generalized Additive Models and Functional Gradient Boosting, based on Geometrically Designed (GeD) Splines. Manuscript submitted for publication.

See Also

NGeDS; GGeDS; GeDSgam-Class; S3 methods such as knots.GeDSgam; coef.GeDSgam; deviance.GeDSgam; predict.GeDSgam

gam, glm

Examples


# Load package
library(GeDS) 

data(airquality)
data <- na.omit(airquality)
data$Ozone <- data$Ozone^(1/3)

formula <- Ozone ~ f(Solar.R) + f(Wind, Temp)
Gmodgam <- NGeDSgam(formula = formula, data = data,
                    phi_gam_exit = 0.995, phi = 0.995, q = 2)
MSE_Gmodgam_linear <- mean((data$Ozone - Gmodgam$predictions$pred_linear)^2)
MSE_Gmodgam_quadratic <- mean((data$Ozone - Gmodgam$predictions$pred_quadratic)^2)
MSE_Gmodgam_cubic <- mean((data$Ozone - Gmodgam$predictions$pred_cubic)^2)

cat("\n", "MEAN SQUARED ERROR", "\n",
"Linear NGeDSgam:", MSE_Gmodgam_linear, "\n",
"Quadratic NGeDSgam:", MSE_Gmodgam_quadratic, "\n",
"Cubic NGeDSgam:", MSE_Gmodgam_cubic, "\n")

## S3 methods for class 'GeDSgam'
# Print 
print(Gmodgam)
# Knots
knots(Gmodgam, n = 2L)
knots(Gmodgam, n = 3L)
knots(Gmodgam, n = 4L)
# Coefficients
coef(Gmodgam, n = 2L)
coef(Gmodgam, n = 3L)
coef(Gmodgam, n = 4L)
# Deviances
deviance(Gmodgam, n = 2L)
deviance(Gmodgam, n = 3L)
deviance(Gmodgam, n = 4L)


alattuada/GeDS documentation built on April 26, 2024, 11:36 a.m.