NGeDSgam | R Documentation |
Implements the Local Scoring Algorithm (Hastie and Tibshirani, 1986),
applying normal linear GeD splines (i.e., the NGeDS function) to fit the
targets within each backfitting iteration. Higher order fits are computed by
pursuing stage B of GeDS after the local-scoring algorithm is run.
NGeDSgam(
  formula,
  family = "gaussian",
  data,
  weights = NULL,
  normalize_data = FALSE,
  min_iterations,
  max_iterations,
  phi_gam_exit = 0.99,
  q_gam = 2L,
  beta = 0.5,
  phi = 0.99,
  internal_knots = 500L,
  q = 2L,
  higher_order = TRUE
)
formula |
A description of the model structure to be fitted,
specifying both the dependent and independent variables. Unlike |
family |
A character string indicating the response variable distribution
and link function to be used. Default is |
data |
A |
weights |
An optional vector of "prior weights" to be put on the
observations during the fitting process. It should be |
normalize_data |
A logical that defines whether the data should be
normalized (standardized) before fitting the baseline linear model, i.e.,
before running the local-scoring algorithm. Normalizing the data involves
scaling the predictor variables to have a mean of 0 and a standard deviation
of 1. This process alters the scale and interpretation of the knots and
coefficients estimated. Default is equal to |
min_iterations |
Optional parameter to manually set a minimum number of local-scoring iterations to be run. If not specified, it defaults to 0L. |
max_iterations |
Optional parameter to manually set the maximum number
of local-scoring iterations to be run. If not specified, it defaults to |
phi_gam_exit |
Convergence threshold for local-scoring and backfitting.
Both algorithms stop when the relative change in the deviance is below this
threshold. Default is |
q_gam |
Numeric parameter that allows fine-tuning of the stopping rule of
the local-scoring and backfitting iterations. By default equal to |
beta |
Numeric parameter in the interval |
phi |
Numeric parameter in the interval |
internal_knots |
The maximum number of internal knots that can be added
by the GeDS smoothers at each backfitting iteration, effectively setting the
value of |
q |
Numeric parameter that allows fine-tuning of the stopping rule of
stage A of GeDS, for each of the GeD spline components of the model. By
default equal to |
higher_order |
A logical that defines whether to compute the higher order
fits (quadratic and cubic) after the local-scoring algorithm is run. Default
is |
The NGeDSgam function employs the local-scoring algorithm to fit a
generalized additive model (GAM). This algorithm iteratively fits weighted
additive models by backfitting. Normal linear GeD splines, as well as linear
learners, are supported as function smoothers within the backfitting
algorithm. The local-scoring algorithm ultimately produces a linear fit.
Higher order fits (quadratic and cubic) are then computed via Schoenberg's
variation diminishing spline (VDS) approximation of the linear fit.
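The backfitting idea mentioned above can be illustrated with a minimal base-R sketch for the Gaussian case. This is not the NGeDSgam implementation: stats::smooth.spline is swapped in as the smoother in place of GeD splines, and the simulated data, the variable names and the convergence tolerance are all assumptions made for illustration.

```r
# Minimal backfitting sketch (Gaussian response, two additive components).
# Assumption: smooth.spline() stands in for the GeD spline smoother.
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.1)

alpha   <- mean(y)      # intercept
f1      <- rep(0, n)    # current estimate of the x1 component
f2      <- rep(0, n)    # current estimate of the x2 component
rss_old <- Inf

for (it in 1:50) {
  # Smooth the partial residuals of each component in turn
  r1 <- y - alpha - f2
  f1 <- predict(smooth.spline(x1, r1), x1)$y
  f1 <- f1 - mean(f1)   # centre for identifiability

  r2 <- y - alpha - f1
  f2 <- predict(smooth.spline(x2, r2), x2)$y
  f2 <- f2 - mean(f2)

  rss <- sum((y - alpha - f1 - f2)^2)
  if (abs(rss_old - rss) < 1e-8 * rss) break   # illustrative tolerance
  rss_old <- rss
}

fitted_values <- alpha + f1 + f2
```

Each pass refits one component to the residuals left by the others, which is the role the GeD spline smoothers play within each backfitting iteration of the actual function.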
On the one hand, NGeDSgam includes all the parameters of NGeDS, which in this
case tune the function smoother fit at each backfitting iteration. On the
other hand, NGeDSgam includes some additional parameters specific to the
local-scoring procedure. The main ones are described below.
The family chosen determines the link function, adjusted dependent variable
and weights to be used in the local-scoring algorithm. The number of
local-scoring and backfitting iterations is controlled by a Ratio of
Deviances stopping rule similar to the one presented for NGeDS/GGeDS. Just as
phi and q tune the stopping rule of NGeDS/GGeDS, phi_gam_exit and q_gam tune
the stopping rule of NGeDSgam.
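To make the role of the family concrete, the following base-R sketch computes the adjusted dependent variable and the weights for one local-scoring step, using the standard formulas z = eta + (y - mu) * d(eta)/d(mu) and w = (d(mu)/d(eta))^2 / V(mu). It relies only on the family objects from the stats package; the toy response and the starting value of eta are assumptions, and the code is not taken from the GeDS sources.

```r
# Adjusted dependent variable and weights for one local-scoring step.
# Assumption: toy Poisson counts and a constant starting predictor.
fam <- poisson(link = "log")
y   <- c(0, 1, 3, 2, 5)

eta <- rep(log(mean(y)), length(y))   # starting additive predictor
mu  <- fam$linkinv(eta)               # means implied by eta

dmu_deta <- fam$mu.eta(eta)           # derivative of mu with respect to eta
z <- eta + (y - mu) / dmu_deta        # adjusted dependent variable
w <- dmu_deta^2 / fam$variance(mu)    # working weights

cbind(y, mu, z, w)
```

For the Poisson family with log link both the derivative and the variance equal mu, so the weights reduce to mu; a different family changes z and w through the same three family components.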
The user can also manually control the number of local-scoring iterations
through min_iterations and max_iterations.
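The interplay between a ratio-of-deviances rule and the iteration bounds can be sketched as follows. The exact rule used by NGeDSgam is not spelled out here, so this assumes that iteration k is compared with iteration k - q and that iterations stop once the ratio of the two deviances reaches phi; the helper stop_iteration and the deviance sequence are hypothetical.

```r
# Hypothetical helper: first iteration at which a ratio-of-deviances rule
# would stop, subject to minimum and maximum iteration counts.
# Assumption: stop at k when dev[k] / dev[k - q] >= phi.
stop_iteration <- function(dev, phi = 0.99, q = 2L,
                           min_iter = 0L, max_iter = length(dev)) {
  max_iter <- min(max_iter, length(dev))
  for (k in seq_len(max_iter)) {
    if (k > q && k >= min_iter && dev[k] / dev[k - q] >= phi) {
      return(k)
    }
  }
  max_iter   # rule never triggered: stop at the iteration cap
}

# Made-up deviances from successive iterations
dev <- c(100, 60, 45, 44.8, 44.75, 44.74)

stop_iteration(dev)                  # rule triggers once improvement stalls
stop_iteration(dev, min_iter = 6L)   # forced to run longer
stop_iteration(dev, max_iter = 4L)   # capped before the rule triggers
```

Under this assumption, a larger phi or q makes the rule harder to satisfy, while the iteration bounds override it in either direction.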
A model term wrapped in offset() is treated as a known (fixed) component and
added directly to the linear predictor when fitting the model. In case more
than one covariate is fixed, the user should sum the corresponding
coordinates of the fixed covariates to produce one common N-vector of
coordinates. See formula.
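As a small illustration of combining several fixed covariates into one offset, the sketch below sums two columns into a single N-vector and references it through offset() in the formula. The data and the column names (z1, z2, fixed) are made up, and the model is not fitted here; the resulting formula and data frame would be passed to NGeDSgam through its formula and data arguments.

```r
# Combine two fixed covariates into one common N-vector for offset().
# Assumption: simulated data; z1 and z2 play the role of fixed components.
set.seed(42)
N   <- 50
dat <- data.frame(x = runif(N), z1 = rnorm(N), z2 = rnorm(N))

# Sum the coordinates of the fixed covariates
dat$fixed <- dat$z1 + dat$z2

# Simulated response: smooth effect of x plus the fixed component
dat$y <- sin(2 * pi * dat$x) + dat$fixed + rnorm(N, sd = 0.1)

# One smooth term and one offset term; building the formula does not
# evaluate f(), so no package needs to be loaded at this point
fml <- y ~ f(x) + offset(fixed)
fml
```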
An object of class "GeDSgam" (a named list) with components:
Call to the NGeDSgam function.
A formula object representing the model to be fitted.
A list containing the arguments passed to the NGeDSgam function.
This list includes:
response
data.frame containing the response variable observations.
predictors
data.frame containing the corresponding observations of the predictor
variables included in the model.
base_learners
Description of the model's base learners ("smooth functions").
family
The statistical family. The possible options are:
binomial(link = "logit", "probit", "cauchit", "log", "cloglog"),
gaussian(link = "identity", "log", "inverse"),
Gamma(link = "inverse", "identity", "log"),
inverse.gaussian(link = "1/mu^2", "inverse", "identity", "log"),
poisson(link = "log", "identity", "sqrt"),
quasi(link = "identity", variance = "constant"),
quasibinomial(link = "logit", "probit", "cloglog", "identity", "inverse", "log", "1/mu^2", "sqrt") and
quasipoisson(link = "log", "identity", "sqrt").
normalize_data
If TRUE, then response and predictors were standardized before running the
local-scoring algorithm.
X_mean
Mean of the predictor variables (only if normalize_data = TRUE).
X_sd
Standard deviation of the predictors (only if normalize_data = TRUE,
otherwise this is NULL).
Y_mean
Mean of the response variable (only if normalize_data = TRUE, otherwise this
is NULL).
Y_sd
Standard deviation of the response variable (only if normalize_data = TRUE,
otherwise this is NULL).
A list detailing the final "GeDSgam" model selected after running the
local-scoring algorithm. The chosen model minimizes deviance across all
models generated by each local-scoring iteration. This list includes:
model_name
Local-scoring iteration that yielded the "best" model. Note that when
family = "gaussian", it will always correspond to iter1, as only one
local-scoring iteration is conducted in this scenario. This occurs because,
when family = "gaussian", the algorithm is tantamount to directly
implementing backfitting.
dev
Deviance of the final model. For family = "gaussian" this coincides with the
Residual Sum of Squares.
Y_hat
Fitted values, including:
- eta: the additive predictor,
- mu: the vector of means,
- z: the adjusted dependent variable.
base_learners
A list containing, for each base-learner, the corresponding linear fit
piecewise polynomial coefficients. It includes the knots for each order fit,
resulting from computing the averaging knot location. However, if the number
of internal knots of the final linear fit is less than n-1, the averaging
knot location is not computed.
linear.fit
Final linear fit in B-spline form (see SplineReg).
quadratic.fit
Quadratic fit obtained via Schoenberg's variation diminishing approximation
(see SplineReg).
cubic.fit
Cubic fit obtained via Schoenberg's variation diminishing approximation
(see SplineReg).
A list containing the predicted values obtained for each of the fits
(linear, quadratic, and cubic). Each of the predictions contains both the
additive predictor eta and the vector of means mu.
A list detailing the internal knots obtained for the fits of different order (linear, quadratic, and cubic).
Hastie, T. and Tibshirani, R. (1986). Generalized Additive Models.
Statistical Science, 1 (3), 297-310.
DOI: 10.1214/ss/1177013604
Kaishev, V.K., Dimitrova, D.S., Haberman, S. and Verrall, R.J. (2016).
Geometrically designed, variable knot regression splines.
Computational Statistics, 31, 1079–1105.
DOI: 10.1007/s00180-015-0621-7
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models.
Applied Mathematics and Computation, 436.
DOI: 10.1016/j.amc.2022.127493
Dimitrova, D. S., Kaishev, V. K. and Saenz Guillen, E. L. (2025).
GeDS: An R Package for Regression, Generalized Additive Models and
Functional Gradient Boosting, based on Geometrically Designed (GeD) Splines.
Manuscript submitted for publication.
NGeDS; GGeDS; S3 methods such as coef, confint, deviance.GeDSgam, family,
formula, knots, logLik, predict, print, summary.
# Load package
library(GeDS)
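# Prepare the data: drop incomplete rows and cube-root transform the response
data(airquality)
data <- na.omit(airquality)
data$Ozone <- data$Ozone^(1/3)
# Additive model with a univariate and a bivariate GeD spline component
formula <- Ozone ~ f(Solar.R) + f(Wind, Temp)
Gmodgam <- NGeDSgam(formula = formula, data = data, phi = 0.8)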
MSE_Gmodgam_linear <- mean((data$Ozone - Gmodgam$predictions$pred_linear)^2)
MSE_Gmodgam_quadratic <- mean((data$Ozone - Gmodgam$predictions$pred_quadratic)^2)
MSE_Gmodgam_cubic <- mean((data$Ozone - Gmodgam$predictions$pred_cubic)^2)
cat("\n", "MEAN SQUARED ERROR", "\n",
"Linear NGeDSgam:", MSE_Gmodgam_linear, "\n",
"Quadratic NGeDSgam:", MSE_Gmodgam_quadratic, "\n",
"Cubic NGeDSgam:", MSE_Gmodgam_cubic, "\n")
## S3 methods for class 'GeDSgam'
# Print
print(Gmodgam)
summary(Gmodgam)
# Knots
knots(Gmodgam, n = 2)
knots(Gmodgam, n = 3)
knots(Gmodgam, n = 4)
# Coefficients
coef(Gmodgam, n = 2)
coef(Gmodgam, n = 3)
coef(Gmodgam, n = 4)
# Wald-type confidence intervals
confint(Gmodgam, n = 2)
confint(Gmodgam, n = 3)
confint(Gmodgam, n = 4)
# Deviances
deviance(Gmodgam, n = 2)
deviance(Gmodgam, n = 3)
deviance(Gmodgam, n = 4)