NGeDSboost

NGeDSboost implements component-wise gradient boosting (Bühlmann and Yu, 2003; Bühlmann and Hothorn, 2007) using normal GeD splines (i.e., fitted with the NGeDS function) as base-learners (see Dimitrova et al., 2025). Unlike standard component-wise boosting, this approach performs a piecewise polynomial update of the coefficients at each iteration, yielding a final fit in the form of a single spline model.

Usage
NGeDSboost(
formula,
data,
weights = NULL,
normalize_data = FALSE,
family = mboost::Gaussian(),
link = NULL,
initial_learner = TRUE,
int.knots_init = 2L,
min_iterations,
max_iterations,
shrinkage = 1,
phi_boost_exit = 0.99,
q_boost = 2L,
beta = 0.5,
phi = 0.99,
int.knots_boost,
q = 2L,
higher_order = TRUE,
boosting_with_memory = FALSE
)
Arguments

formula
    A description of the structure of the model to be fitted, including the dependent and independent variables. Unlike NGeDS, the formula may comprise several additive components: covariates wrapped in f() (possibly bivariate, e.g., f(X1, X2)) are fitted with GeD spline base-learners, while unwrapped covariates enter as linear base-learners (see Example 2 below).

data
    A data.frame containing the variables in the model.

weights
    An optional vector of "prior weights" to be put on the observations during the fitting process. It should be NULL or a numeric vector of the same length as the response variable.

normalize_data
    A logical that defines whether the data should be normalized (standardized) before fitting the baseline FGB-GeDS linear model, i.e., before running the FGB algorithm. Normalizing the data involves scaling the predictor variables to have a mean of 0 and a standard deviation of 1. Note that this process alters the scale and interpretation of the estimated knots and coefficients. Default is FALSE.

family
    Determines the loss function to be optimized by the boosting algorithm. In case initial_learner = FALSE, it also determines the empirical risk minimizer employed as the initial learner (see Details). Default is mboost::Gaussian().

link
    A character string specifying the link function to be used, in case the chosen family admits more than one. Default is NULL, in which case the family's default link is used.

initial_learner
    A logical value. If set to TRUE (default), the initial learner is an NGeDS fit with at most int.knots_init internal knots; if set to FALSE, the initial learner is the empirical risk minimizer corresponding to the chosen family (see Details).

int.knots_init
    Optional parameter allowing the user to set a maximum number of internal knots to be added by the initial GeDS learner in case initial_learner = TRUE. Default is 2L.

min_iterations
    Optional parameter to manually set a minimum number of boosting iterations to be run. If not specified, it defaults to 0L.

max_iterations
    Optional parameter to manually set the maximum number of boosting iterations to be run. If not specified, an internal default cap is used.

shrinkage
    Numeric parameter controlling the step length/shrinkage, i.e., the learning rate of the boosting procedure (see Details). Default is 1.

phi_boost_exit
    Numeric threshold of the Ratio of Deviances stopping rule that terminates the boosting iterations (see Details). Default is 0.99.

q_boost
    Numeric parameter which allows fine-tuning of the boosting iterations stopping rule. Default is 2L.

beta
    Numeric parameter tuning the knot placement in stage A of each GeDS base-learner fit. Default is 0.5.

phi
    Numeric threshold of the stopping rule of stage A of each GeDS base-learner fit. Default is 0.99.

int.knots_boost
    The maximum number of internal knots that each GeDS base-learner may have at each boosting iteration, effectively setting the maximum number of internal knots of each NGeDS base-learner fit.

q
    Numeric parameter which allows fine-tuning of the stopping rule of stage A of the GeDS base-learner. Default is 2L.

higher_order
    A logical that defines whether to compute the higher order fits (quadratic and cubic) after the FGB algorithm is run. Default is TRUE.

boosting_with_memory
    Logical value. If TRUE, each GeDS base-learner is fitted taking into account the knots placed at previous boosting iterations. Default is FALSE.
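For illustration, a call that sets several of these arguments explicitly might look as follows. This is only a sketch: the data frame df and its columns Y, X1 and X2 are hypothetical, and the argument values are examples rather than recommendations.

# Sketch: NGeDSboost call with several tuning arguments set explicitly
# (df with columns Y, X1, X2 is hypothetical)
fit <- NGeDSboost(Y ~ f(X1) + X2, data = df,
                  normalize_data = TRUE,   # standardize predictors and response
                  initial_learner = TRUE,  # start from an NGeDS fit ...
                  int.knots_init = 2L,     # ... with at most 2 internal knots
                  shrinkage = 0.5,         # halve each base-learner's update
                  phi_boost_exit = 0.99,   # Ratio of Deviances threshold
                  q_boost = 2L,            # lag used by the stopping rule
                  max_iterations = 50L)    # hard cap on boosting iterations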
Details

The NGeDSboost function implements the functional gradient boosting (FGB) algorithm for a pre-defined loss function, using linear GeD splines as base-learners. At each boosting iteration, the negative gradient vector is fitted through the base procedure encapsulated within the NGeDS function. The latter constructs a geometrically designed variable-knot spline regression model for a response having a normal distribution. The FGB algorithm yields a final linear fit. Higher order fits (quadratic and cubic) are then computed by calculating Schoenberg's variation diminishing spline (VDS) approximation of the linear fit.
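To fix ideas, the core FGB recipe for squared-error loss can be written in a few lines of plain R. This is only a conceptual sketch, with stats::smooth.spline standing in for the NGeDS base procedure; it is not the package's implementation, which performs the piecewise polynomial coefficient update described above.

# Conceptual sketch of L2 functional gradient boosting (not the package code)
fgb_sketch <- function(X, Y, M = 50, nu = 1) {
  F_hat <- rep(mean(Y), length(Y))        # initial learner: empirical risk minimizer
  for (m in seq_len(M)) {
    U <- Y - F_hat                        # negative gradient of the L2 loss
    base <- smooth.spline(X, U)           # stand-in for the NGeDS base-learner
    F_hat <- F_hat + nu * predict(base, X)$y  # shrunken update
  }
  F_hat
}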
On the one hand, NGeDSboost includes all the parameters of NGeDS, which in this case tune the base-learner fit at each boosting iteration. On the other hand, NGeDSboost includes some additional parameters proper to the FGB procedure. We describe the main ones as follows.

First, family specifies the loss function and corresponding risk function to be optimized by the boosting algorithm. If initial_learner = FALSE, the initial learner employed will be the empirical risk minimizer corresponding to the family chosen. If initial_learner = TRUE, then the initial learner will be an NGeDS fit with maximum number of internal knots equal to int.knots_init.
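For instance, a Poisson model started from the family's empirical risk minimizer rather than from an initial GeDS fit could be specified as in this sketch (the data frame df and its columns counts and x are hypothetical):

# Poisson boosting without an initial GeDS learner (sketch; df is hypothetical)
fit_pois <- NGeDSboost(counts ~ f(x), data = df,
                       family = mboost::Poisson(),
                       initial_learner = FALSE)  # start from the empirical risk minimizer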
shrinkage tunes the step length/shrinkage parameter, which helps to control the learning rate of the model. In other words, when a new base-learner is added to the ensemble, its contribution to the final prediction is multiplied by the shrinkage parameter. The smaller shrinkage is, the slower and more gradual the learning process will be, and vice versa.
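A sketch of a deliberately slow fit, reusing the simulated data from Example 1 below; with a small step length, more iterations are typically needed, so the cap is raised:

# Smaller shrinkage: more gradual learning, typically more iterations needed
fit_slow <- NGeDSboost(Y ~ f(X), data = data,
                       shrinkage = 0.1,        # each update contributes only 10%
                       max_iterations = 200L)  # allow more iterations to compensate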
The number of boosting iterations is controlled by a Ratio of Deviances stopping rule similar to the one presented for NGeDS/GGeDS. In the same way that phi and q tune the stopping rule of NGeDS/GGeDS, phi_boost_exit and q_boost tune the stopping rule of NGeDSboost. The user can also manually control the number of boosting iterations through min_iterations and max_iterations.
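For instance, the exit threshold and iteration bounds can be set explicitly as below (a sketch, again on the Example 1 data; loosely speaking, thresholds below the default 0.99 tend to satisfy the rule, and hence stop boosting, earlier):

# Sketch: tuning the stopping rule and bounding the iteration count
fit_ctrl <- NGeDSboost(Y ~ f(X), data = data,
                       phi_boost_exit = 0.95,  # Ratio of Deviances threshold
                       q_boost = 1L,           # compare deviances 1 iteration apart
                       min_iterations = 5L,    # run at least 5 iterations
                       max_iterations = 100L)  # never exceed 100 iterations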
Value

An object of class "GeDSboost" (a named list) with components:

- The call to the NGeDSboost function.
- A formula object representing the model to be fitted.
- A list containing the arguments passed to the NGeDSboost function, which includes:
    response: data.frame containing the response variable observations.
    predictors: data.frame containing the observations corresponding to the predictor variables included in the model.
    base_learners: description of the model's base-learners.
    family: the statistical family. The possible options are mboost::Binomial(type = c("adaboost", "glm"), link = c("logit", "probit", "cloglog", "cauchit", "log"), ...), mboost::Gaussian(), mboost::Poisson() and mboost::GammaReg(nuirange = c(0, 100)). Other mboost families may be suitable; however, these have not yet been thoroughly tested and are therefore not recommended for use.
    initial_learner: if TRUE, an NGeDS or GGeDS fit was used as the initial learner; otherwise, the empirical risk minimizer corresponding to the selected family was employed.
    int.knots_init: if initial_learner = TRUE, the maximum number of internal knots set in the NGeDS/GGeDS function for the initial learner fit.
    shrinkage: shrinkage/step length/learning rate utilized throughout the boosting iterations.
    normalize_data: if TRUE, the response and predictors were standardized before running the FGB algorithm.
    X_mean: mean of the predictor variables (NULL unless normalize_data = TRUE).
    X_sd: standard deviation of the predictors (NULL unless normalize_data = TRUE).
    Y_mean: mean of the response variable (NULL unless normalize_data = TRUE).
    Y_sd: standard deviation of the response variable (NULL unless normalize_data = TRUE).
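When normalize_data = TRUE, these quantities let the user map results back to the original scale. A minimal sketch of the usual de-standardization arithmetic follows; the access path fit$args (i.e., that the arguments list above is stored under the name args) is assumed here for illustration:

# De-standardizing a fitted value (sketch; `fit$args` access path is assumed)
Y_hat_original <- Y_hat_standardized * fit$args$Y_sd + fit$args$Y_mean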
- A list containing the model generated at each boosting iteration, each of which includes:
    best_bl: fit of the base-learner that minimized the residual sum of squares (RSS) in fitting the gradient at the i-th boosting iteration.
    Y_hat: model fitted values at the i-th boosting iteration.
    base_learners: knots and polynomial coefficients for each of the base-learners at the i-th boosting iteration.
- A list detailing the final GeDSboost model after the gradient descent algorithm is run:
    model_name: the boosting iteration corresponding to the final model.
    dev: deviance of the final model.
    Y_hat: fitted values.
    base_learners: a list containing, for each base-learner, the intervals defined by the piecewise linear fit and its corresponding polynomial coefficients. It also includes the knots corresponding to each order fit, which result from computing the corresponding averaging knot location (a small numerical sketch follows this list); see Kaishev et al. (2016) for details. If the number of internal knots of the final linear fit is less than n-1, the averaging knot location is not computed.
    linear.fit/quadratic.fit/cubic.fit: final linear, quadratic and cubic fits in B-spline form. These include the same elements as in an NGeDS/GGeDS object (see SplineReg for details).
- predictions: a list containing the predicted values obtained for each of the fits (linear, quadratic and cubic).
- A list detailing the internal knots obtained for each of the different order fits (linear, quadratic and cubic).
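As a small illustration of the averaging knot location mentioned above: the order-n knots can be obtained as moving averages of n-1 consecutive internal knots of the linear fit (cf. Kaishev et al., 2016). The sketch below only illustrates the averaging idea; the package's exact boundary handling may differ.

# Averaging knot location (sketch): order-n knots as means of
# n-1 consecutive linear-fit internal knots
avg_knots <- function(kn, n) {
  m <- n - 1                        # number of consecutive knots averaged
  if (length(kn) < m) return(NULL)  # too few internal knots: not computed
  sapply(seq_len(length(kn) - m + 1),
         function(j) mean(kn[j:(j + m - 1)]))
}
avg_knots(c(0.2, 0.5, 0.8), n = 3)  # quadratic (order 3) knots: 0.35, 0.65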
References

Friedman, J.H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232. doi:10.1214/aos/1013203451

Bühlmann, P. and Yu, B. (2003). Boosting with the L2 loss: Regression and classification. Journal of the American Statistical Association, 98(462), 324–339. doi:10.1198/016214503000125

Bühlmann, P. and Hothorn, T. (2007). Boosting algorithms: Regularization, prediction and model fitting. Statistical Science, 22(4), 477–505. doi:10.1214/07-STS242

Kaishev, V.K., Dimitrova, D.S., Haberman, S. and Verrall, R.J. (2016). Geometrically designed, variable knot regression splines. Computational Statistics, 31, 1079–1105. doi:10.1007/s00180-015-0621-7

Dimitrova, D.S., Kaishev, V.K., Lattuada, A. and Verrall, R.J. (2023). Geometrically designed variable knot splines in generalized (non-)linear models. Applied Mathematics and Computation, 436. doi:10.1016/j.amc.2022.127493

Dimitrova, D.S., Kaishev, V.K. and Saenz Guillen, E.L. (2025). GeDS: An R Package for Regression, Generalized Additive Models and Functional Gradient Boosting, based on Geometrically Designed (GeD) Splines. Manuscript submitted for publication.
See Also

NGeDS; GGeDS; S3 methods such as coef, confint, deviance.GeDSboost, family, formula, knots, logLik, predict, print and summary. Also variable importance measures (bl_imp) and sequential plotting facilities (visualize_boosting).
Examples

################################# Example 1 #################################
# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.2)
data <- data.frame(X, Y)
# Fit a Normal FGB-GeDS regression using NGeDSboost
Gmodboost <- NGeDSboost(Y ~ f(X), data = data)
MSE_Gmodboost_linear <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_linear)^2)
MSE_Gmodboost_quadratic <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_quadratic)^2)
MSE_Gmodboost_cubic <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_cubic)^2)
cat("\n", "MEAN SQUARED ERROR", "\n",
"Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
"Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
"Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")
# Compute predictions on new randomly generated data
X <- sort(runif(100, min = -2, max = 2))
pred_linear <- predict(Gmodboost, newdata = data.frame(X), n = 2)
pred_quadratic <- predict(Gmodboost, newdata = data.frame(X), n = 3)
pred_cubic <- predict(Gmodboost, newdata = data.frame(X), n = 4)
MSE_Gmodboost_linear <- mean((sapply(X, f_1) - pred_linear)^2)
MSE_Gmodboost_quadratic <- mean((sapply(X, f_1) - pred_quadratic)^2)
MSE_Gmodboost_cubic <- mean((sapply(X, f_1) - pred_cubic)^2)
cat("\n", "MEAN SQUARED ERROR", "\n",
"Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
"Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
"Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")
## S3 methods for class 'GeDSboost'
# Print
print(Gmodboost); summary(Gmodboost)
# Knots
knots(Gmodboost, n = 2)
knots(Gmodboost, n = 3)
knots(Gmodboost, n = 4)
# Coefficients
coef(Gmodboost, n = 2)
coef(Gmodboost, n = 3)
coef(Gmodboost, n = 4)
# Wald-type confidence intervals
confint(Gmodboost, n = 2)
confint(Gmodboost, n = 3)
confint(Gmodboost, n = 4)
# Deviances
deviance(Gmodboost, n = 2)
deviance(Gmodboost, n = 3)
deviance(Gmodboost, n = 4)
# Plot
plot(Gmodboost, n = 3)
############################ Example 2 - Bodyfat ############################
library(TH.data)
data("bodyfat", package = "TH.data")
Gmodboost <- NGeDSboost(formula = DEXfat ~ age + f(hipcirc, waistcirc) + f(kneebreadth),
data = bodyfat, phi_boost_exit = 0.9, q_boost = 1, phi = 0.9, q = 1)
MSE_Gmodboost_linear <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_linear)^2)
MSE_Gmodboost_quadratic <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_quadratic)^2)
MSE_Gmodboost_cubic <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_cubic)^2)
# Comparison
cat("\n", "MSE", "\n",
"Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
"Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
"Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")