GGeDS: Generalized Geometrically Designed Spline Regression...
In GeDS: Geometrically Designed Spline Regression

GGeDS

R Documentation

Generalized Geometrically Designed Spline Regression Estimation

Description

GGeDS constructs a geometrically designed (univariate or bivariate) variable knots spline regression model for the predictor in the context of generalized (non-)linear models. This is referred to as a GeDS model for a response with a distribution from the exponential family.

Usage

GGeDS(
  formula,
  family = gaussian(),
  data,
  weights,
  beta,
  phi = 0.99,
  min.intknots,
  max.intknots,
  q = 2L,
  Xextr = NULL,
  Yextr = NULL,
  show.iters = FALSE,
  stoptype = "SR",
  higher_order = TRUE
)

Arguments

`formula`	A description of the structure of the predictor model to be fitted, including the dependent and independent variables. See `formula` for details.
`family`	A description of the error distribution and link function to be used in the model. This can be a character string naming a family function (e.g., `"gaussian"`), the family function itself (e.g. `gaussian`) or the result of a call to a family function (e.g., `gaussian()`). See family for details on family functions.
`data`	An optional `data.frame`, `list` or `environment` containing the variables of the predictor model. If the formula variables are not found in `data`, they are taken from `environment(formula)`, typically the environment from which `GGeDS` is called.
`weights`	An optional vector of "prior weights" to be put on the observations during the fitting process in case the user requires weighted GeDS fitting. It is `NULL` by default.
`beta`	Numeric parameter in the interval `[0,1]` tuning the knot placement in stage A of GeDS. See Details below.
`phi`	Numeric parameter in the interval `(0,1)` specifying the threshold for the stopping rule (model selector) in stage A of GeDS. See also `stoptype` and Details below.
`min.intknots`	Optional parameter allowing the user to set a minimum number of internal knots to be fit in stage A. By default equal to zero.
`max.intknots`	Optional parameter allowing the user to set a maximum number of internal knots to be added by stage A's GeDS estimation algorithm. By default equal to the number of knots `\kappa` for the saturated GeDS model (i.e., `\kappa=N-2`, where `N` is the number of observations).
`q`	Numeric parameter which allows to fine-tune the stopping rule of stage A of GeDS, by default equal to `2L`. See Details below.
`Xextr`	Numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the observations of the independent variable. See Details.
`Yextr`	Numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the observations of the second independent variable (if bivariate GeDS is run). See Details.
`show.iters`	Logical variable indicating whether or not to print information of the fit at each GeDS iteration. By default equal to `FALSE`.
`stoptype`	A character string indicating the type of GeDS stopping rule to be used. It should be either one of `"SR"`, `"RD"` or `"LR"`, partial match allowed. See Details below.
`higher_order`	Logical; if `TRUE`, the function proceeds to Stage B, fitting higher-order models (quadratic and cubic), after completing Stage A. Default is `TRUE`.

Details

The GGeDS function extends the GeDS methodology, developed by Kaishev et al. (2016) and implemented in the NGeDS function for the normal case, to the more general GNM (GLM) context, allowing for the response to have any distribution from the exponential family. Under the GeDS-GNM approach the (non-)linear predictor is viewed as a spline with variable knots that are estimated along with the regression coefficients and the order of the spline, using a two stage procedure. In stage A, a linear variable-knot spline is fitted to the data applying iteratively re-weighted least squares (see IRLSfit function). In stage B, a Schoenberg variation diminishing spline approximation to the fit from stage A is constructed, thus simultaneously producing spline fits of order 2, 3, and 4, all of which are included in the output (an object of class "GeDS"). A detailed description of the underlying algorithm can be found in Dimitrova et al. (2023).

As noted in formula, the argument formula allows the user to specify predictor models with two components: a spline regression (non-parametric) component involving part of the independent variables identified through the function f, and an optional parametric component involving the remaining independent variables. For GGeDS only one or two independent variables are allowed for the spline component and arbitrary many independent variables for the parametric component of the predictor. Failure to specify the independent variable for the spline regression component through the function f will return an error. See formula.

Within the argument formula, similarly as in other R functions, it is possible to specify one or more offset variables, i.e. known terms with fixed regression coefficients equal to 1. These terms should be identified via the function offset.

The parameter beta tunes the placement of a new knot in stage A of the algorithm. At the beginning of each GeDS iteration, a second-order spline is fitted to the data. As follows, the "working residuals" (see IRLSfit) are computed and grouped by their sign. A new knot is then placed within the cluster that maximizes a certain measure. This measure is defined as a weighted linear combination of the range of the independent variable at the cluster and the mean of the absolute residuals within it. The parameter beta determines the weights in this measure correspondingly: beta and 1 - beta. The higher beta is, the more weight is put to the mean of the residuals and the less to the range of their corresponding x-values (see Kaishev et al., 2016, for further details).

The default values of beta are beta = 0.5 if the response is assumed to be Gaussian, beta = 0.2 if it is Poisson (or Quasipoisson), while if it is Binomial, Quasibinomial or Gamma beta = 0.1, which reflect our experience of running GeDS for different underlying functional dependencies.

The argument stoptype allows to choose between three alternative stopping rules for the knot selection in stage A of GeDS: "RD", that stands for Ratio of Deviances; "SR", that stands for Smoothed Ratio of deviances; and "LR", that stands for Likelihood Ratio. The latter is based on the difference of deviances rather than on their ratio as in the case of "RD" and "SR". Therefore "LR" can be viewed as a log likelihood ratio test performed at each iteration of the knot placement. In each of these cases the corresponding stopping criterion is compared with a threshold value phi (see below).

The argument phi provides a threshold value required for the stopping rule to exit the knot placement in stage A of GeDS. The higher the value of phi, the more knots are added under the "RD" and "SR" stopping rules, contrary to the case of the stopping rule "LR" where the lower phi is, more knots are included in the spline regression. Further details for each of the three alternative stopping rules can be found in Dimitrova et al. (2023).

The argument q is an input parameter that fine-tunes the stopping rule in stage A. It specifies the number of consecutive iterations over which the deviance must exhibit stable convergence to terminate knot placement in stage A. Specifically, under any of the rules "RD", "SR" or "LR" the deviance at the current iteration is compared to the deviance computed q iterations before, i.e. before introducing the last q knots.

Value

An object of class "GeDS" (a named list) with similar components described under NGeDS’s @return plus the following slots:

type: Character string indicating the type of regression performed. This can be "GLM - Univ"/"GLM - Biv", respectively corresponding to generalized (GNM-GLM) univariate/bivariate GeDS (implemented by GGeDS).
guesses: Initial values for the estimation of the spline coefficients at each iteration of stage A. Since the initial values are used only in the IRLS procedure, this slot applies only to objects created by GGeDS, GenUnivariateFitter or GenBivariateFitter functions.
iterIrls: Vector containing the numbers of IRLS iterations for all iterations of stage A cumulatively. Since the IRLS procedure is used only in GGeDS, GenUnivariateFitter or GenBivariateFitter, this slot is empty if the object is not created by one of these functions.
extcall: call to the GGeDS function.

References

Kaishev, V.K., Dimitrova, D.S., Haberman, S. and Verrall, R.J. (2016). Geometrically designed, variable knot regression splines. Computational Statistics, 31, 1079–1105.
DOI: \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1007/s00180-015-0621-7")}

Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023). Geometrically designed variable knot splines in generalized (non-)linear models. Applied Mathematics and Computation, 436.
DOI: \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1016/j.amc.2022.127493")}

Examples

######################################################################
# Generate a data sample for the response variable Y and the covariate X
# assuming Poisson distributed error and log link function
# See section 4.1 in Dimitrova et al. (2023)
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- exp(f_1(X))

#############
## POISSON ##
#############
# Generate Poisson distributed Y according to the mean model
Y <- rpois(N, means)

# Fit a Poisson GeDS regression using GGeDS
(Gmod <- GGeDS(Y ~ f(X), beta = 0.2, phi = 0.99, q = 2, family = poisson(),
               Xextr = c(-2,2)))
# Plot the quadratic and cubic GeDS fits
plot(X, log(Y), xlab = "x", ylab = expression(f[1](x)))
lines(X, sapply(X, f_1), lwd = 2)
lines(Gmod, n = 3, col = "red")
lines(Gmod, n = 4, col = "blue", lty = 2)
legend("topleft",
       legend = expression(f[1](x), "Quadratic", "Cubic"),
       col = c("black", "red", "blue"),
       lty = c(1, 1, 2),
       lwd = c(2, 1, 1),
       bty = "n")

# Generate GeDS prediction at X=0, first on the response scale and then on
# the predictor scale
predict(Gmod, n = 3, newdata = data.frame(X = 0))
predict(Gmod, n = 3, newdata = data.frame(X = 0), type = "link")

# Apply some of the other available methods, e.g.
# knots, coefficients and deviance extractions for the
# quadratic GeDS fit
knots(Gmod)
coef(Gmod)
deviance(Gmod)

# the same but for the cubic GeDS fit
knots(Gmod, n = 4)
coef(Gmod, n = 4)
deviance(Gmod, n = 4)

###########
## GAMMA ##
###########
# Generate Gamma distributed Y according to the mean model
Y <- rgamma(N, shape = means, rate = 0.1)
# Fit a Gamma GeDS regression using GGeDS
Gmod <- GGeDS(Y ~ f(X), beta = 0.1, phi = 0.99, q = 2, family = Gamma(log),
              Xextr = c(-2,2))
plot(Gmod, f = function(x) exp(f_1(x))/0.1)

##############
## BINOMIAL ##
##############
# Generate Binomial distributed Y according to the mean model
eta <- f_1(X) - 4
means <- exp(eta)/(1+exp(eta))
Y <- rbinom(N, size = 50, prob = means) / 50
# Fit a Binomial GeDS regression using GGeDS
Gmod <- GGeDS(Y ~ f(X), beta = 0.1, phi = 0.99, family = "quasibinomial",
              Xextr = c(-2,2))
plot(Gmod, f = function(x) exp(f_1(x) - 4)/(1 + exp(f_1(x) - 4)))


##########################################
# A real data example
# See Dimitrova et al. (2023), Section 4.2

data("coalMining")
(Gmod2 <- GGeDS(formula = accidents ~ f(years), beta = 0.1, phi = 0.98,
                 family = poisson(), data = coalMining))
(Gmod3 <- GGeDS(formula = accidents ~ f(years), beta = 0.1, phi = 0.985,
                 family = poisson(), data = coalMining))
plot(coalMining$years, coalMining$accidents, type = "h", xlab = "Years",
     ylab = "Accidents")
lines(Gmod2, tr = exp, n = 4, col = "red")
lines(Gmod3, tr = exp, n = 4, col = "blue", lty = 2)
legend("topright", c("phi = 0.98","phi = 0.985"), col = c("red", "blue"),
       lty=c(1, 2))


## Not run: 
##########################################
# The same regression in the example of GeDS
# but assuming Gamma and Poisson responses
# See Dimitrova et al. (2023), Section 4.2

data('BaFe2As2')
(Gmod4 <- GGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.995, q = 3,
                family = Gamma(log), stoptype = "RD"))
plot(Gmod4)

(Gmod5 <- GGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.1, phi = 0.995, q = 3,
                family = poisson(), stoptype = "SR"))
plot(Gmod5)

## End(Not run)

##########################################
# Life tables
# See Dimitrova et al. (2023), Section 4.2

data(EWmortality)
attach(EWmortality)
(M1 <- GGeDS(formula = Deaths ~ f(Age) + offset(log(Exposure)),
              family = quasipoisson(), phi = 0.99, beta = 0.1, q = 3,
              stoptype = "LR"))

Exposure_init <- Exposure + 0.5 * Deaths
Rate <- Deaths / Exposure_init
(M2 <- GGeDS(formula = Rate ~ f(Age), weights = Exposure_init,
              family = quasibinomial(), phi = 0.99, beta = 0.1,
              q = 3, stoptype = "LR"))


op <- par(mfrow=c(2,2))
plot(Age, Deaths/Exposure, ylab = expression(mu[x]), xlab = "Age")
lines(M1, n = 3, tr = exp, lwd = 1, col = "red")
plot(Age, Rate, ylab = expression(q[x]), xlab = "Age")
lines(M2, n = 3, tr = quasibinomial()$linkinv, lwd = 1, col = "red")
plot(Age, log(Deaths/Exposure), ylab = expression(log(mu[x])), xlab = "Age")
lines(M1, n = 3, lwd = 1, col = "red")
plot(Age, quasibinomial()$linkfun(Rate), ylab = expression(logit(q[x])), xlab = "Age")
lines(M2, n = 3, lwd = 1, col = "red")
par(op)

#########################################
# bivariate example
set.seed(123)
doublesin <- function(x) {
# Adjusting the output to ensure it's positive
exp(sin(2*x[,1]) + sin(2*x[,2]))
}
X <- round(runif(400, min = 0, max = 3), 2)
Y <- round(runif(400, min = 0, max = 3), 2)
# Calculate lambda for Poisson distribution
lambda <- doublesin(cbind(X,Y))
# Generate Z from Poisson distribution
Z <- rpois(400, lambda)
data <- data.frame(X, Y, Z)

# Fit a Poisson GeDS regression using GGeDS
BivGeDS <- GGeDS(Z ~ f(X,Y), beta = 0.2, phi = 0.99, family = "poisson")

# Poisson mean deviance w.r.t data
deviance(BivGeDS, n = 2) # or sum(poisson()$dev.resids(Z, BivGeDS$Linear.Fit$Predicted, wt = 1))
deviance(BivGeDS, n = 3)
deviance(BivGeDS, n = 4)

# Poisson mean deviance w.r.t true function 
f_XY <- apply(cbind(X, Y), 1, function(row) doublesin(matrix(row, ncol = 2)))
mean(poisson()$dev.resids(f_XY, BivGeDS$Linear.Fit$Predicted, wt = 1))
mean(poisson()$dev.resids(f_XY, BivGeDS$Quadratic.Fit$Predicted, wt = 1))
mean(poisson()$dev.resids(f_XY, BivGeDS$Cubic.Fit$Predicted, wt = 1))

# Surface plot of the generating function (doublesin)
plot(BivGeDS, f = doublesin)
# Surface plot of the fitted model
plot(BivGeDS)

GeDS documentation built on June 30, 2025, 9:07 a.m.