Nothing
################################################################################
################################################################################
################################### SplineReg ##################################
################################################################################
################################################################################
#' @title Estimation for Models with Spline and Parametric Components
#' @name SplineReg
#' @description
#' Functions that estimate the coefficients of a predictor model involving a
#' spline component and possibly a parametric component applying (Iteratively
#' Re-weighted) Least Squares, (IR)LS, estimation.
#' @param X A numeric vector containing \eqn{N} sample values of the covariate
#' chosen to enter the spline regression component of the predictor model.
#' @param Y A vector of size \eqn{N} containing the observed values of the
#' response variable \eqn{y}.
#' @param Z A design matrix with \eqn{N} rows containing other covariates
#' selected to enter the parametric component of the predictor model (see
#' \code{\link[=formula.GeDS]{formula}}). If no such covariates are selected,
#' it is set to \code{NULL} by default.
#' @param offset A vector of size \eqn{N} that can be used to specify a fixed
#' covariate to be included in the predictor model avoiding the estimation of
#' its corresponding regression coefficient. In case more than one covariate is
#' fixed, the user should sum the corresponding coordinates of the fixed
#' covariates to produce one common \eqn{N}-vector of coordinates. The argument
#' \code{offset} is particularly useful in \code{Splinereg_GLM} if the link
#' function used is not the identity.
#' @param weights An optional vector of "prior weights" to be put on the
#' observations in the fitting process in case the user requires weighted
#' fitting. It is a vector of 1s by default.
#' @param InterKnots A numeric vector containing the locations of the internal
#' knots necessary to compute the B-splines. In GeDS these are the internal
#' knots fitted in stage A's iterations.
#' @param n Integer value specifying the order of the spline to be evaluated.
#' It should be 2 (linear spline), 3 (quadratic spline) or 4 (cubic spline).
#' Non-integer values will be passed to the function \code{\link{as.integer}}.
#' @param extr Optional numeric vector of 2 elements representing the left-most
#' and right-most limits of the interval embedding the sample values of
#' \code{X}. By default equal to the range of \code{X}.
#' @param family A description of the error distribution and link function to be
#' used in the model. This can be a character string naming a family function, a
#' family function or the result of a call to a family function. See
#' \code{\link[stats]{family}} for details of family functions.
#' @param inits A numeric vector of length
#' \code{length(InterKnots) + n + NCOL(Z)} providing initial values for the
#' coefficients, to be used in the IRLS estimation (alternative to providing the
#' \code{mustart} vector).
#' @param mustart Initial values for the vector of means in the IRLS estimation.
#' Must be a vector of length \eqn{N}.
#' @param etastart Initial values for the predictor in the IRLS estimation
#' (alternative to providing either \code{inits} or \code{mustart}). Must be a
#' vector of length \eqn{N}.
#' @param prob The confidence level to be used for the confidence bands in the
#' \code{SplineReg_LM} fit. See Details below.
#' @param coefficients Optional vector of spline coefficients. If provided,
#' \code{SplineReg} computes only the corresponding predicted values.
#' @param only_pred Logical, if \code{TRUE} only \code{theta},
#' \code{predicted}, \code{residuals} and \code{rss} will be computed.
#'
#' @details
#' The functions estimate the coefficients of a predictor model with a spline
#' component (and possibly a parametric component) for a given order,
#' vector of knots of the spline and distribution of the response
#' variable (from the exponential family). The functions \code{SplineReg_LM} and
#' \code{SplineReg_GLM} are based on LS and IRLS, respectively, and are used in
#' \code{\link{NGeDS}} and \code{\link{GGeDS}} to estimate the coefficients of
#' the final GeDS fits in stage B. These fits use knots positioned at the
#' Greville abscissae of the linear fit from stage A (see Dimitrova et al. 2023).
#' Additional inference related quantities are also computed (see Value below).
#' The function \code{SplineReg_GLM} is also used to estimate the coefficients of
#' the linear GeDS fit of stage A within \code{\link{GGeDS}}, whereas in
#' \code{\link{NGeDS}} this estimation is performed internally leading to faster
#' \R code.
#'
#' In addition \code{SplineReg_LM} computes some useful quantities among which
#' confidence intervals and the control polygon (see Section 2 of
#' Kaishev et al. 2016).
#'
#' The confidence intervals contained in the output slot \code{nci} are
#' approximate local bands obtained considering the knots as fixed constants.
#' Hence the columns of the design matrix are seen as covariates and standard
#' methodology relying on the \code{se.fit} option of \code{predict.lm} or
#' \code{predict.glm} is used. In the \code{aci} slot, asymptotic confidence
#' intervals are provided, following Kaishev et al (2006). If the variance
#' matrix is singular the Moore-Penrose pseudo-inverse is computed instead.
#'
#' As mentioned, \code{SplineReg_GLM} is intensively used in Stage A of the GeDS
#' algorithm implemented in \code{\link{GGeDS}} and in order to make it as fast
#' as possible input data validation is mild. Hence it is expected that the user
#' checks carefully the input parameters before using \code{SplineReg_GLM}. The
#' \code{"residuals"} in the output of this function are similar to the so
#' called "working residuals" in the \code{\link[stats]{glm}} function.
#' \code{"residuals"} are the residuals \eqn{r_i} used in the knot placement
#' procedure, i.e. \deqn{r_i= (y_i - \hat{\mu}_i){d \mu_i \over d \eta_i },} but
#' in contrast to \code{\link[stats]{glm}} "working residuals", they are
#' computed using the final IRLS fitted \eqn{\hat{\mu}_i}. \code{"residuals"}
#' are then used in locating the knots of the linear spline fit of Stage A.
#'
#' In \code{SplineReg_GLM} confidence intervals are not computed.
#'
#' @return A \code{list} containing:
#' \describe{
#' \item{theta}{A vector containing the fitted coefficients of the spline
#' regression component and the parametric component of the predictor model.}
#' \item{predicted}{A vector of \eqn{N} predicted mean values of the response
#' variable computed at the sample values of the covariate(s).}
#' \item{residuals}{A vector containing the normal regression residuals if
#' \code{SplineReg_LM} is called or the residuals described in Details if
#' \code{SplineReg_GLM} is called.}
#' \item{rss}{The deviance for the fitted predictor model, defined as in
#' Dimitrova et al. (2023), which for \code{SplineReg_LM} coincides with the
#' residual sum of squares.}
#' \item{basis}{The matrix of B-spline regression functions and the covariates
#' of the parametric part evaluated at the sample values of the covariate(s).}
#' \item{nci}{A list containing the lower (\code{Low}) and upper (\code{Upp})
#' limits of the approximate confidence intervals computed at the sample values
#' of the covariate(s). See Details above.}
#' \item{aci}{A list containing the lower (\code{Low}) and upper (\code{Upp})
#' limits of the asymptotic confidence intervals computed at the sample values
#' of the covariate(s).}
#' \item{polygon}{A list containing x-y coordinates (\code{"kn"} and
#' \code{"thetas"}) of the vertices of the control polygon, see
#' Dimitrova et al. (2023).}
#' \item{temporary}{The result of the function \code{\link[stats]{lm}} if
#' \code{SplineReg_LM} is used, or the output of the function
#' \code{\link[stats]{glm}} if \code{SplineReg_GLM} is used.}
#' }
#'
#' @references
#' Kaishev, V. K., Dimitrova, D. S., Haberman, S. & Verrall, R. J. (2006).
#' Geometrically designed, variable know regression splines: asymptotics and
#' inference \emph{(Statistical Research Paper No. 28)}.
#' London, UK: Faculty of Actuarial Science & Insurance, city University London. \cr
#' URL: \href{https://openaccess.city.ac.uk/id/eprint/2372}{openaccess.city.ac.uk}
#'
#' Kaishev, V.K., Dimitrova, D.S., Haberman, S., & Verrall, R.J. (2016).
#' Geometrically designed, variable knot regression splines.
#' \emph{Computational Statistics}, \strong{31}, 1079--1105. \cr
#' DOI: \doi{10.1007/s00180-015-0621-7}
#'
#' Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
#' Geometrically designed variable knot splines in generalized (non-)linear
#' models. \emph{Applied Mathematics and Computation}, \strong{436}. \cr
#' DOI: \doi{10.1016/j.amc.2022.127493}
#'
#' @seealso \code{\link{NGeDS}}, \code{\link{GGeDS}}, \code{\link{Fitters}},
#' \code{\link{IRLSfit}}, \code{\link[stats]{lm}} and
#' \code{\link[stats]{glm.fit}}.
#'
#' @rdname SplineReg
#' @aliases SplineReg
#' @importFrom splines splineDesign
#' @importFrom stats lm coef
#' @export
SplineReg_LM <- function(X, Y, Z = NULL, offset = rep(0,length(X)), weights = rep(1,length(X)),
InterKnots, n, extr = range(X), prob = 0.95,
coefficients = NULL, only_pred = FALSE)
{
# Convert spline order to integer
n <- as.integer(n)
# Create spline basis matrix using specified knots, evaluation points and order
basisMatrix <- splineDesign(knots = sort(c(InterKnots,rep(extr,n))), x = X, ord = n,
derivs = rep(0,length(X)), outer.ok = T)
# Combine spline basis with parametric design matrix (if provided)
basisMatrix2 <- cbind(basisMatrix, Z)
# 1) If coefficients are NOT provided estimate the corresponding regression model
if (is.null(coefficients)) {
# Substract offset (if any) from Y
Y0 <- Y - offset
# Fit linear model without intercept, using weights
tmp <- lm(Y0 ~ -1 + basisMatrix2, weights = as.numeric(weights))
# Extract fitted coefficients
theta <- coef(tmp)
# Compute predicted values
predicted <- basisMatrix2 %*% theta + offset
# 2) If coefficients are provided, use them to compute predicted values directly
} else {
tmp <- NULL
theta <- coefficients
predicted <- basisMatrix2 %*% theta + offset
}
# Calculate residuals
resid <- Y - predicted
if (!only_pred) {
# Knots for control polygon
nodes <- sort(c(InterKnots,rep(extr,n)))[-c(1, NCOL(basisMatrix)+1)]
polyknots <- makenewknots(nodes, degree = n)
# Confidence intervals
ci <- ci(tmp, resid, prob = 0.95, basisMatrix, basisMatrix2, predicted,
n_obs = length(Y), type = "lm", huang = TRUE)
nci <- ci$nci; aci <- ci$aci
} else {
polyknots <- nci <- aci <- NULL
}
out <- list("theta" = theta, "predicted" = predicted, "residuals" = resid,
"rss" = as.numeric(crossprod(resid)), "basis" = basisMatrix,
"nci" = nci, "aci" = aci,
"polygon" = list("kn" = polyknots,
"thetas" = theta[1:NCOL(basisMatrix)]),
"temporary" = tmp)
return(out)
}
#' @importFrom splines splineDesign
#' @importFrom stats glm coef
#' @export
#' @rdname SplineReg
SplineReg_GLM <- function(X, Y, Z, offset = rep(0,nobs), weights = rep(1,length(X)),
InterKnots, n, extr = range(X), family, mustart,
inits = NULL, etastart = NULL)
{
# Ensure X, Y, Z, InterKnots are numeric, and n and extr are integers
X <- as.numeric(X); Y <- as.numeric(Y); Z <- as.numeric(Z)
InterKnots <- as.numeric(InterKnots); n <- as.integer(n); extr <- as.numeric(extr)
# Check that 'n' (spline order) has length 1
if(length(n) != 1) stop("'n' must have length 1")
ord <- n # to avoid problem in use of family$initialize e.g. binomial()
# Check X and Y have the same length
if(length(X) != length(Y)) stop("length of 'X' must match length of 'Y'")
# Ensure weights are set
if(is.null(weights)) weights <- rep(1,length(Y))
# Set required environment variables for family$initialize and IRLSfit
y <- Y; nobs <- NROW(Y)
# Create spline basis matrix using specified knots, order, and evaluation points
basisMatrix <- splineDesign(knots = sort(c(InterKnots,rep(extr,n))), x = X, ord = n,
derivs = rep(0,length(X)), outer.ok = T)
# Combine spline basis with parametric design matrix (if provided)
basisMatrix2 <- cbind(basisMatrix,Z)
# Initialize mustart based on input or defaults
if (missing(mustart) || is.null(mustart)) {
if (is.null(inits)) {
# Set environment to parent frame
env <- parent.frame()
eval(family$initialize)
mustart <- env$mustart
} else {
# Validate length of 'inits'
if (length(inits)!= NCOL(basisMatrix2)) stop("'inits' must be of length length(InterKnots) + n + NCOL(Z)")
# Calculate initial mustart based on 'inits' (initial value for spline coefficients)
mustart <- family$linkinv(basisMatrix2 %*% inits)
}
}
Y0 <- Y - offset
# Fit linear model without intercept, using weights
# tmp <- IRLSfit(basisMatrix2, Y0, offset = offset,
# family = family, mustart = mustart, weights = weights)
# tmp <- glm.fit(basisMatrix2, Y0, family = family,
# weights = weights, mustart = mustart)
tmp <- glm(Y0 ~ -1 + basisMatrix2, family = family, weights = weights, mustart = mustart)
# Extract fitted coefficients
theta <- coef(tmp)
# Compute predicted mean values of the response variable
predicted <- family$linkinv(basisMatrix2 %*% theta + offset)
# Knots for control polygon
nodes <- sort(c(InterKnots,rep(extr,ord)))[-c(1,NCOL(basisMatrix)+1)]
polyknots <- makenewknots(nodes, degree = ord)
# Extract residuals
resid <- tmp$residuals
ci <- ci(tmp, resid, prob = 0.95, basisMatrix, basisMatrix2, predicted,
n_obs = length(Y), type = "glm", huang = FALSE)
out <- list("theta" = theta, "predicted" = predicted, "residuals" = resid,
"rss" = tmp$deviance, "basis" = basisMatrix, "nci" = ci$nci,
"polygon" = list("kn" = polyknots,
"thetas" = theta[1:NCOL(basisMatrix)]),
"temporary" = tmp)
return(out)
}
# this is an unexported function.
# it may be useful in case knots are seleced according to some other algorithms (e.g. package earth?)
# and then one want to use GeDS to get higher order spline fits
#' @importFrom splines splineDesign
#' @importFrom stats glm.fit coef
SplineReg_GLM2 <- function(X,Y,Z,offset=rep(0,nobs),
weights=rep(1,length(X)),InterKnots,n,extr=range(X),family,
inits = NULL,mustart){
X <- as.numeric(X)
Y <- as.numeric(Y)
#Z <- as.numeric(Z)
nobs <- NROW(Y)
y <- Y
if(length(X) != length(Y)) stop("length of 'X' must match length of 'Y'")
InterKnots <- as.numeric(InterKnots)
if(length(n) != 1) stop("'n' must have length 1")
n <- as.integer(n)
extr <- as.numeric(extr)
basisMatrix <- splineDesign(knots=sort(c(InterKnots,rep(extr,n))),
derivs=rep(0,length(X)),x=X,ord=n,outer.ok = T)
basisMatrix2 <- cbind(basisMatrix,Z)
ord <- n #may cause problem the use of family$initialize e.g. binomial()
if(missing(mustart)||is.null(mustart)){
env <- parent.frame()
if(is.null(weights)) weights <- rep(1,length(y))
if (is.null(inits)) {
eval(family$initialize)
mustart <- env$mustart
} else {
if(length(inits)!= NCOL(basisMatrix2)) stop("'inits' must be of length length(InterKnots) + n + NCOL(Z)")
mustart <- family$linkinv(basisMatrix2 %*% inits)
}
}
tmp <- glm.fit(basisMatrix2, Y,
family=family)
theta <- coef(tmp)
nodes <-sort(c(InterKnots,rep(extr,ord)))[-c(1,NCOL(basisMatrix)+1)]
polyknots <- makenewknots(nodes,degree=ord)
predicted <- basisMatrix2 %*% theta + offset
resid <- tmp$res2
out <- list("theta" = theta,"predicted" = predicted,
"residuals" = resid, "rss" = tmp$lastdeviance,
"basis"= basisMatrix,
"polygon" = list("kn" = polyknots, "thetas" = theta[1:NCOL(basisMatrix)]),
"temporary" = tmp, "deviance" = tmp$deviance)
return(out)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.