gsp: General regression splines with variable degrees and ness,...
In gmonette/spida2: Collection of tools developed for the Summer Programme in Data Analysis 2000-2012

gsp	R Documentation

General regression splines with variable degrees and ness, smoothing splines

Description

These functions implement a general polynomial spline with possibly different degrees in each interval and different orders of smoothness at each knot, including the possibility of allowing a discontinuity at a knot. The function sc helps in the construction of linear hypothesis matrices to estimate and test levels and derivatives of splines at arbitrary points and the saltus of derivatives that have discontinuities at knots.

Usage

gsp(
  x,
  knots,
  degree = 3,
  smoothness = pmax(pmin(degree[-1], degree[-length(degree)]) - 1, -1),
  lin = NULL,
  periodic = FALSE,
  intercept = 0,
  signif = 3
)

Arguments

`x`	value(s) where spline is evaluated
`knots`	vector of knots
`degree`	vector giving the degree of the spline in each interval. Note the number of intervals is equal to the number of knots + 1. A value of 0 corresponds to a constant in the interval. If the spline should evaluate to 0 in the interval, use the `intercept` argument to specify some value in the interval at which the spline must evaluate to 0.
`smoothness`	vector with the degree of smoothness at each knot (0 = continuity, 1 = smoothness with continuous first derivative, 2 = continuous second derivative, etc. The value -1 allows a discontinuity at the knot. A scalar is recycled so its length equals the number of knots. Alternatively, a list of length equal to the number of knots. Each element of the list is a vector of the orders of derivatives which are required to be smooth. THis allows non-sequential constraints, e.g. to have the same first and second derivative on either side of a knot but a possible discontinuity and change in higher-order derivatives, the vector would be c(1,2). Note that if a list is used, all elements must provide all desired constraints. That is the list argument corresponding to `smoothness = c(1,2,-1)` is `smoothness=list(0:1, 0:2, -1)`. The default is 1 less than the minimum of adjoining degrees at
`lin`	provides a matrix specifying additional linear contraints on the 'full' parametrization consisting of blocks of polynomials of degree equal to max(degree) in each of the length(knots)+1 intervals of the spline. See below for examples of a spline that is 0 outside of its boundary knots.
`periodic`	if TRUE generates a period spline on the base interval 0,max(knots). A constraint is generated so that the coefficients generate the same values to the right of max(knots) as they do to the right of 0. Note that all knots should be strictly positive.
`intercept`	value(s) of x at which the spline has value 0, i.e. the value(s) of x for which yhat is estimated by the intercept term in the model. The default is 0. If NULL, the spline is not constrained to evaluate to 0 for any x.
`signif`	number of significant digits used to label coefficients
`exclude`	number of leading columns to drop from spline matrix: 0: excludes the intercept column, 1: excludes the linear term as well. Terms that are excluded from the spline matrix can be modeled explicitly.
`a`	a 'periodic' knot at which a spline repeats itself
`n`	the maximal 'order', i.e. maximal degree + 1, of a periodic spline

Details

Many polynomial regression splines can be generated by 'plus' functions although the resulting basis for the spline may be ill conditioned. For example, a 'quadratic spline' (a spline that is quadratic in each interval with and smooth with a first derivative at each knot) with knots at 1 and 3 can be fitted with:

plus <- function(x,y) ifelse(x>0,y,0)
lm(y ~ x + I(x^2) + plus(x-1,(x-1)^2) + plus(x-3,(x-3)^2))

All 'standard' polynomial splines with the same degree in each interval and continuity of order one less than the degree at each knot can be constructed in this fashion. A convenient aspect of this parametrization of the spline is that the estimated coefficients have a simple interpretation. The coefficients for 'plus' terms represent the 'saltus' (jump) in the value of a coefficient at the knot. Testing whether the true value of the saltus in a coefficient is 0 is equivalent to a test for the need to retain the corresponding knot.

This approach does not work for some more complex splines with different degrees or different orders of continuity at the knots. An example is the commonly used natural quadratic spline. A natural quadratic spline with knots at -1,0 and 1 (where -1 and 1 are termed 'boundary knots') is linear in the intervals (-Inf,-1) and (1,+Inf), and quadratic in the intervals (-1,0) and (0,1). The spline is smooth of order 1 at each knot.

Many techniques for fitting splines generate a basis for the spline (columns of the design matrix) that has good numerical properties but whose coefficients do not have a simple interpretation.

The gsp function makes it easy to specify a spline with arbitrary degree in each interval and arbitrary smoothness at each knot. The parametrization produces coefficients that have a simple interpretation. For a spline of degree p at x = 0, coefficients correspond to the 1st, 2nd, ... pth derivative at 0. Additional coefficients correspond to each free saltus at each knot.

The sc function generates a matrix to estimate features of a fitted spline that can be expressed as linear combinations of the spline coefficients. Examples are various derivatives of the spline at any point, left or right derivatives of different orders and the saltus in derivatives at a knot.

A disadvantage of gsp is that the spline basis may be poorly conditioned. The impact of this problem can be mitigated by rescaling the x variable so that it has an appropriate range. For example, with a spline whose highest degree is cubic, ensuring that x has a range between -10 and 10 should avert numerical problems.

gsp generates a matrix of regressors for a spline with knots, degree of polynomials in each interval and the degree of smoothness at each knot. Typically, gsp is used to define a function that is then used in a model equation. See the examples below.

A function to fit a cubic spline with knots at 5 and 10 is generated with:

Overview of utility functions:

Xmat = function( x, degree, D = 0, signif = 3) design/estimation matrix for fD where f(x) is polynomial of degree degree.

Xf = function( x, knots, degree = 3, D = 0, right = TRUE , signif = 3) uses Xmat to form 'full' matrix with blocks determined by knots intervals

Cmat = function( knots, degree, smoothness, intercept = 0, signif = 3) linear constraints

Emat = function( knots, degree, smoothness , intercept = FALSE, signif = 3) estimates - not necessarily a basis

basis = function( X , coef = FALSE ) selects linear independent subset of columns of X

spline.T = function( knots, degree, smoothness, intercept = 0, signif = 3 ) full transformation of Xf to spline basis and constraints

spline.E = function( knots, degree, smoothness, intercept = 0, signif = 3 ) transformation for spline basis (first r columns of spline.T)

Value

gsp returns a matrix generating a spline. cs, qs and lsp return matrices generating cubic, quadratic and linear splines respectively.

smsp, whose code is adapted from function in the package lmeSplines, generates a smoothing spline to be used in the random effects portion of a call to lme.

Warning

The variables generated by gsp are designed so the coefficients are interpretable as changes in derivatives at knots. The resulting matrix is not designed to have optimal numerical properties.

The intermediate matrices generated by gsp will contain x raised to the power equal to the highest degree in degree. The values of x should be scaled to avoid excessively large values in the spline matrix and ensuing numerical problems.

Author(s)

Monette, G. georges@yorku.ca

References

%% ~put references to the literature/web site here ~

Examples

## Fitting a quadratic spline
simd <- data.frame( age = rep(1:50, 2), y = sin(2*pi*(1:100)/5) + rnorm(100),
          G = rep( c('male','female'), c(50,50)))
# define a function generating the spline
sp <- function(x) gsp( x, knots = c(10,25,40), degree = c(1,2,2,1),
          smoothness = c(1,1,1))

fit <- lm( y ~ sp(age)*G, simd)

require(lattice)
xyplot( predict(fit) ~ age , simd, groups = G,type = 'l')
summary(fit)

## Linear hypotheses
L <- list( "Overall test of slopes at 20" = rbind(
      "Female slope at age 20" =  c( F20 <- cbind( 0 , sc( sp, 20, D = 1), 0 , 0 * sc( sp, 20, D = 1))),
      "Male slope at age 20" =  c( M20 <- cbind( 0 , sc( sp, 20, D = 1), 0 , 1 * sc( sp, 20, D = 1))),
      "Difference" = c(M20 - F20))
      )
wald( fit, L)

## Right and left second derivatives at knots and saltus
L <- list( "Second derivatives and saltus for female curve at knot at 25" =
          cbind( 0, sc( sp, c(25,25,25), D = 2, type =c(0,1,2)), 0,0,0,0))
L
wald( fit, L )

# Quadratic natural spline
simd <- data.frame( age = rep(1:50, 2), 
                    y = sin(2*pi*(1:100)/5)+ rnorm(100), 
                    G = rep( c("male","female"), c(50,50)))

sp <- function(x) gsp( x, knots = c(10,25,40), 
          degree = c(1,2,2,1), 
          smoothness = c(1,1,1))

fit <- lm( y ~ sp(age)*G, simd) 
xyplot( predict(fit) ~ age , simd, groups = G, type = "l") 
summary(fit) # convenient display

# output:
#
# Call:
# lm(formula = y ~ sp(age) * G, data = simd)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -3.1111 -0.9429  0.0774  0.6633  3.6171 
# 
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)
# (Intercept)           0.522681   0.623071   0.839    0.404
# sp(age)D1(0)         -0.060361   0.056802  -1.063    0.291
# sp(age)C(10).2        0.009013   0.007116   1.267    0.208
# sp(age)C(25).2       -0.015459   0.013263  -1.166    0.247
# Gmale                -0.092681   0.881156  -0.105    0.916
# sp(age)D1(0):Gmale    0.042878   0.080331   0.534    0.595
# sp(age)C(10).2:Gmale -0.006703   0.010063  -0.666    0.507
# sp(age)C(25).2:Gmale  0.007650   0.018757   0.408    0.684
#
# Residual standard error: 1.261 on 92 degrees of freedom
# Multiple R-squared:  0.05212,	Adjusted R-squared:  -0.02 
# F-statistic: 0.7227 on 7 and 92 DF,  p-value: 0.653
# ---

L0 <- list( 
    "hat" = rbind( 
         "females at age=20" = c( 1, sc(sp,20), 0, 0 * sc(sp,20)), 
         "males at age=20"   = c( 1, sc(sp,20), 1, 1* sc(sp,20))),
    "male-female" = rbind( 
         "at 20" = c( 0 , 0*sc(sp,20), 1, 1*sc(sp,20)))) 
wald(fit, L0 )

L1 <- list(
    "D(yhat)/D(age)" = 
       rbind( "female at age = 25" = c(0, sc(sp,25,1), 0, 0*sc(sp,25,1)), 
    "male at x = 25" = c(0, sc(sp,25,1), 0, 1*sc(sp,25,1))))
wald( fit, L1) 

# A periodic spline function can be generated by forcing the coefficients
# beyond a periodic knot to repeat the pattern in a previous inteval. For
# example a periodic spline of period 1 can be created as follows:

x <- seq(-3,3,.01) 
y <- rnorm(x)

per.sp <- function(x) gsp( x %% 1, knots = 1, 
                  degree = c(3, 3), 
                  smoothness = 1,
                  lin = cbind( diag(4), - PolyShift(1,4))) 
fit <- lm( y ~ per.sp(x))
summary(fit) 
xyplot( y + predict(fit) ~ x , 
     panel = panel.superpose.2,
     type = c("p","l")) 
xyplot( predict(fit) ~ x , type = 'l')
#' # A periodic spline with additional knots can be created as shown below. Note
# that constraint matrix 'lin' expresses constraints for the 'full' polynomial
# parametrization, i.e. polynomials of degree 3 (thus of order 4 when
# including the constant term) in each of the 4 intervals.  The constraint
# given in 'lin' forces the coefficients beyond the periodic knot at x = 1, to
# repeat the polynomial in the interval just to the right of x = 0.

x <- seq(-3,3,.01) 
y <- rnorm(x)

per.sp <- function(x) gsp( x %% 1, knots = c(.333,.666,1), 
            degree = 3,
            smoothness = 1, 
            lin = cbind( diag(4),0*diag(4),0*diag(4), - PolyShift(1,4))) 
fit <- lm( y ~ per.sp(x)) 
summary(fit) 
xyplot( y + predict(fit) ~ x , 
   panel = panel.superpose.2,
   type = c("p","l")) 
xyplot( predict(fit) ~ x , type = 'l')

# Cubic spline:

sp <- function( x ) gsp( x, c(5,10), c(3,3,3), c(2,2))

# The parameters indicate that a cubic polynomial is used in each of the three intervals
# and that the second derivative is continuous at each knot.

# Natural cubic spline:
# is a cubic spline in bounded intervals with linear components 
# in each unbounded interval and continuous first derivative at the
# two knots for unbounded intervals.

sp <- function( x )  gsp( x, c(0,5,10,15), c(1,3,3,3,1), c(1,2,2,1)) 

# Quadratic and linear splines, respectively: 

sp.quad <- function( x ) gsp( x, c(5,10), c(2,2,2), c(1,1)) 
sp.lin  <- function( x ) gsp( x, c(5,10), c(1,1,1), c(0,0)) 
          
# When the same degree is used for all
# intervals and knots, it suffices to give it once: 

sp.quad <- function( x ) gsp( x, c(5,10), 2, 1) 
sp.lin  <- function( x ) gsp( x, c(5,10), 1, 0)  
     
# An easy way to specify a model in which a knot is dropped
# is to force a degree of continuity equal to the degree of adjoining
# polynomials, e.g.  to drop the knot at 10, use: 

sp.1 <- function( x ) gsp( x, c(5,10), c(3,3,3), c(2,3)) 
     
# This is sometimes easier than laboriously rewriting the 
# spline function for each null hypothesis.

# Depending on the maximal degree of the spline, the range of x should not be
# excessive to avoid numerical problems. The spline matrix generated is 'raw'
# and values of max(abs(x))^max(degree) may appear in the matrix.  For
# example, for a cubic spline, it might be desirable to rescale x and/or
# recenter x so abs(x) < 100 if that is not already the case. Note that the
# knots need to be correspondingly transformed.

# The naming of coefficients should allow them to be easily interpreted.  For
# example: 
 
zapsmall( gsp ( 0:10, c(3, 7) , c(2,3,2), c(1,1)))

## Output:
##
##       D1(0) D2(0) C(3).2   C(3).3 C(7).2
## f(0)      0   0.0    0.0  0.00000    0.0
## f(1)      1   0.5    0.0  0.00000    0.0
## f(2)      2   2.0    0.0  0.00000    0.0
## f(3)      3   4.5    0.0  0.00000    0.0
## f(4)      4   8.0    0.5  0.16667    0.0
## f(5)      5  12.5    2.0  1.33333    0.0
## f(6)      6  18.0    4.5  4.50000    0.0
## f(7)      7  24.5    8.0 10.66667    0.0
## f(8)      8  32.0   12.5 20.66667    0.5
## f(9)      9  40.5   18.0 34.66667    2.0
## f(10)    10  50.0   24.5 52.66667    4.5

# The coefficient for the first regressor is the first derivative at x = 0;
# for the second regressor, the second derivative at 0; the third, the saltus
# (change) in the second derivative at x = 3, the fourth, the saltus in the
# third derivative at x = 3 and, finally, the saltus in the second derivative
# at x = 7.

sp <- function(x) gsp ( x, c(3, 7) , c(2,3,2), c(1,1)) 
zd <- data.frame( x = seq(0,10, .5), 
                    y = seq(0,10,.5)^2 + rnorm(21)) 
fit <- lm( y ~ sp( x ), zd) 
summary(fit) 
Ls <-cbind( 0, sc( sp, c(1,2,3,3,3,5,7,7,7,8), 
              D = 3, type = c(0,0,0,1,2,0,0,1,2,0))) 
zapsmall(Ls )
##
## Output:
##                 D1(0) D2(0) C(3).2 C(3).3 C(7).2
## D3(1)         0     0     0      0      0      0
## D3(2)         0     0     0      0      0      0
## D3(3-)        0     0     0      0      0      0
## D3(3+)        0     0     0      0      1      0
## D3(3+)-D3(3-) 0     0     0      0      1      0
## D3(5)         0     0     0      0      1      0
## D3(7-)        0     0     0      0      1      0
## D3(7+)        0     0     0      0      0      0
## D3(7+)-D3(7-) 0     0     0      0     -1      0
## D3(8)         0     0     0      0      0      0

wald( fit, list( 'third derivatives' = Ls))

## Output:
##
##                   numDF denDF  F.value p.value
## third derivatives     1    15 2.184522 0.16009
##
##           ?     Estimate Std.Error  DF   t-value p-value Lower 0.95 Upper 0.95
## D3(1)          0.000000  0.000000  15  2.408694 0.02932   0.000000   0.000000
## D3(2)          0.000000  0.000000  15  2.408694 0.02932   0.000000   0.000000
## D3(3-)         0.000000  0.000000  15  2.408694 0.02932   0.000000   0.000000
## D3(3+)        -0.535959  0.362621  15 -1.478013 0.16009  -1.308867   0.236950
## D3(3+)-D3(3-) -0.535959  0.362621  15 -1.478013 0.16009  -1.308867   0.236950
## D3(5)         -0.535959  0.362621  15 -1.478013 0.16009  -1.308867   0.236950
## D3(7-)        -0.535959  0.362621  15 -1.478013 0.16009  -1.308867   0.236950
## D3(7+)         0.000000  0.000000 Inf       NaN     NaN   0.000000   0.000000
## D3(7+)-D3(7-)  0.535959  0.362621  15  1.478013 0.16009  -0.236950   1.308867
## D3(8)          0.000000  0.000000 Inf       NaN     NaN   0.000000   0.000000

# Note that some coefficients that are 0 by design may lead to invalid degrees
# of freedom and t-values.

# sc generates a portion of a hypothesis matrix for the coefficients of
# a general spline constructed with gsp With: 
# 
# sc( sp, x, D, type ):
# sp is the spline function for which coefficients 
#           are required, 
# x is the vector of values at which the spline is evaluated, 
# D is the order of derivatives: 0 = value of spline, 
#        1 = first derivative, etc.,  
# type at knots: 0 limit from the left, 1 from the
#       right, 2 is saltus (i.e. jump from left to right) 

# Warning: sc will not work correctly if the function defining the
# spline transforms the variable, e.g. sp <- function(x) gsp( x/100, knot=2 )

# Example:

simd <- data.frame( age = rep(1:50, 2), 
                    y = sin(2*pi*(1:100)/5) + rnorm(100), 
                    G = rep( c("male","female"), c(50,50)))

sp <- function(x) gsp( x, knots = c(10,25,40), 
          degree = c(1,2,2,1), 
          smoothness = c(1,1,1))

fit <- lm( y ~ sp(age)*G, simd) 
xyplot( predict(fit) ~ age , simd, groups = G,type = "l") 
summary(fit)

L0 <- list( 
    "hat" = rbind( 
         "females at age=20" = c( 1, sc(sp,20), 0, 0 * sc(sp,20)), 
         "males at age=20"   = c( 1, sc(sp,20), 1, 1* sc(sp,20))),
    "male-female" = rbind( 
         "at 20" = c( 0 , 0*sc(sp,20), 1, 1*sc(sp,20)))) 
wald(fit, L0 )

L1 <- list(
    "D(yhat)/D(age)" = 
       rbind( "female at age = 25" = c(0, sc(sp,25,1), 0, 0*sc(sp,25,1)), 
    "male at x = 25" = c(0, sc(sp,25,1), 0, 1*sc(sp,25,1))))
wald( fit, L1) 

# A periodic spline function can be generated by forcing the coefficients
# beyond a periodic knot to repeat the pattern in a previous inteval. For
# example a periodic spline of period 1 can be created as follows:

x <- seq(-3,3,.01) 
y <- rnorm(x)

per.sp <- function(x) gsp( x %% 1, 
                    knots = 1, 
                    degree = c(3, 3), 
                    smoothness = 1,
                    lin = cbind( diag(4), - PolyShift(1,4))) 
fit <- lm( y ~ per.sp(x))
summary(fit) 
xyplot( y + predict(fit) ~ x , 
        panel = panel.superpose.2,
        type = c("p","l")) 
xyplot( predict(fit) ~ x , type = 'l') 

# A periodic spline with additional knots can be created as shown below. Note
# that constraint matrix 'lin' expresses constraints for the 'full' polynomial
# parametrization, i.e. polynomials of degree 3 (thus of order 4 when
# including the constant term) in each of the 4 intervals.  The constraint
# given in 'lin' forces the coefficients beyond the periodic knot at x = 1, to
# repeat the polynomial in the interval just to the right of x = 0.

x <- seq(-3,3,.01) 
y <- rnorm(x)

per.sp <- function(x) gsp( x %% 1, knots = c(.333,.666,1), 
                degree = 3,
                smoothness = 1, 
                lin = cbind( diag(4),0*diag(4),0*diag(4), - PolyShift(1,4)))
fit <- lm( y ~ per.sp(x)) 
summary(fit) 
xyplot( y + predict(fit) ~ x , 
   panel = panel.superpose.2,type = c("p","l")) 
xyplot( predict(fit) ~ x , type = 'l')


## Smoothing splines

library(nlme)
library(MASS)
data(Spruce)
Spruce$all <- 1
range( Spruce$days)
sp <- function(x) smsp ( x, seq( 150, 675, 5))
spruce.fit1 <- lme(logSize ~ days, data=Spruce,
          random=list(all= pdIdent(~sp(days) -1),
          plot=~1, Tree=~1))
summary(spruce.fit1)
pred <- expand.grid( days = seq( 160, 670, 10), all = 1)
pred$logSize <- predict( spruce.fit1, newdata = pred, level = 1)
library( lattice )
xyplot( logSize ~ days, pred, type = 'l')

gmonette/spida2 documentation built on June 12, 2025, 9:44 p.m.