gspline: General parametric regression splines with variable degrees...

View source: R/gspline.R

gsplineR Documentation

General parametric regression splines with variable degrees and smoothness

Description

This function creates functions (closures) that implement general polynomial splines with possibly different degrees in each interval and different orders of smoothness at each knot, including the possibility of allowing a discontinuity at a knot.

For transitional reasons with the forthcoming gspline package, Gspline is a synonym for gspline.

Usage

gspline(
  knots,
  degree = 3,
  smoothness = pmax(pmin(degree[-1], degree[-length(degree)]) - 1, 0),
  intercept = 0,
  constraints = NULL,
  estimates = NULL,
  periodic = FALSE,
  tolerance = 1e-14,
  debug = FALSE
)

## S3 method for class 'gspline'
print(
  x,
  show = c("knots", "degree", "smoothness", "G", "constraint_mat", "estimate_mat")
)

## S3 method for class 'gspline_matrix'
print(x, ...)

## S3 method for class 'gspline'
knots(sp)

Gspline(
  knots,
  degree = 3,
  smoothness = pmax(pmin(degree[-1], degree[-length(degree)]) - 1, 0),
  intercept = 0,
  constraints = NULL,
  estimates = NULL,
  periodic = FALSE,
  tolerance = 1e-14,
  debug = FALSE
)

Arguments

knots

vector of knots

degree

vector giving the degree of the spline in each interval. Note the number of intervals is equal to the number of knots + 1. A value of 0 corresponds to a constant in the interval.

smoothness

vector with the degree of smoothness at each knot (0 = continuity, 1 = smooth with continuous first derivative, 2 = continuous second derivative, etc. The value -1 allows a discontinuity at the knot. Alternative, a list each of whose elements specifies the derivatives that must be continuous at each knot. This allow the set of continuous derivatives to skip a value. For example, to specify that a function can have a discontinuity but must have the same limiting slope and second derivative (curvature) on either side of the discontinuity, the smoothness vector for the corresponding knot would have the value c(1,2), thus leaving the value (denoted by 0) unconstrained.

constraints

provides a vector or matrix specifying additional linear contraints on the 'full' parametrization consisting of blocks of polynomials of degree equal to max(degree) in each of the length(knots)+1 intervals of the spline. See below for examples of a spline that is 0 outside of its boundary knots.

estimates

provides a vector or matrix specifying additional linear combination of the parameters in the 'full' parametrization that should be estimated by estimated coefficients of the spline.

periodic

if TRUE a periodic spline is generated on the base interval 0,max(knots). A constraint is generated so that the coefficients generate, in effect, the same fitted values and derivatives to the right of max(knots) as they do to the right of 0. Note that all knots must be strictly positive. Since the knot at 0 corresponds to the knot at 'max(knots)', the knot at 0 must not be explicitly specified. Note also that the degree in the interval to the right of 'max(knots)' must be equal to that in the interval to the left of 'min(knots)'. If these are specified to be equal, gspline uses the maximum for both intervalsl. The continuity contraints associated with the last knot are, in effect, continuity constraints at 0.

x

(parameter of the closure created by gspline) value(s) where the spline is evaluated. This is the only parameter used when generating portions of a model matrix.

signif

number of significant digits used to label coefficients

exclude

(not yet implemented) number of leading columns to drop from spline matrix: 0: excludes the intercept column, 1: excludes the linear term as well. Terms that are excluded from the spline matrix can be modeled explicitly.

a

(not yet implemented) a 'periodic' knot at which a spline repeats itself

n

(not yet implemented) the maximal 'order', i.e. maximal degree + 1, of a periodic spline

D

(parameter of the closure created by gspline) value(s) the order of derivatives to be estimated when creating a linear hypothesis matrix. The default is D=0 corresponding to the value of the spline.

limit

(parameter of the closure created by gspline) specifies whether a derivative or value should be estimated as as limit from the left limit = -1, from the right, limit = -1, or the difference of the two limits, limit = -0. The default is limit = -1.

Details

The functions created by gspline can also be used to generate portions of a linear hypothesis matrix that can be used with a Wald test to estimate features of the spline such as derivatives of various orders, discontinuities, etc.

Many polynomial regression splines can be generated by 'plus' functions although the resulting basis for the spline may be ill conditioned. For example, a 'quadratic spline' (a spline that is quadratic in each interval with and smooth with a first derivative at each knot) with knots at 1 and 3 can be fitted with:

plus <- function(x,y) ifelse(x>0,y,0)
lm(y ~ x + I(x^2) + plus(x-1,(x-1)^2) + plus(x-3,(x-3)^2))

All 'standard' polynomial splines with the same degree in each interval and continuity of order one less than the degree at each knot can be constructed in this fashion. A convenient aspect of this parametrization of the spline is that the estimated coefficients have a simple interpretation. The coefficients for 'plus' terms represent the 'saltus' (jump) in the value of a coefficient at the knot. Testing whether the true value of the saltus in a coefficient is 0 is equivalent to a test for the need to retain the corresponding knot.

This approach does not work for some more complex splines with different degrees or different orders of continuity at the knots.

Many techniques for fitting splines generate a basis for the spline (columns of the design matrix) that has good numerical properties but whose coefficients do not have a simple interpretation.

The gspline function generates functions representing splines with arbitrary degrees in each interval and arbitrary smoothness at each knot. Any set of orders of derivatives can be contrained to be continuous at any particular knot. The parametrization produces coefficients that have a simple interpretation. For a spline of degree p at x = 0, coefficients correspond to the 1st, 2nd, ... pth derivative at 0. Additional coefficients correspond to each free discontinuities at each knot.

A disadvantage of spline functions generated by gspline is that the spline basis may be poorly conditioned. The impact of this problem can be mitigated by rescaling the x variable so that it has an appropriate range. For example, with a spline whose highest degree is cubic, ensuring that x has a range between -10 and 10 should avert numerical problems.

Let sp denote a spline matrix created by gspline. For example sp <- gspline(knots = c(-1,0,2), degree = c(2,3,3,2), smoothness = c(1,-1,1)) creates a spline function sp that can be used in linear formulas to fit a polynomial spline with knots at -1, 0 and 2, with degrees 2, 3, 3, and 2 in the four intervals bounded by the knots and with "C 1" continuity (continuous first derivative) at the knots -1 and 2, and a possible discontinuity in value at the knot 0.

The linear formula to fit this spline has the form y ~ 1 + sp(x) which can be augmented with other regressors in the usual way.

Called with a single argument (e.g. sp(x)) sp returns a matrix of regressors and can be used in linear model formulas, e.g y ~ sp(x), y ~ A * sp(x) or y ~ A / sp(x) -1 if A is a factor to fit separate splines within each level of A.

sp can be called with additional argument to geneate portions of linear hypothesis matrices that can be used test or estimate various values or derivatives of the spline. For example, sp(c(-1,-1,-1,0,2), D = c(2, 2, 2, 0, 1), limit = c(-1, 1, 0, 0, 1)) will return a matrix of coefficients to estimate, respectively, the limits from the left and from the right of the second derivative at -1, the difference (jump or saltus) in this derivative at -1, the size of the discontinuity in the value of the function at 0, and the first derivative at 1, which being continuous there has the same value whether a limit is taken from the right or from the left.

A function to fit a cubic spline with knots at 5 and 10 is generated with sp <- gspline(knots = c(5, 10), degree = c(3, 3, 3), smoothness = c(2, 2)), or more briefly: sp <- gspline(c(5, 10), 3, 2) since the degree and smoothness arguments are recycled as required by the number of knots.

A natural cubic spline with knots at 5 and 10 and boundary knots at 0 and 20 can be created with sp <- gspline(c(0, 5, 10, 20), c(1, 3, 3, 3, 1), c(2, 2, 2, 2)).

A step function with steps at 0, 1 and 2 would have the form: sp <- gspline(c(0, 1, 2), 0, -1).

Value

gspline returns a closure that creates portions of model matrices or of linear hypothesis matrices for Wald test for a general polynomial spline.

Warning

The estimands generated by spline functions created by gspline are designed so the coefficients are interpretable as changes in derivatives at knots. The resulting matrix is not designed to have optimal numerical properties.

Examples

## Fitting a quadratic spline
simd <- data.frame( age = rep(1:50, 2), y = sin(2*pi*(1:100)/5) + rnorm(100),
          G = rep( c('male','female'), c(50,50)))
# define a function generating the spline quadratic spline with linear extrapolation
sp <- gspline(knots = c(10,25,40), degree = c(1,2,2,1),
          smooth = c(1,1,1))

fit <- lm( y ~ sp(age)*G, simd)

require(lattice)
xyplot( predict(fit) ~ age, simd, groups = G, type = 'l')
summary(fit)

fitg <- update(fit, y ~ G/sp(age) - 1)
summary(fitg)

## Linear hypotheses
L <- list("Overall test of slopes at 20" = rbind(
      "Female slope at age 20" =  c( F20 <- cbind( 0 , sp(20, D = 1), 0 , 0 * sp(20, D = 1))),
      "Male slope at age 20" =  c( M20 <- cbind( 0 , sp(20, D = 1), 0 , 1 * sp(20, D = 1))),
      "Difference" = c(M20 - F20))
      )
wald(fit, L)

## Right and left second derivatives at knots and saltus (jump)

L <- list("Second derivatives and saltus for female curve at knot at 25" =
          cbind( 0, sp(c(25,25,25), D = 2, limit =c(-1,1,0)), 0,0,0,0))
L
wald(fit, L)
 
##                                                            
## Second derivatives and saltus for female curve at knot at 25
##   numDF denDF  F.value p.value
##       2    92 2.289052 0.10711
##                Estimate Std.Error DF   t-value p-value Lower 0.95 Upper 0.95
## D2|25-         0.004942  0.006258 92  0.789720 0.43172  -0.007487   0.017372
## D2|25+        -0.010879  0.006011 92 -1.809958 0.07357  -0.022817   0.001059
## D2|25+-D2|25- -0.015822  0.011665 92 -1.356298 0.17832  -0.038990   0.007347

L0 <- list( 
    "hat" = rbind( 
         "females at age=20" = c( 1, sp(20), 0, 0 * sp(20)), 
         "males at age=20"   = c( 1, sp(20), 1, 1* sp(20))),
    "male-female" = rbind( 
         "at 20" = c( 0 , 0*sp(20), 1, 1*sp(20)))) 
wald(fit, L0)

L1 <- list(
    "D(yhat)/D(age)" = 
       rbind( "female at age = 25" = c(0, sp(25,1), 0, 0*sp(25,1)), 
    "male at x = 25" = c(0, sp(25,1), 0, 1*sp(25,1))))
wald( fit, L1) 

# Cubic spline:

sp <- gspline(c(5,10), c(3,3,3), c(2,2))

# The parameters indicate that a cubic polynomial is used in each of the three intervals
# and that the second derivative is continuous at each knot.

# Cubic natural spline:
# is a cubic spline in bounded intervals with linear components 
# in each unbounded interval and continuous first derivative at the
# two knots for unbounded intervals.

sp.natural <- gspline(c(0,5,10,15), c(1,3,3,3,1), c(2,2,2,2)) 

# Quadratic and linear splines and step functions, respectively: 

sp.quad <- gspline(c(5,10), c(2,2,2), c(1,1)) 
sp.lin  <- gspline(c(5,10), c(1,1,1), c(0,0)) 
sp.step  <- gspline(c(5,10), c(0,0,0), c(-1,-1)) 
          
# When the same degree is used for all
# intervals and knots, it suffices to give it once: 

sp.quad <- gspline(c(5,10), 2, 1) 
sp.lin  <- gspline(c(5,10), 1, 0) 
sp.step  <- gspline(c(5,10), 0, -1 
     
# An easy way to specify a model in which a knot is dropped
# is to force a degree of continuity equal to the degree of adjoining
# polynomials, e.g.  to drop the knot at 10, use: 

sp.1 <- gspline(c(5,10), c(3,3,3), c(2,3)) 
     
# This is sometimes easier than laboriously rewriting the 
# spline function for each null hypothesis.

# Depending on the maximal degree of the spline, the range of x should not be
# excessive to avoid numerical problems. The spline matrix generated is 'raw'
# and values of max(abs(x))^max(degree) may appear in the matrix.  For
# example, for a cubic spline, it might be desirable to rescale x and/or
# recenter x so abs(x) < 100 if that is not already the case. Note that the
# knots need to be correspondingly transformed.

# The naming of coefficients should allow them to be easily interpreted.  For
# example: 

sp <- gspline(c(3,7), c(2,3,2), c(1,1)) 
sp(0:10, c(1,1))

## Output:
##       D1|0 D2|0 C2|3 C3|3 C2|7
## D1|0     1    0    0  0.0    0
## D1|1     1    1    0  0.0    0
## D1|2     1    2    0  0.0    0
## D1|3     1    3    0  0.0    0
## D1|4     1    4    1  0.5    0
## D1|5     1    5    2  2.0    0
## D1|6     1    6    3  4.5    0
## D1|7     1    7    4  8.0    0
## D1|8     1    8    5 12.0    1
## D1|9     1    9    6 16.0    2
## D1|10    1   10    7 20.0    3

# The coefficient for the first regressor is the first derivative at x = 0;
# for the second regressor, the second derivative at 0; the third, the saltus
# (change) in the second derivative at x = 3, the fourth, the saltus in the
# third derivative at x = 3 and, finally, the saltus in the second derivative
# at x = 7.

sp <- gspline (c(3, 7) , c(2,3,2), c(1,2)) 
zd <- data.frame( x = seq(0,10, .5), 
                    y = seq(0,10,.5)^2 + rnorm(21)) 
fit <- lm( y ~ sp( x ), zd) 
summary(fit) 
Ls <- cbind(0, sp(c(1,2,3,3,3,5,7,7,7,8), 
              D = 2, limit =c(-1,-1,-1,1,0,-1,-1,1,0,-1))) 
zapsmall(Ls )
#
# Note that estimates of features that are continuous at a point
# are indicated without a direction for the limit, e.g. "D2(1)"
# is the second derivative at 1. 
##
## Output:
##
##               D1|0 D2|0 C2|3 C3|3
## D2|1        0    0    1    0    0
## D2|2        0    0    1    0    0
## D2|3-       0    0    1    0    0
## D2|3+       0    0    1    1    0
## D2|3+-D2|3- 0    0    0    1    0
## D2|5        0    0    1    1    2
## D2|7        0    0    1    1    4
## D2|7        0    0    1    1    4
## D2|7+-D2|7- 0    0    0    0    0
## D2|8        0    0    1    1    4

wald( fit, list( 'second derivatives' = Ls))

# Note that some coefficients that are 0 by design may lead to invalid degrees
# of freedom and t-values.


gmonette/spida2 documentation built on Aug. 20, 2023, 7:21 p.m.