gps.smooth: General P-splines for package 'mgcv'

gps.smoothR Documentation

General P-splines for package ‘mgcv’

Description

Construct and predict a general P-spline in the framework of generalized additive models (GAMs) using mgcv.

Usage

## S3 method for class 'gps.smooth.spec'
smooth.construct(object, data, knots)

## S3 method for class 'gps.smooth'
Predict.matrix(object, data)

Arguments

object

a smooth term specification list generated by s(x, bs = "gps", ...).

data

a named list containing the data (including any "by" variable) required by this term, whose names should match object$term (and object$by).

knots

a named list containing any interior knots supplied for constructing B-splines, whose names should match object$term. Can be NULL for automatic knot placement.

Details

Simple Specification

A general P-spline f(x) with p order-d B-splines and an order-m general difference matrix is specified as s(x, bs = 'gps', k = p, m = c(d - 1, m)) in a formula, fed to mgcv's model fitting functions, namely gam (generalized additive models), bam (generalized additive models for large datasets) and gamm (generalized additive mixed models). A simpler specification as s(x, bs = 'gps', k = p) assumes d = 4 and m = 2, i.e., a cubic spline with a 2nd order general difference penalty.

Note that the math notations p, d, k and m should not be confused with function arguments k and m. The latter are set by mgcv's standard. Also, argument m expects B-splines to be specified by its degree d - 1, not its order d.

Multiple Penalties

Usually there is only one wiggliness penalty. However, we can set up multiple penalties (each with an independent smoothing parameter). For example, a general P-spline with 20 cubic B-splines and three difference penalties of order 3 to 1 is specified as s(x, bs = 'gps', k = 20, m = c(3, 3, 2, 1)). This is as same as saying that its 1st to 3rd derivatives are penalized.

Additional Arguments

By mgcv's standard, s() has an argument xt that accepts a list of extra arguments. You can use this argument to gain more control over the construction of a "gps" smooth.

  1. By default, the domain of the spline is chosen to be [min(x), max(x)]. You can specify a wider interval [a, b] by adding xt = list(domain = [a, b]) into s();

  2. To use derivative penalty instead of general difference penalty, i.e., to specify an O-spline, append derivative = TRUE to the xt list;

  3. To construct a general P-spline with periodic boundary constraints, i.e., f^{(q)}(a) = f^{(q)}(b), q = 0, 1, ..., degree - 1, append periodic = TRUE to the xt list.

So for example, an O-spline with 20 cubic B-splines on domain [-5, 5] and a 1st order penalty is specified as s(x, bs = 'gps', k = 20, m = c(3, 1), xt = list(domain = c(-5, 5), derivative = TRUE)).

Knot Placement

By default, K = p + d quantile knots with clamped boundary knots are placed on [a, b]. To override such automatic knot placement, add knots = list(x = xs) into gam(), bam() or gamm() (not into s()!), where xs is an ascending sequence of k = p - d interior knots in (a, b).

Tensor Product Smooth

A "gps" smooth can be a margin of a tensor product smooth, constructed with te(), ti() or t2(). For example, a bivariate tensor product spline f(x_1, x_2) = f_1(x_1) \otimes f_2(x_2), where f_1(x_1) is a "gps" with 10 cubic B-splines and a 1st order difference penalty, and f_2(x_2) is a "gps" with 20 cubic B-splines and a 2nd order difference penalty, subject to periodic boundary constraints, can be specified as te(x1, x2, d = c(1, 1), bs = c('gps', 'gps'), k = c(10, 20), m = list(x1 = c(3, 1), x2 = c(3, 2)), xt = list(x1 = NULL, x2 = list(periodic = TRUE))).

For te() and ti(), a "gps" margin often results in a warning about "unstable reparametrization". To get rid of such warning, append np = FALSE to te() or ti(). The "t2" smooth does not suffer from such issue.

Note that it is not possible to use multiple penalties for a margin.

"By" Variable

Both s() and tensor product smooth (te(), ti() or t2()) have a by argument, that can accept a numerical variable or a factor variable.

Factor-Smooth Interaction (efficient for gamm)

A "gps" smooth can also be used in an "fs" smooth. For example, let fac be a factor variable, then s(x, fac, bs = 'fs', k = 10, xt = list(bs = 'gps'), m = c(3, 1)) sets up a general P-spline with 10 cubic B-splines and a 1st order difference penalty for each level of fac, which is as same as specifying a random-effect trajectory for each level. All trajectories share a common smoothing parameter. With a 1st order penalty, this model term will be shrunk toward a random intercept as smoothing parameters goes to infinity. If a 2nd order penalty is used, a random slope will also be included, which is often not desirable.

Note that xt argument in this case is processed by "fs" smooth, not "gps" smooth. At the moment "fs" smooth ignores other variables provided through the xt list, so that it is not possible to successfully pass in additional arguments for the "gps" smooth.

Also note that multiple penalties are not allowed by a "fs" smooth.

Value

An object of class "gps.smooth". It augments the smooth term specification list, by adding for example, B-spline design matrix, list of penalty matrices, and information like domain and knots. See examples below.

Note

Mathematically speaking, general P-spline comprises standard P-spline as a special case. However, a "gps" smooth always constructs B-splines with clamped boundary knots, it is thus impossible to produce a standard P-spline with "gps" smooth class (even if all interior knots are equidistant). To specify a standard P-spline, you should use the "ps" class from mgcv.

Author(s)

Zheyuan Li zheyuan.li@bath.edu

References

Li, Z. and Cao, J. (2022). General P-Splines for Non-Uniform B-Splines.

Examples

require(mgcv)
require(gps.mgcv)

###################################################################
## use general P-spline for smoothing (in the framework of GAMs) ##
###################################################################

## set this to TRUE for a periodic spline demo
cyclic <- FALSE

## 250 data from a random cubic spline on [0, 1]
spl <- RandomSpl(n = 250, periodic = cyclic, plot = FALSE)
## noisy (x, y) observations
x <- spl$x
y <- rnorm(250, spl$y, sd = 0.2 * sd(spl$y))
## fit a general P-spline using 10 cubic B-splines and a 2nd order penalty
## increase k if the fit is not adequate (it can happen for some random example)
fit <- gam(y ~ s(x, bs = "gps", k = 10, xt = list(periodic = cyclic)))
## visualize fit
plot(x, y, col = 8)
lines(spl$xg, spl$yg, col = 2, lwd = 2)
yh <- predict(fit, newdata = data.frame(x = spl$xg))
lines(spl$xg, yh, lwd = 2)

####################################################
## direct use of smooth.construct.gps.smooth.spec ##
####################################################

## this is useful when you want to use the constructor outside mgcv for your own purpose

## provide data in a list (or data frame)
xdat <- list(x = rnorm(50))

## simplest specification, 10 cubic B-splines with a 2nd order penalty
sterm <- s(x, bs = "gps", k = 10)
gps.object <- smooth.construct.gps.smooth.spec(sterm, data = xdat, knots = NULL)

## use argument 'm' to specify, say 10 quadratic B-splines with a 1st order penalty
sterm <- s(x, bs = "gps", k = 10, m = c(2, 1))
gps.object <- smooth.construct.gps.smooth.spec(sterm, data = xdat, knots = NULL)

## multiple penalties, say 10 cubic B-splines with penalties of order 1 to 3
sterm <- s(x, bs = "gps", k = 10, m = c(3, 1, 2, 3))
gps.object <- smooth.construct.gps.smooth.spec(sterm, data = xdat, knots = NULL)

## use argument 'xt' to specify a wider domain than [min(x), max(x)]
ab <- extendrange(xdat$x, f = 0.05)
sterm <- s(x, bs = "gps", k = 10, xt = list(domain = ab))
gps.object <- smooth.construct.gps.smooth.spec(sterm, data = xdat, knots = NULL)

## use argument 'xt' to specify an O-spline
sterm <- s(x, bs = "gps", k = 10, xt = list(derivative = TRUE))
gps.object <- smooth.construct.gps.smooth.spec(sterm, data = xdat, knots = NULL)

## use argument 'xt' to specify a general P-spline with periodic boundary constraints
sterm <- s(x, bs = "gps", k = 10, xt = list(periodic = TRUE))
gps.object <- smooth.construct.gps.smooth.spec(sterm, data = xdat, knots = NULL)

## specify your own knots
## a reminder: provide interior knots!!
## interior knots are strictly ascending **inside** the range of x-values
## number of interior knots = argument.k - degree - 1
## the following code constructs 10 cubic B-splines on equidistant interior knots
## (note that this is not a standard P-spline because of clamped boundary knots)
sterm <- s(x, bs = "gps", k = 10)
xk <- seq.int(min(xdat$x), max(xdat$x), length.out = 8)[-c(1, 8)]
gps.object <- smooth.construct.gps.smooth.spec(sterm, data = xdat, knots = list(x = xk))

## in all examples above, you can inspect the construction result using str()
str(gps.object)
## but you are probably more interested in the following quantities
## B-spline design matrix
gps.object$X
## list of penalty matrices
gps.object$S
## spline domain
gps.object$xt$domain
## interior knots
gps.object$interior.knots
## full knot sequence (boundary knots are always clamped)
gps.object$knots
## O-spline instead of general P-spline?
gps.object$xt$derivative
## with periodic boundary constraints?
gps.object$xt$periodic
## note that in the periodic case, number of B-splines = argument.k - degree
## that is, for cubic splines, with k = 10 you see 7 not 10 columns in X
## the reason is that the 3 constraints are absorbed into X
## similarly, penalty matrix is 7 x 7 not 10 x 10

ZheyuanLi/gps.mgcv documentation built on Sept. 19, 2024, 9:11 p.m.