# polynomial: Polynomial Smoothing Spline Basis and Penalty In npreg: Nonparametric Regression via Smoothing Splines

## Description

Generate the smoothing spline basis and penalty matrix for a polynomial spline. Derivatives of the smoothing spline basis matrix are supported.

## Usage

 1 2 3 4 5 6 basis.poly(x, knots, m = 2, d = 0, xmin = min(x), xmax = max(x), periodic = FALSE, rescale = FALSE, intercept = FALSE, bernoulli = TRUE, ridge = FALSE) penalty.poly(x, m = 2, xmin = min(x), xmax = max(x), periodic = FALSE, rescale = FALSE, bernoulli = TRUE) 

## Arguments

 x Predictor variable (basis) or spline knots (penalty). Numeric or integer vector of length n. knots Spline knots. Numeric or integer vector of length r. m Penalty order. "m=1" for linear smoothing spline, "m=2" for cubic, and "m=3" for quintic. d Derivative order. "d=0" for smoothing spline basis, "d=1" for 1st derivative of basis, and "d=2" for 2nd derivative of basis. xmin Minimum value of "x". xmax Maximum value of "x". periodic If TRUE, the smoothing spline basis is periodic w.r.t. the interval [xmin, xmax]. rescale If TRUE, the nonparametric part of the basis is divided by the average of the reproducing kernel function evaluated at the knots. intercept If TRUE, the first column of the basis will be a column of ones. bernoulli If TRUE, scaled Bernoulli polynomials are used for the basis and penalty functions. ridge If TRUE, the basis matrix is post-multiplied by the inverse square root of the penalty matrix. See Note and Examples.

## Details

Generates a basis function or penalty matrix used to fit linear, cubic, and quintic smoothing splines (or evaluate their derivatives).

For non-periodic smoothing splines, the basis function matrix has the form

X = [X_0, X_1]

where the matrix X_0 is of dimension n by m-1 (plus 1 if an intercept is included), and X_1 is a matrix of dimension n by r.

The X_0 matrix contains the "parametric part" of the basis, which includes polynomial functions of x up to degree m-1.

The matrix X_1 contains the "nonparametric part" of the basis, which consists of the reproducing kernel function

ρ(x, y) = κ_m(x) κ_m(y) + (-1)^{m-1} κ_{2m}(|x-y|)

evaluated at all combinations of x and knots. The κ_v functions are scaled Bernoulli polynomials.

For periodic smoothing splines, the X_0 matrix only contains the intercept column and the modified reproducing kernel function

ρ(x, y) = (-1)^{m-1} κ_{2m}(|x-y|)

is evaluated for all combinations of x and knots.

For non-periodic smoothing splines, the penalty matrix consists of the reproducing kernel function

ρ(x, y) = κ_m(x) κ_m(y) + (-1)^{m-1} κ_{2m}(|x-y|)

evaluated at all combinations of x. For periodic smoothing splines, the modified reproducing kernel function

ρ(x, y) = (-1)^{m-1} κ_{2m}(|x-y|)

is evaluated for all combinations of x.

If bernoulli = FALSE, the reproducing kernel function is defined as

ρ(x, y) = (1/(m-1)!)^2 \int_0^1 (x - u)_+^{m-1} (y - u)_+^{m-1} du

where (.)_+ = \max(., 0). This produces the "classic" definition of a smoothing spline, where the function estimate is a piecewise polynomial function with pieces of degree 2m - 1.

## Value

Basis: Matrix of dimension c(length(x), df) where df >= length(knots). If the smoothing spline basis is not periodic (default), then the number of columns is df = length(knots) + m - !intercept. For periodic smoothing splines, the basis has m fewer columns.

Penalty: Matrix of dimension c(r, r) where r = length(x) is the number of knots.

## Note

Inputs x and knots should be within the interval [xmin, xmax].

If ridge = TRUE, the penalty matrix is the identity matrix.

## Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

## References

Gu, C. (2013). Smoothing Spline ANOVA Models. 2nd Ed. New York, NY: Springer-Verlag. doi: 10.1007/978-1-4614-5369-7

Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13. doi: 10.3389/fams.2017.00015

Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi: 10.4135/9781526421036885885

Helwig, N. E., & Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24(3), 715-732. doi: 10.1080/10618600.2014.926819

See thinplate for a thin-plate spline basis and penalty.
See ordinal for a basis and penalty for ordered factors.
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 ######***###### standard parameterization ######***###### # generate data set.seed(0) n <- 101 x <- seq(0, 1, length.out = n) knots <- seq(0, 0.95, by = 0.05) eta <- 1 + 2 * x + sin(2 * pi * x) y <- eta + rnorm(n, sd = 0.5) # cubic smoothing spline basis X <- basis.poly(x, knots, intercept = TRUE) # cubic smoothing spline penalty Q <- penalty.poly(knots, xmin = min(x), xmax = max(x)) # pad Q with zeros (for intercept and linear effect) Q <- rbind(0, 0, cbind(0, 0, Q)) # define smoothing parameter lambda <- 1e-5 # estimate coefficients coefs <- solve(crossprod(X) + n * lambda * Q) %*% crossprod(X, y) # estimate eta yhat <- X %*% coefs # check rmse sqrt(mean((eta - yhat)^2)) # plot results plot(x, y) lines(x, yhat) ######***###### ridge parameterization ######***###### # generate data set.seed(0) n <- 101 x <- seq(0, 1, length.out = n) knots <- seq(0, 0.95, by = 0.05) eta <- 1 + 2 * x + sin(2 * pi * x) y <- eta + rnorm(n, sd = 0.5) # cubic smoothing spline basis X <- basis.poly(x, knots, intercept = TRUE, ridge = TRUE) # cubic smoothing spline penalty (ridge) Q <- diag(rep(c(0, 1), times = c(2, ncol(X) - 2))) # define smoothing parameter lambda <- 1e-5 # estimate coefficients coefs <- solve(crossprod(X) + n * lambda * Q) %*% crossprod(X, y) # estimate eta yhat <- X %*% coefs # check rmse sqrt(mean((eta - yhat)^2)) # plot results plot(x, y) lines(x, yhat)