nominal: Nominal Smoothing Spline Basis and Penalty In npreg: Nonparametric Regression via Smoothing Splines

Description

Generate the smoothing spline basis and penalty matrix for a nominal spline. This basis and penalty are for an unordered factor.

Usage

 ```1 2 3``` ```basis.nom(x, knots, K = NULL, intercept = FALSE, ridge = FALSE) penalty.nom(x, K = NULL) ```

Arguments

 `x` Predictor variable (basis) or spline knots (penalty). Factor or integer vector of length n. `knots` Spline knots. Factor or integer vector of length r. `K` Number of levels of `x`. If `NULL`, this argument is defined as `K = length(unique(x))`. `intercept` If `TRUE`, the first column of the basis will be a column of ones. `ridge` If `TRUE`, the basis matrix is post-multiplied by the inverse square root of the penalty matrix. See Note and Examples.

Details

Generates a basis function or penalty matrix used to fit nominal smoothing splines.

With an intercept included, the basis function matrix has the form

X = [X_0, X_1]

where matrix `X_0` is an n by 1 matrix of ones, and `X_1` is a matrix of dimension n by r.

The `X_0` matrix contains the "parametric part" of the basis (i.e., the intercept). The matrix `X_1` contains the "nonparametric part" of the basis, which consists of the reproducing kernel function

ρ(x, y) = δ_{x y} - 1/K

evaluated at all combinations of `x` and `knots`. The notation δ_{x y} denotes Kronecker's delta function.

The penalty matrix consists of the reproducing kernel function

ρ(x, y) = δ_{x y} - 1/K

evaluated at all combinations of `x`.

Value

Basis: Matrix of dimension `c(length(x), df)` where `df = length(knots) + intercept`.

Penalty: Matrix of dimension `c(r, r)` where `r = length(x)` is the number of knots.

Note

If the inputs `x` and `knots` are factors, they should have the same levels.

If the inputs `x` and `knots` are integers, the `knots` should be a subset of the input `x`.

If `ridge = TRUE`, the penalty matrix is the identity matrix.

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Gu, C. (2013). Smoothing Spline ANOVA Models. 2nd Ed. New York, NY: Springer-Verlag. doi: 10.1007/978-1-4614-5369-7

Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13. doi: 10.3389/fams.2017.00015

Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi: 10.4135/9781526421036885885

Helwig, N. E., & Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24(3), 715-732. doi: 10.1080/10618600.2014.926819

See `ordinal` for a basis and penalty for ordered factors.
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60``` ```######***###### standard parameterization ######***###### # generate data set.seed(0) n <- 101 x <- factor(sort(rep(LETTERS[1:4], length.out = n))) knots <- LETTERS[1:3] eta <- 1:4 y <- eta[x] + rnorm(n, sd = 0.5) # nominal smoothing spline basis X <- basis.nom(x, knots, intercept = TRUE) # nominal smoothing spline penalty Q <- penalty.nom(knots, K = 4) # pad Q with zeros (for intercept) Q <- rbind(0, cbind(0, Q)) # define smoothing parameter lambda <- 1e-5 # estimate coefficients coefs <- solve(crossprod(X) + n * lambda * Q) %*% crossprod(X, y) # estimate eta yhat <- X %*% coefs # check rmse sqrt(mean((eta[x] - yhat)^2)) ######***###### ridge parameterization ######***###### # generate data set.seed(0) n <- 101 x <- factor(sort(rep(LETTERS[1:4], length.out = n))) knots <- LETTERS[1:3] eta <- 1:4 y <- eta[x] + rnorm(n, sd = 0.5) # nominal smoothing spline basis X <- basis.nom(x, knots, intercept = TRUE, ridge = TRUE) # nominal smoothing spline penalty (ridge) Q <- diag(rep(c(0, 1), times = c(1, ncol(X) - 1))) # define smoothing parameter lambda <- 1e-5 # estimate coefficients coefs <- solve(crossprod(X) + n * lambda * Q) %*% crossprod(X, y) # estimate eta yhat <- X %*% coefs # check rmse sqrt(mean((eta[x] - yhat)^2)) ```