# ss: Fit a Smoothing Spline In npreg: Nonparametric Regression via Smoothing Splines

## Description

Fits a smoothing spline with the smoothing parameter selected via one of eight methods: GCV, OCV, GACV, ACV, REML, ML, AIC, or BIC.

## Usage

 ```1 2 3 4 5``` ```ss(x, y = NULL, w = NULL, df, spar = NULL, lambda = NULL, method = c("GCV", "OCV", "GACV", "ACV", "REML", "ML", "AIC", "BIC"), m = 2L, periodic = FALSE, all.knots = FALSE, nknots = .nknots.smspl, knots = NULL, keep.data = TRUE, df.offset = 0, penalty = 1, control.spar = list(), tol = 1e-6 * IQR(x), bernoulli = TRUE) ```

## Arguments

 `x` Predictor vector of length n. Can also input a list or a two-column matrix specifying x and y. `y` Response vector of length n. If y is missing or NULL, the responses are assumed to be specified by x, with x the index vector. `w` Weights vector of length n. Defaults to all 1. `df` Equivalent degrees of freedom (trace of the smoother matrix). Must be in [m,nx], where nx is the number of unique x values, see below. `spar` Smoothing parameter. Typically (but not always) in the range (0,1]. If specified `lambda = 256^(3*(spar-1))`. `lambda` Computational smoothing parameter. This value is weighted by n to form the penalty coefficient (see Details). Ignored if `spar` is provided. `method` Method for selecting the smoothing parameter. Ignored if `spar` or `lambda` is provided. `m` Penalty order (integer). The penalty functional is the integrated squared m-th derivative of the function. Defaults to m = 2, which is a cubic smoothing spline. Set m = 1 for a linear smoothing spline or m = 3 for a quintic smoothing spline. `periodic` Logical. If `TRUE`, the estimated function f(x) is constrained to be periodic, i.e., f(a) = f(b) where a = \min(x) and b = \max(x). `all.knots` If `TRUE`, all distinct points in x are used as knots. If `FALSE` (default), a sequence knots is placed at the quantiles of the unique x values; in this case, the input `nknots` specifies the number of knots in the sequence. Ignored if the knot values are input using the `knots` argument. `nknots` Positive integer or function specifying the number of knots. Ignored if either `all.knots = TRUE` or the knot values are input using the `knots` argument. `knots` Vector of knot values for the spline. Should be unique and within the range of the x values (to avoid a warning). `keep.data` Logical. If `TRUE`, the original data as a part of the output object. `df.offset` Allows the degrees of freedom to be increased by `df.offset` in the GCV criterion. `penalty` The coefficient of the penalty for degrees of freedom in the GCV criterion. `control.spar` Optional list with named components controlling the root finding when the smoothing parameter spar is computed, i.e., missing or NULL, see below. Note that spar is only searched for in the interval [lower, upper]. lower: lower bound for spar; defaults to 0. upper: upper bound for spar; defaults to 1. tol: the absolute precision (tolerance) used by `optimize`; defaults to 1e-8. `tol` Tolerance for same-ness or uniqueness of the x values. The values are binned into bins of size tol and values which fall into the same bin are regarded as the same. Must be strictly positive (and finite). `bernoulli` If `TRUE`, scaled Bernoulli polynomials are used for the basis and penalty functions. If `FALSE`, produces the "classic" definition of a smoothing spline, where the function estimate is a piecewise polynomial function with pieces of degree 2m - 1. See `polynomial` for details.

## Details

Inspired by the `smooth.spline` function in R's stats package.

Neither `x` nor `y` are allowed to containing missing or infinite values.

The `x` vector should contain at least 2m distinct values. 'Distinct' here is controlled by `tol`: values which are regarded as the same are replaced by the first of their values and the corresponding `y` and `w` are pooled accordingly.

Unless `lambda` has been specified instead of `spar`, the computational λ used (as a function of `spar`) is λ = 256^(3*(spar - 1)).

If `spar` and `lambda` are missing or `NULL`, the value of `df` is used to determine the degree of smoothing. If `df` is missing as well, the specified `method` is used to determine λ.

Letting f_i = f(x_i), the function is represented as

f = X β + Z γ

where the basis functions in X span the null space (i.e., functions with m-th derivative of zero), and Z contains the reproducing kernel function of the contrast space evaluated at all combinations of observed data points and knots, i.e., Z[i,j] = ρ(x_i, k_j) where ρ is the kernel function and k_j is the j-th knot. The vectors β and γ contain unknown basis function coefficients. Letting M = (X, Z) and θ = (β', γ')', the penalized least squares problem has the form

(y - M θ)' W (y - M θ) + n λ γ' Q γ

where W is a diagonal matrix containg the weights, and Q is the penalty matrix. Note that Q[i,j] = ρ(k_i, k_j) contains the reproducing kernel function evaluated at all combinations of knots. The optimal coefficients are the solution to

(M' W M + n λ P) θ = M' W y

where P is the penalty matrix Q augmented with zeros corresponding to the β in θ.

## Value

An object of class "ss" with components:

 `x` the distinct `x` values in increasing order; see Note. `y` the fitted values corresponding to `x`. `w` the weights used at the unique values of `x`. `yin` the `y` values used at the unique `y` values. `tol` the `tol` argument (whose default depends on `x`). `data` only if keep.data = TRUE: itself a list with components `x`, `y` and `w` (if applicable). These are the original (x_i,y_i,w_i), i = 1, …, n, values where `data\$x` may have repeated values and hence be longer than the above `x` component; see details. `lev` leverages, the diagonal values of the smoother matrix. `cv.crit` cross-validation score. `pen.crit` the penalized criterion, a non-negative number; simply the (weighted) residual sum of squares (RSS). `crit` the criterion value minimized in the underlying `df2lambda` function. When `df` is provided, the criterion is [tr(S_{λ}) - df]^2. `df` equivalent degrees of freedom used. `spar` the value of `spar` computed or given, i.e., s = 1 + \log_{256}(λ)/3 `lambda` the value of λ corresponding to `spar`, i.e., λ = 256^{3*(s-1)}. `fit` list for use by `predict.ss`, with components n:number of observations. knot:the knot sequence. nk:number of coefficients (# knots plus m). coef:coefficients for the spline basis used. min, range:numbers giving the corresponding quantities of `x` m:spline penalty order (same as input `m`) periodic:is spline periodic? cov.sqrtsquare root of covariance matrix of `coef` such that `tcrossprod(coef)` reconstructs the covariance matrix. weightedwere weights `w` used in fitting? bernoulliwere Bernoulli polynomials used in fitting? `call` the matched call. `sigma` estimated error standard deviation. `logLik` log-likelihood (if `method` is REML or ML). `aic` Akaike's Information Criterion (if `method` is AIC). `bic` Bayesian Information Criterion (if `method` is BIC). `penalty` smoothness penalty γ' Q γ, which is the integrated squared m-th derivative of the estimated function f(x). `method` smoothing parameter selection method. Will be `NULL` if `df`, `spar`, or `lambda` is provided.

## Methods

The smoothing parameter can be selected using one of eight methods:
Generalized Cross-Validation (GCV)
Ordinary Cross-Validation (OCV)
Generalized Approximate Cross-Validation (GACV)
Approximate Cross-Validation (ACV)
Restricted Maximum Likelihood (REML)
Maximum Likelihood (ML)
Akaike's Information Criterion (AIC)
Bayesian Information Criterion (BIC)

## Note

The number of unique x values, nx, are determined by the tol argument, equivalently to

`nx <- sum(!duplicated( round((x - mean(x)) / tol) ))`

In this case where not all unique x values are used as knots, the result is not a smoothing spline in the strict sense, but very close unless a small smoothing parameter (or large `df`) is used.

## Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

## References

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/smooth.spline.html

Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31, 377-403. doi: 10.1007/BF01404567

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer. doi: 10.1007/978-1-4614-5369-7

Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi: 10.4135/9781526421036885885

Helwig, N. E. (2020+). Spectrally sparse nonparametric regression via elastic net regularized smoothers. Journal of Computational and Graphical Statistics. doi: 10.1080/10618600.2020.1806855

Wahba, G. (1985). A comparison of GCV and GML for choosing the smoothing parameters in the generalized spline smoothing problem. The Annals of Statistics, 4, 1378-1402. doi: 10.1214/aos/1176349743

## See Also

`summary.ss` for summarizing `ss` objects.

`predict.ss` for predicting from `ss` objects.

`sm` for fitting smooth models with multiple predictors of mixed types (Gaussian response).

`gsm` for fitting generalized smooth models with multiple predictors of mixed types (non-Gaussian response).

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56``` ```# generate data set.seed(1) n <- 100 x <- seq(0, 1, length.out = n) fx <- 2 + 3 * x + sin(2 * pi * x) y <- fx + rnorm(n, sd = 0.5) # GCV selection (default) ss.GCV <- ss(x, y, nknots = 10) ss.GCV # OCV selection ss.OCV <- ss(x, y, method = "OCV", nknots = 10) ss.OCV # GACV selection ss.GACV <- ss(x, y, method = "GACV", nknots = 10) ss.GACV # ACV selection ss.ACV <- ss(x, y, method = "ACV", nknots = 10) ss.ACV # ML selection ss.ML <- ss(x, y, method = "ML", nknots = 10) ss.ML # REML selection ss.REML <- ss(x, y, method = "REML", nknots = 10) ss.REML # AIC selection ss.AIC <- ss(x, y, method = "AIC", nknots = 10) ss.AIC # BIC selection ss.BIC <- ss(x, y, method = "BIC", nknots = 10) ss.BIC # compare results mean( ( fx - ss.GCV\$y )^2 ) mean( ( fx - ss.OCV\$y )^2 ) mean( ( fx - ss.GACV\$y )^2 ) mean( ( fx - ss.ACV\$y )^2 ) mean( ( fx - ss.ML\$y )^2 ) mean( ( fx - ss.REML\$y )^2 ) mean( ( fx - ss.AIC\$y )^2 ) mean( ( fx - ss.BIC\$y )^2 ) # plot results plot(x, y) rlist <- list(ss.GCV, ss.OCV, ss.GACV, ss.ACV, ss.REML, ss.ML, ss.AIC, ss.BIC) for(j in 1:length(rlist)){ lines(rlist[[j]], lwd = 2, col = j) } ```

npreg documentation built on April 23, 2021, 9:07 a.m.