ss: Fit a Smoothing Spline

View source: R/ss.R


Fit a Smoothing Spline

Description

Fits a smoothing spline with the smoothing parameter selected via one of eight methods: GCV, OCV, GACV, ACV, REML, ML, AIC, or BIC.

Usage

ss(x, y = NULL, w = NULL, df, spar = NULL, lambda = NULL,
   method = c("GCV", "OCV", "GACV", "ACV", "REML", "ML", "AIC", "BIC"), 
   m = 2L, periodic = FALSE, all.knots = FALSE, nknots = .nknots.smspl, 
   knots = NULL, keep.data = TRUE, df.offset = 0, penalty = 1, 
   control.spar = list(), tol = 1e-6 * IQR(x), bernoulli = TRUE,
   xmin = NULL, xmax = NULL, homosced = TRUE, iter.max = 1)

Arguments

x

Predictor vector of length n. Can also input a list or a two-column matrix specifying x and y.

y

Response vector of length n. If y is missing or NULL, the responses are assumed to be specified by x, with x the index vector.

w

Weights vector of length n. Defaults to all 1.

df

Equivalent degrees of freedom (trace of the smoother matrix). Must be in [m, nx], where nx is the number of unique x values; see Note.

spar

Smoothing parameter. Typically (but not always) in the range (0,1]. If specified, lambda = 256^(3*(spar-1)).

lambda

Computational smoothing parameter. This value is weighted by n to form the penalty coefficient (see Details). Ignored if spar is provided.

method

Method for selecting the smoothing parameter. Ignored if spar or lambda is provided.

m

Penalty order (integer). The penalty functional is the integrated squared m-th derivative of the function. Defaults to m = 2, which is a cubic smoothing spline. Set m = 1 for a linear smoothing spline or m = 3 for a quintic smoothing spline.

periodic

Logical. If TRUE, the estimated function f(x) is constrained to be periodic, i.e., f(a) = f(b), where a = min(x) and b = max(x).

all.knots

If TRUE, all distinct points in x are used as knots. If FALSE (default), a sequence of knots is placed at the quantiles of the unique x values; in this case, the input nknots specifies the number of knots in the sequence. Ignored if the knot values are input using the knots argument.

nknots

Positive integer or function specifying the number of knots. Ignored if either all.knots = TRUE or the knot values are input using the knots argument.

knots

Vector of knot values for the spline. Should be unique and within the range of the x values (to avoid a warning).

keep.data

Logical. If TRUE, the original data are returned as part of the output object.

df.offset

Allows the degrees of freedom to be increased by df.offset in the GCV criterion.

penalty

The coefficient of the penalty for degrees of freedom in the GCV criterion.

control.spar

Optional list with named components controlling the root finding when the smoothing parameter spar is computed, i.e., when spar is missing or NULL; see below.

Note that spar is only searched for in the interval [lower, upper].

lower:

lower bound for spar; defaults to -1.5

upper:

upper bound for spar; defaults to 1.5

tol:

the absolute precision (tolerance) used by optimize; defaults to 1e-8.

tol

Tolerance for same-ness or uniqueness of the x values. The values are binned into bins of size tol and values which fall into the same bin are regarded as the same. Must be strictly positive (and finite).

bernoulli

If TRUE, scaled Bernoulli polynomials are used for the basis and penalty functions. If FALSE, produces the "classic" definition of a smoothing spline, where the function estimate is a piecewise polynomial function with pieces of degree 2m - 1. See polynomial for details.

xmin

Minimum x value used to transform predictor scores to [0,1]. If NULL, xmin = min(x).

xmax

Maximum x value used to transform predictor scores to [0,1]. If NULL, xmax = max(x).

homosced

Are the error variances homoscedastic? If FALSE, variance weights are iteratively estimated from the data; see iter.max.

iter.max

Maximum number of iterations for variance weight estimation. Ignored if homosced = TRUE.

Details

Inspired by the smooth.spline function in R's stats package.

Neither x nor y is allowed to contain missing or infinite values.

The x vector should contain at least 2m distinct values. 'Distinct' here is controlled by tol: values which are regarded as the same are replaced by the first of their values and the corresponding y and w are pooled accordingly.

Unless lambda has been specified instead of spar, the computational λ used (as a function of spar) is λ = 256^{3(s - 1)}, where s = spar.

If spar and lambda are missing or NULL, the value of df is used to determine the degree of smoothing. If df is missing as well, the specified method is used to determine λ.
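
For illustration, the conversion between spar and the computational lambda can be carried out directly (a minimal sketch; spar2lambda and lambda2spar are illustrative names, not functions in the package):

spar2lambda <- function(spar) 256^(3 * (spar - 1))
lambda2spar <- function(lambda) 1 + log(lambda, base = 256) / 3
spar2lambda(1)         # lambda = 1
lambda2spar(256^(-3))  # spar = 0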

Letting f_i = f(x_i), the function is represented as

f = X β + Z α

where the basis functions in X span the null space (i.e., functions with m-th derivative of zero), and Z contains the reproducing kernel function of the contrast space evaluated at all combinations of observed data points and knots, i.e., Z[i,j] = R(x_i, k_j) where R is the kernel function and k_j is the j-th knot. The vectors β and α contain unknown basis function coefficients. Letting M = (X, Z) and γ = (β', α')', the penalized least squares problem has the form

(y - M γ)' W (y - M γ) + n λ α' Q α

where W is a diagonal matrix containing the weights, and Q is the penalty matrix. Note that Q[i,j] = R(k_i, k_j) contains the reproducing kernel function evaluated at all combinations of knots. The optimal coefficients are the solution to

(M' W M + n λ P) γ = M' W y

where P is the penalty matrix Q augmented with zeros corresponding to the β in γ.
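
As a sketch of these normal equations, the coefficients could be computed as follows, assuming M, Q, w, y, lambda, and m have been formed as described above (pen_ls is an illustrative helper, not a function in the package):

pen_ls <- function(M, Q, w, y, lambda, m) {
  n <- length(y)
  # augment Q with zeros for the m null space coefficients beta
  P <- matrix(0, ncol(M), ncol(M))
  P[(m + 1):ncol(M), (m + 1):ncol(M)] <- Q
  # solve (M' W M + n lambda P) gamma = M' W y
  solve(crossprod(M, w * M) + n * lambda * P, crossprod(M, w * y))
}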

Value

An object of class "ss" with components:

x

the distinct x values in increasing order; see Note.

y

the fitted values corresponding to x.

w

the weights used at the unique values of x.

yin

the y values used at the unique x values.

tol

the tol argument (whose default depends on x).

data

only if keep.data = TRUE: itself a list with components x, y and w (if applicable). These are the original (x_i, y_i, w_i), i = 1, …, n, values, where data$x may have repeated values and hence be longer than the above x component; see Details.

lev

leverages, the diagonal values of the smoother matrix.

cv.crit

cross-validation score.

pen.crit

the penalized criterion, a non-negative number; simply the (weighted) residual sum of squares (RSS).

crit

the criterion value minimized in the underlying df2lambda function. When df is provided, the criterion is [tr(S_{λ}) - df]^2.

df

equivalent degrees of freedom used.

df.residual

the residual degrees of freedom = nobs - df

spar

the value of spar computed or given, i.e., s = 1 + log_{256}(λ)/3

lambda

the value of λ corresponding to spar, i.e., λ = 256^{3(s-1)}.

fit

list for use by predict.ss, with components

n:

number of observations.

knot:

the knot sequence.

nk:

number of coefficients (# knots plus m).

coef:

coefficients for the spline basis used.

min, range:

numbers giving the corresponding quantities of x

m:

spline penalty order (same as input m)

periodic:

is spline periodic?

cov.sqrt

square root of the covariance matrix of coef such that tcrossprod(cov.sqrt) reconstructs the covariance matrix; see the sketch at the end of this section.

weighted

were weights w used in fitting?

df.offset

same as input

penalty

same as input

control.spar

control parameters for smoothing parameter selection

bernoulli

were Bernoulli polynomials used in fitting?

call

the matched call.

sigma

estimated error standard deviation.

logLik

log-likelihood (if method is REML or ML).

aic

Akaike's Information Criterion (if method is AIC).

bic

Bayesian Information Criterion (if method is BIC).

penalty

smoothness penalty α' Q α, which is the integrated squared m-th derivative of the estimated function f(x).

method

smoothing parameter selection method. Will be NULL if df, spar, or lambda is provided.
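
As an illustration of the cov.sqrt component, the covariance matrix of the coefficients can be rebuilt from its square root (a minimal sketch; the vcov.ss method is the supported way to obtain this matrix):

mod <- ss(x, y, nknots = 10)       # x and y as in the Examples
V <- tcrossprod(mod$fit$cov.sqrt)  # cov.sqrt %*% t(cov.sqrt)
max(abs(V - vcov(mod)))            # expected near zero if vcov.ss
                                   # returns the same matrix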

Methods

The smoothing parameter can be selected using one of eight methods:
Generalized Cross-Validation (GCV)
Ordinary Cross-Validation (OCV)
Generalized Approximate Cross-Validation (GACV)
Approximate Cross-Validation (ACV)
Restricted Maximum Likelihood (REML)
Maximum Likelihood (ML)
Akaike's Information Criterion (AIC)
Bayesian Information Criterion (BIC)

Note

The number of unique x values, nx, is determined by the tol argument, equivalently to

nx <- sum(!duplicated( round((x - mean(x)) / tol) ))

In this case where not all unique x values are used as knots, the result is not a smoothing spline in the strict sense, but very close unless a small smoothing parameter (or large df) is used.
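
For example, with the evenly spaced x from the Examples section and the default tol, all 100 values are treated as distinct:

x <- seq(0, 1, length.out = 100)
tol <- 1e-6 * IQR(x)
sum(!duplicated(round((x - mean(x)) / tol)))  # 100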

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/smooth.spline.html

Berry, L. N., & Helwig, N. E. (2021). Cross-validation, information theory, or maximum likelihood? A comparison of tuning methods for penalized splines. Stats, 4(3), 701-724. doi: 10.3390/stats4030042

Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31, 377-403. doi: 10.1007/BF01404567

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer. doi: 10.1007/978-1-4614-5369-7

Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi: 10.4135/9781526421036885885

Helwig, N. E. (2021). Spectrally sparse nonparametric regression via elastic net regularized smoothers. Journal of Computational and Graphical Statistics, 30(1), 182-191. doi: 10.1080/10618600.2020.1806855

Wahba, G. (1985). A comparison of GCV and GML for choosing the smoothing parameters in the generalized spline smoothing problem. The Annals of Statistics, 4, 1378-1402. doi: 10.1214/aos/1176349743

See Also

Related Modeling Functions:

sm for fitting smooth models with multiple predictors of mixed types (Gaussian response).

gsm for fitting generalized smooth models with multiple predictors of mixed types (non-Gaussian response).

S3 Methods and Related Functions for "ss" Objects:

boot.ss for bootstrapping ss objects.

coef.ss for extracting coefficients from ss objects.

cooks.distance.ss for calculating Cook's distances from ss objects.

cov.ratio for computing covariance ratio from ss objects.

deviance.ss for extracting deviance from ss objects.

dfbeta.ss for calculating DFBETA from ss objects.

dfbetas.ss for calculating DFBETAS from ss objects.

diagnostic.plots for plotting regression diagnostics from ss objects.

fitted.ss for extracting fitted values from ss objects.

hatvalues.ss for extracting leverages from ss objects.

model.matrix.ss for constructing model matrix from ss objects.

plot.ss for plotting predictions from ss objects.

plot.boot.ss for plotting boot.ss objects.

predict.ss for predicting from ss objects.

residuals.ss for extracting residuals from ss objects.

rstandard.ss for computing standardized residuals from ss objects.

rstudent.ss for computing studentized residuals from ss objects.

smooth.influence for calculating basic influence information from ss objects.

smooth.influence.measures for convenient display of influential observations from ss objects.

summary.ss for summarizing ss objects.

vcov.ss for extracting coefficient covariance matrix from ss objects.

weights.ss for extracting prior weights from ss objects.

Examples

# generate data
set.seed(1)
n <- 100
x <- seq(0, 1, length.out = n)
fx <- 2 + 3 * x + sin(2 * pi * x)
y <- fx + rnorm(n, sd = 0.5)

# GCV selection (default)
ss.GCV <- ss(x, y, nknots = 10)
ss.GCV

# OCV selection
ss.OCV <- ss(x, y, method = "OCV", nknots = 10)
ss.OCV

# GACV selection
ss.GACV <- ss(x, y, method = "GACV", nknots = 10)
ss.GACV

# ACV selection
ss.ACV <- ss(x, y, method = "ACV", nknots = 10)
ss.ACV

# ML selection
ss.ML <- ss(x, y, method = "ML", nknots = 10)
ss.ML

# REML selection
ss.REML <- ss(x, y, method = "REML", nknots = 10)
ss.REML

# AIC selection
ss.AIC <- ss(x, y, method = "AIC", nknots = 10)
ss.AIC

# BIC selection
ss.BIC <- ss(x, y, method = "BIC", nknots = 10)
ss.BIC

# compare results
mean( ( fx - ss.GCV$y )^2 )
mean( ( fx - ss.OCV$y )^2 )
mean( ( fx - ss.GACV$y )^2 )
mean( ( fx - ss.ACV$y )^2 )
mean( ( fx - ss.ML$y )^2 )
mean( ( fx - ss.REML$y )^2 )
mean( ( fx - ss.AIC$y )^2 )
mean( ( fx - ss.BIC$y )^2 )

# plot results
plot(x, y)
rlist <- list(ss.GCV, ss.OCV, ss.GACV, ss.ACV,
              ss.REML, ss.ML, ss.AIC, ss.BIC)
for(j in seq_along(rlist)){
   lines(rlist[[j]], lwd = 2, col = j)
}
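
# further sketches (not from the original examples):
# fit with a fixed equivalent degrees of freedom
ss.df <- ss(x, y, df = 10, nknots = 10)
ss.df

# predict at new x values (argument names per predict.ss)
newx <- seq(0, 1, length.out = 25)
pred <- predict(ss.GCV, x = newx)
head(pred)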

