ss — R Documentation

Description
Fits a smoothing spline with the smoothing parameter selected via one of eight methods: GCV, OCV, GACV, ACV, REML, ML, AIC, or BIC.
Usage

ss(x, y = NULL, w = NULL, df, spar = NULL, lambda = NULL,
   method = c("GCV", "OCV", "GACV", "ACV", "REML", "ML", "AIC", "BIC"),
   m = 2L, periodic = FALSE, all.knots = FALSE, nknots = .nknots.smspl,
   knots = NULL, keep.data = TRUE, df.offset = 0, penalty = 1,
   control.spar = list(), tol = 1e-6 * IQR(x), bernoulli = TRUE,
   xmin = NULL, xmax = NULL, homosced = TRUE, iter.max = 1)
Arguments

x: Predictor vector of length n. Can also input a list or a two-column matrix specifying x and y.

y: Response vector of length n. If y is missing or NULL, the responses are assumed to be specified by x, with x the index vector.

w: Weights vector of length n. Defaults to all 1.

df: Equivalent degrees of freedom (trace of the smoother matrix). Must be in [m, nx], where nx is the number of unique x values.

spar: Smoothing parameter. Typically (but not always) in the range (0, 1]. If specified, lambda = 256^(3*(spar - 1)).

lambda: Computational smoothing parameter. This value is weighted by n to form the penalty coefficient (see Details). Ignored if spar is provided.

method: Method for selecting the smoothing parameter. Ignored if spar or lambda is provided.

m: Penalty order (integer). The penalty functional is the integrated squared m-th derivative of the function. Defaults to m = 2 (cubic smoothing spline); use m = 1 for a linear smoothing spline or m = 3 for a quintic smoothing spline.

periodic: Logical. If TRUE, the estimated function is constrained to be periodic.

all.knots: If TRUE, all distinct points in x are used as knots. If FALSE (default), a sequence of knots is placed at the quantiles of the unique x values; in this case, nknots specifies the number of knots in the sequence. Ignored if the knot values are input using the knots argument.

nknots: Positive integer or function specifying the number of knots. Ignored if either all.knots = TRUE or the knot values are input using the knots argument.

knots: Vector of knot values for the spline. Should be unique and within the range of the x values (to avoid a warning).

keep.data: Logical. If TRUE, the original data are returned as a list with components x, y, and w.

df.offset: Allows the degrees of freedom to be increased by df.offset in the GCV criterion.

penalty: The coefficient of the penalty for degrees of freedom in the GCV criterion.

control.spar: Optional list with named components controlling the root finding when the smoothing parameter spar is computed, i.e., missing or NULL; see below. Note that spar is only searched for in the interval [lower, upper].

tol: Tolerance for same-ness or uniqueness of the x values. The values are binned into bins of size tol, and values which fall into the same bin are regarded as the same. Must be strictly positive (and finite).

bernoulli: If TRUE, scaled Bernoulli polynomials are used for the basis and penalty functions.

xmin: Minimum x value used to transform predictor scores to [0, 1]. If NULL, min(x) is used.

xmax: Maximum x value used to transform predictor scores to [0, 1]. If NULL, max(x) is used.

homosced: Are the error variances homoscedastic? If FALSE, variance weights are estimated from the data.

iter.max: Maximum number of iterations for variance weight estimation. Ignored if homosced = TRUE.
Details

Inspired by the smooth.spline function in R's stats package.

Neither x nor y is allowed to contain missing or infinite values.

The x vector should contain at least 2m distinct values. 'Distinct' here is controlled by tol: values which are regarded as the same are replaced by the first of their values, and the corresponding y and w are pooled accordingly.
Unless lambda has been specified instead of spar, the computational \lambda used (as a function of spar) is \lambda = 256^{3(s - 1)}, where s = spar.
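For example, the two parameterizations can be converted back and forth directly:

spar <- 0.5
lambda <- 256^(3 * (spar - 1))    # 256^{3(s - 1)} = 1/4096
1 + log(lambda, base = 256) / 3   # recovers spar = 0.5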
If spar and lambda are missing or NULL, the value of df is used to determine the degree of smoothing. If df is missing as well, the specified method is used to determine \lambda.
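As an illustrative sketch of this precedence (assuming x and y are defined as in the Examples below):

fit1 <- ss(x, y, spar = 0.5)      # spar given: lambda = 256^(3 * (spar - 1))
fit2 <- ss(x, y, lambda = 1e-4)   # lambda given directly
fit3 <- ss(x, y, df = 6)          # df given: lambda is chosen to match df
fit4 <- ss(x, y, method = "BIC")  # none given: method selects lambda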
Letting f_i = f(x_i), the function is represented as

f = X \beta + Z \alpha

where the basis functions in X span the null space (i.e., functions with m-th derivative of zero), and Z contains the reproducing kernel function of the contrast space evaluated at all combinations of observed data points and knots, i.e., Z[i,j] = R(x_i, k_j), where R is the kernel function and k_j is the j-th knot. The vectors \beta and \alpha contain the unknown basis function coefficients.

Letting M = (X, Z) and \gamma = (\beta', \alpha')', the penalized least squares problem has the form

(y - M \gamma)' W (y - M \gamma) + n \lambda \alpha' Q \alpha

where W is a diagonal matrix containing the weights, and Q is the penalty matrix. Note that Q[i,j] = R(k_i, k_j) contains the reproducing kernel function evaluated at all combinations of knots. The optimal coefficients are the solution to

(M' W M + n \lambda P) \gamma = M' W y

where P is the penalty matrix Q augmented with zeros corresponding to the \beta in \gamma.
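To make the structure of these normal equations concrete, here is a minimal sketch in R. The kernel Rk below is a simple placeholder (the 'min' kernel), not the reproducing kernel that ss actually uses, so only the linear algebra, not the fitted values, matches the package:

set.seed(0)
n <- 50
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
knots <- seq(0.1, 0.9, length.out = 5)
Rk <- function(s, t) pmin(s, t)     # placeholder for R(., .)
X <- cbind(1, x)                    # null space basis for m = 2
Z <- outer(x, knots, Rk)            # Z[i, j] = R(x_i, k_j)
Q <- outer(knots, knots, Rk)        # Q[i, j] = R(k_i, k_j)
M <- cbind(X, Z)
W <- diag(n)                        # unit weights
P <- matrix(0, ncol(M), ncol(M))    # Q augmented with zeros for beta
P[-(1:2), -(1:2)] <- Q
lambda <- 1e-4
gamma <- solve(t(M) %*% W %*% M + n * lambda * P, t(M) %*% W %*% y)
fhat <- M %*% gamma                 # fitted values f = M gamma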
Value

An object of class "ss" with components:

x: the distinct x values in increasing order; see Details.

y: the fitted values corresponding to x.

w: the weights used at the unique values of x.

yin: the y values used at the unique x values.

tol: the tol argument (whose default depends on x).

data: only if keep.data = TRUE: itself a list with components x, y, and w (of the same length). These are the original (x_i, y_i, w_i) values, so data$x may have repeated values and hence be longer than the x component above.

lev: leverages, the diagonal values of the smoother matrix.

cv.crit: cross-validation score.

pen.crit: the penalized criterion, a non-negative number; simply the (weighted) residual sum of squares (RSS).

crit: the criterion value minimized by the smoothing parameter selection routine.

df: equivalent degrees of freedom used.

df.residual: the residual degrees of freedom = nobs - df.

spar: the value of spar computed or given, i.e., s = 1 + \log_{256}(\lambda)/3.

lambda: the value of \lambda corresponding to spar, i.e., \lambda = 256^{3(s - 1)}.

fit: list for use by predict.ss and related methods.

call: the matched call.

sigma: estimated error standard deviation.

logLik: log-likelihood (if method is "REML" or "ML").

aic: Akaike's Information Criterion (if method is "AIC").

bic: Bayesian Information Criterion (if method is "BIC").

penalty: smoothness penalty \alpha' Q \alpha, which is the integrated squared m-th derivative of the estimated function.

method: smoothing parameter selection method. Will be NULL if df, spar, or lambda is provided.
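As a usage note (assuming x and y as generated in the Examples below), these components can be extracted directly from the fitted object:

fit <- ss(x, y, nknots = 10)
fit$df       # equivalent degrees of freedom
fit$lambda   # computational smoothing parameter
fit$spar     # equals 1 + log(fit$lambda, base = 256) / 3
fit$sigma    # estimated error standard deviation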
The smoothing parameter can be selected using one of eight methods:
Generalized Cross-Validation (GCV)
Ordinary Cross-Validation (OCV)
Generalized Approximate Cross-Validation (GACV)
Approximate Cross-Validation (ACV)
Restricted Maximum Likelihood (REML)
Maximum Likelihood (ML)
Akaike's Information Criterion (AIC)
Bayesian Information Criterion (BIC)
The number of unique x values, nx, is determined by the tol argument, equivalently to

nx <- sum(!duplicated( round((x - mean(x)) / tol) ))

In this case where not all unique x values are used as knots, the result is not a smoothing spline in the strict sense, but very close unless a small smoothing parameter (or large df) is used.
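A small illustration of this binning rule, using an artificially coarse tol so that two nearby values collapse into a single bin:

x <- c(1, 1.0005, 2, 3)
tol <- 1e-3 * IQR(x)   # coarse; the default is 1e-6 * IQR(x)
sum(!duplicated(round((x - mean(x)) / tol)))   # 3: first two values share a bin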
Author(s)

Nathaniel E. Helwig <helwig@umn.edu>
References

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/smooth.spline.html

Berry, L. N., & Helwig, N. E. (2021). Cross-validation, information theory, or maximum likelihood? A comparison of tuning methods for penalized splines. Stats, 4(3), 701-724. doi:10.3390/stats4030042

Craven, P., & Wahba, G. (1979). Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31, 377-403. doi:10.1007/BF01404567

Gu, C. (2013). Smoothing Spline ANOVA Models (2nd ed.). New York: Springer. doi:10.1007/978-1-4614-5369-7

Helwig, N. E. (2020). Multiple and generalized nonparametric regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi:10.4135/9781526421036885885

Helwig, N. E. (2021). Spectrally sparse nonparametric regression via elastic net regularized smoothers. Journal of Computational and Graphical Statistics, 30(1), 182-191. doi:10.1080/10618600.2020.1806855

Wahba, G. (1985). A comparison of GCV and GML for choosing the smoothing parameters in the generalized spline smoothing problem. The Annals of Statistics, 13(4), 1378-1402. doi:10.1214/aos/1176349743
See Also

Related Modeling Functions:

sm for fitting smooth models with multiple predictors of mixed types (Gaussian response).

gsm for fitting generalized smooth models with multiple predictors of mixed types (non-Gaussian response).

S3 Methods and Related Functions for "ss" Objects:

boot.ss for bootstrapping ss objects.

coef.ss for extracting coefficients from ss objects.

cooks.distance.ss for calculating Cook's distances from ss objects.

cov.ratio for computing covariance ratios from ss objects.

deviance.ss for extracting deviance from ss objects.

dfbeta.ss for calculating DFBETA from ss objects.

dfbetas.ss for calculating DFBETAS from ss objects.

diagnostic.plots for plotting regression diagnostics from ss objects.

fitted.ss for extracting fitted values from ss objects.

hatvalues.ss for extracting leverages from ss objects.

model.matrix.ss for constructing the model matrix from ss objects.

plot.ss for plotting predictions from ss objects.

plot.boot.ss for plotting boot.ss objects.

predict.ss for predicting from ss objects.

residuals.ss for extracting residuals from ss objects.

rstandard.ss for computing standardized residuals from ss objects.

rstudent.ss for computing studentized residuals from ss objects.

smooth.influence for calculating basic influence information from ss objects.

smooth.influence.measures for convenient display of influential observations from ss objects.

summary.ss for summarizing ss objects.

vcov.ss for extracting the coefficient covariance matrix from ss objects.

weights.ss for extracting prior weights from ss objects.
Examples

# generate data
set.seed(1)
n <- 100
x <- seq(0, 1, length.out = n)
fx <- 2 + 3 * x + sin(2 * pi * x)
y <- fx + rnorm(n, sd = 0.5)
# GCV selection (default)
ss.GCV <- ss(x, y, nknots = 10)
ss.GCV
# OCV selection
ss.OCV <- ss(x, y, method = "OCV", nknots = 10)
ss.OCV
# GACV selection
ss.GACV <- ss(x, y, method = "GACV", nknots = 10)
ss.GACV
# ACV selection
ss.ACV <- ss(x, y, method = "ACV", nknots = 10)
ss.ACV
# ML selection
ss.ML <- ss(x, y, method = "ML", nknots = 10)
ss.ML
# REML selection
ss.REML <- ss(x, y, method = "REML", nknots = 10)
ss.REML
# AIC selection
ss.AIC <- ss(x, y, method = "AIC", nknots = 10)
ss.AIC
# BIC selection
ss.BIC <- ss(x, y, method = "BIC", nknots = 10)
ss.BIC
# compare results
mean( ( fx - ss.GCV$y )^2 )
mean( ( fx - ss.OCV$y )^2 )
mean( ( fx - ss.GACV$y )^2 )
mean( ( fx - ss.ACV$y )^2 )
mean( ( fx - ss.ML$y )^2 )
mean( ( fx - ss.REML$y )^2 )
mean( ( fx - ss.AIC$y )^2 )
mean( ( fx - ss.BIC$y )^2 )
# plot results
plot(x, y)
rlist <- list(ss.GCV, ss.OCV, ss.GACV, ss.ACV,
              ss.REML, ss.ML, ss.AIC, ss.BIC)
for(j in 1:length(rlist)){
lines(rlist[[j]], lwd = 2, col = j)
}