ordspline: Fits Ordinal Smoothing Spline (bigsplines: Smoothing Splines for Large Samples)

Description

Given a real-valued response vector \mathbf{y}=\{y_{i}\}_{n\times1} and an ordinal predictor vector \mathbf{x}=\{x_{i}\}_{n\times 1} with x_{i} \in \{1,…,K\} \ \forall i, an ordinal smoothing spline model has the form

y_{i}=η(x_{i})+e_{i}

where y_{i} is the i-th observation's response, x_{i} is the i-th observation's predictor, η is an unknown function relating the response and predictor, and e_{i}\sim\mathrm{N}(0,σ^{2}) is iid Gaussian error.

Usage

    ordspline(x, y, knots, weights, lambda, monotone=FALSE)

Arguments

x: Predictor vector.

y: Response vector. Must be same length as x.

knots: Either a scalar giving the number of equidistant knots to use, or a vector of values to use as the spline knots. If left blank, the number of knots is min(50, nu) where nu = length(unique(x)).

weights: Weights vector (for weighted penalized least squares). Must be same length as x and contain non-negative values.

lambda: Smoothing parameter. If left blank, lambda is tuned via Generalized Cross-Validation.

monotone: If TRUE, the relationship between x and y is constrained to be monotonic increasing.

Details

To estimate η we minimize the penalized least-squares functional

\frac{1}{n}∑_{i=1}^{n}(y_{i}-η(x_{i}))^{2}+λ ∑_{x=2}^{K} [η(x)-η(x-1)]^{2}

where λ≥0 is a smoothing parameter that controls the trade-off between fitting and smoothing the data.
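
The trade-off controlled by λ can be illustrated with a minimal base-R sketch that minimizes the functional directly over the K values η(1), …, η(K). This is not the algorithm bigsplines uses internally (ordspline works with a knot-based spline basis); the names Z, D, and fit_eta are illustrative assumptions, and the sketch assumes every level 1, …, K appears in x.

```r
# Sketch: direct minimizer of the penalized least-squares functional
# for an ordinal predictor x in {1, ..., K}. All names are illustrative.
set.seed(1)
K <- 8
n <- 200
x <- sample(1:K, n, replace = TRUE)
y <- sqrt(x) + rnorm(n, sd = 0.5)

# Z: n x K indicator matrix, so that Z %*% eta gives eta(x_i)
Z <- outer(x, 1:K, "==") * 1
# D: (K-1) x K first-difference matrix, (D %*% eta)[k] = eta(k+1) - eta(k)
D <- diff(diag(K))

fit_eta <- function(lambda) {
  # Setting the gradient of the functional to zero gives
  # (Z'Z/n + lambda * D'D) eta = Z'y/n
  A <- crossprod(Z) / n + lambda * crossprod(D)
  drop(solve(A, crossprod(Z, y) / n))
}

eta_rough  <- fit_eta(0)     # lambda = 0: level-wise means of y
eta_smooth <- fit_eta(1e4)   # large lambda: nearly constant fit
```

With λ = 0 the penalty vanishes and the estimate reduces to the mean of y within each level of x; as λ grows, adjacent levels are pulled together until the fit is essentially flat.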

Default use of the function estimates λ by minimizing the GCV score:

\mbox{GCV}(λ) = \frac{n\|(\mathbf{I}_{n}-\mathbf{S}_{λ})\mathbf{y}\|^{2}}{[n-\mathrm{tr}(\mathbf{S}_{λ})]^2}

where \mathbf{I}_{n} is the identity matrix and \mathbf{S}_{λ} is the smoothing matrix.
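
As a rough illustration of the GCV criterion, the sketch below evaluates the score on a grid of λ values for a direct level-wise parameterization of η. It is an assumption-laden stand-in for the package's tuning routine, not a copy of it: Z, D, gcv_score, and the λ grid are all hypothetical choices made for this example.

```r
# Sketch: GCV score for an ordinal smoother, evaluated on a lambda grid.
set.seed(1)
K <- 8
n <- 200
x <- sample(1:K, n, replace = TRUE)
y <- sqrt(x) + rnorm(n, sd = 0.5)

Z <- outer(x, 1:K, "==") * 1   # n x K indicator matrix
D <- diff(diag(K))             # (K-1) x K first-difference matrix

gcv_score <- function(lambda) {
  # Smoothing matrix S_lambda maps y to fitted values: yhat = S %*% y
  S <- Z %*% solve(crossprod(Z) / n + lambda * crossprod(D), t(Z) / n)
  rss <- sum((y - S %*% y)^2)        # ||(I - S) y||^2
  n * rss / (n - sum(diag(S)))^2     # tr(S) is the effective df
}

lambdas    <- 10^seq(-6, 2, length.out = 50)   # illustrative grid
scores     <- sapply(lambdas, gcv_score)
lambda_hat <- lambdas[which.min(scores)]
```

Forming the full n × n smoothing matrix is only reasonable for small n; it is done here to keep the correspondence with the formula explicit.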

Value

fitted.values: Vector of fitted values.

se.fit: Vector of standard errors of fitted.values.

sigma: Estimated error standard deviation, i.e., \hat{σ}.

lambda: Chosen smoothing parameter.

info: Model fit information: vector containing the GCV, R-squared, AIC, and BIC of fit model (assuming Gaussian error).

coef: Spline basis function coefficients.

coef.csqrt: Matrix square-root of covariance matrix of coef. Use tcrossprod(coef.csqrt) to get covariance matrix of coef.

n: Number of data points, i.e., length(x).

df: Effective degrees of freedom (trace of smoothing matrix).

xunique: Unique elements of x.

x: Predictor vector (same as input).

y: Response vector (same as input).

residuals: Residual vector, i.e., y - fitted.values.

knots: Spline knots used for fit.

monotone: Logical (same as input).

Warnings

When inputting user-specified knots, all values in knots must match a corresponding value in x.
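
One way to guard against this before calling ordspline is a quick membership check in base R. The helper knots_match_x below is hypothetical (it is not part of bigsplines) and the data are made up for illustration.

```r
# Sketch: verify that every user-specified knot is an observed value of x.
x <- c(1, 2, 2, 3, 5, 5, 6)
good_knots <- c(1, 3, 6)
bad_knots  <- c(1, 4, 6)

# Hypothetical helper, not a bigsplines function
knots_match_x <- function(knots, x) all(knots %in% x)

knots_match_x(good_knots, x)   # TRUE
knots_match_x(bad_knots, x)    # FALSE: 4 never occurs in x
setdiff(bad_knots, x)          # lists the offending knot values
```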

Note

The spline is estimated using penalized least-squares, which does not require the Gaussian error assumption. However, the spline inference information (e.g., standard errors and fit information) requires the Gaussian error assumption.

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.

Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.

Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.

Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.

Examples

    ##########   EXAMPLE   ##########
    library(bigsplines)

    # generate some data
    n <- 100
    nk <- 50
    x <- seq(-3,3,length.out=n)
    eta <- (sin(2*x/pi) + 0.25*x^3 + 0.05*x^5)/15
    set.seed(1)
    y <- eta + rnorm(n, sd=0.5)

    # plot data and true eta
    plot(x, y)
    lines(x, eta, col="blue", lwd=2)

    # fit ordinal smoothing spline
    ossmod <- ordspline(x, y, knots=nk)
    lines(ossmod$x, ossmod$fitted.values, col="red", lwd=2)

    # fit monotonic smoothing spline
    mssmod <- ordspline(x, y, knots=nk, monotone=TRUE)
    lines(mssmod$x, mssmod$fitted.values, col="purple", lwd=2)

bigsplines documentation built on May 2, 2019, 9:27 a.m.