Description Usage Arguments Details Value Warnings Computational Details Note Author(s) References Examples
Given a real-valued response vector \mathbf{y}=\{y_{i}\}_{n\times1}, a thin-plate spline model has the form
y_{i}=η(\mathbf{x}_{i})+e_{i}
where y_{i} is the i-th observation's respone, \mathbf{x}_{i}=(x_{i1},…,x_{id}) is the i-th observation's nonparametric predictor vector, η is an unknown smooth function relating the response and predictor, and e_{i}\sim\mathrm{N}(0,σ^{2}) is iid Gaussian error. Function only fits interaction models.
1 2 3 |
x |
Predictor vector or matrix with three or less columns. |
y |
Response vector. Must be same length as |
nknots |
Two possible options: (a) scalar giving total number of random knots to sample, or (b) vector indexing which rows of |
nvec |
Number of eigenvectors (and eigenvalues) to use in approximation. Must be less than or equal to the number of knots and greater than or equal to |
rparm |
Rounding parameter(s) for |
alpha |
Manual tuning parameter for GCV score. Using |
lambdas |
Vector of global smoothing parameters to try. Default estimates smoothing parameter that minimizes GCV score. |
se.fit |
Logical indicating if the standard errors of fitted values should be estimated. |
rseed |
Random seed for knot sampling. Input is ignored if |
knotcheck |
If |
To estimate η I minimize the penalized least-squares functional
\frac{1}{n}∑_{i=1}^{n}(y_{i}-η(\mathbf{x}_{i}))^{2}+λ J(η)
where J(η) is the thin-plate penalty (see Helwig and Ma) and λ≥q0 is a smoothing parameter that controls the trade-off between fitting and smoothing the data. Default use of the function estimates λ by minimizing the GCV score (see bigspline
).
Using the rounding parameter input rparm
can greatly speed-up and stabilize the fitting for large samples. When rparm
is used, the spline is fit to a set of unique data points after rounding; the unique points are determined using the efficient algorithm described in Helwig (2013). Rounding parameter should be on the raw data scale.
fitted.values |
Vector of fitted values corresponding to the original data points in |
se.fit |
Vector of standard errors of |
x |
Predictor vector (same as input). |
y |
Response vector (same as input). |
xunique |
Unique elements of |
yunique |
Mean of |
funique |
Vector giving frequency of each element of |
sigma |
Estimated error standard deviation, i.e., \hat{σ}. |
ndf |
Data frame with two elements: |
info |
Model fit information: vector containing the GCV, multiple R-squared, AIC, and BIC of fit model (assuming Gaussian error). |
myknots |
Spline knots used for fit. |
nvec |
Number of eigenvectors used for solution. |
rparm |
Rounding parameter for |
lambda |
Optimal smoothing parameter. |
coef |
Spline basis function coefficients. |
coef.csqrt |
Matrix square-root of covariace matrix of |
Input nvec
must be greater than ncol(x)+1
.
When using rounding parameters, output fitted.values
corresponds to unique rounded predictor scores in output xunique
. Use predict.bigtps
function to get fitted values for full y
vector.
According to thin-plate spline theory, the function η can be approximated as
η(x) = ∑_{k=1}^{M}d_{k}φ_{k}(\mathbf{x}) + ∑_{h=1}^{q}c_{h}ξ(\mathbf{x},\mathbf{x}_{h}^{*})
where the \{φ_{k}\}_{k=1}^{M} are linear functions, ξ is the thin-plate spline semi-kernel, \{\mathbf{x}_{h}^{*}\}_{h=1}^{q} are the knots, and the c_{h} coefficients are constrained to be orthongonal to the \{φ_{k}\}_{k=1}^{M} functions.
This implies that the penalized least-squares functional can be rewritten as
\|\mathbf{y} - \mathbf{K}\mathbf{d} - \mathbf{J}\mathbf{c}\|^{2} + nλ\mathbf{c}'\mathbf{Q}\mathbf{c}
where \mathbf{K}=\{φ(\mathbf{x}_{i})\}_{n \times M} is the null space basis function matrix, \mathbf{J}=\{ξ(\mathbf{x}_{i},\mathbf{x}_{h}^{*})\}_{n \times q} is the contrast space basis funciton matrix, \mathbf{Q}=\{ξ(\mathbf{x}_{g}^{*},\mathbf{x}_{h}^{*})\}_{q \times q} is the penalty matrix, and \mathbf{d}=(d_{0},…,d_{M})' and \mathbf{c}=(c_{1},…,c_{q})' are the unknown basis function coefficients, where \mathbf{c} are constrained to be orthongonal to the \{φ_{k}\}_{k=1}^{M} functions.
See Helwig and Ma for specifics about how the constrained estimation is handled.
The spline is estimated using penalized least-squares, which does not require the Gaussian error assumption. However, the spline inference information (e.g., standard errors and fit information) requires the Gaussian error assumption.
Nathaniel E. Helwig <helwig@umn.edu>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 | ########## EXAMPLE 1 ##########
# define relatively smooth function
set.seed(773)
myfun <- function(x){ sin(2*pi*x) }
x <- runif(500)
y <- myfun(x) + rnorm(500)
# fit thin-plate spline (default 1 dim: 30 knots)
tpsmod <- bigtps(x,y)
tpsmod
########## EXAMPLE 2 ##########
# define more jagged function
set.seed(773)
myfun <- function(x){ 2*x+cos(2*pi*x) }
x <- runif(500)*4
y <- myfun(x) + rnorm(500)
# try different numbers of knots
r1mod <- bigtps(x,y,nknots=20,rparm=0.01)
crossprod( myfun(r1mod$xunique) - r1mod$fitted )/length(r1mod$fitted)
r2mod <- bigtps(x,y,nknots=35,rparm=0.01)
crossprod( myfun(r2mod$xunique) - r2mod$fitted )/length(r2mod$fitted)
r3mod <- bigtps(x,y,nknots=50,rparm=0.01)
crossprod( myfun(r3mod$xunique) - r3mod$fitted )/length(r3mod$fitted)
########## EXAMPLE 3 ##########
# function with two continuous predictors
set.seed(773)
myfun <- function(x1v,x2v){
sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v))
}
x <- cbind(runif(500),runif(500))
y <- myfun(x[,1],x[,2]) + rnorm(500)
# fit thin-plate spline with 50 knots (default 2 dim: 100 knots)
tpsmod <- bigtps(x,y,nknots=50)
tpsmod
crossprod( myfun(x[,1],x[,2]) - tpsmod$fitted.values )/500
########## EXAMPLE 4 ##########
# function with three continuous predictors
set.seed(773)
myfun <- function(x1v,x2v,x3v){
sin(2*pi*x1v) + log(x2v+.1) + cos(pi*x3v)
}
x <- cbind(runif(500),runif(500),runif(500))
y <- myfun(x[,1],x[,2],x[,3]) + rnorm(500)
# fit thin-plate spline with 50 knots (default 3 dim: 200 knots)
tpsmod <- bigtps(x,y,nknots=50)
tpsmod
crossprod( myfun(x[,1],x[,2],x[,3]) - tpsmod$fitted.values )/500
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.