Description Usage Arguments Details Value Warnings Computational Details Note Author(s) References Examples
Given a real-valued response vector \mathbf{y}=\{y_{i}\}_{n\times1}, a thin-plate spline model has the form
y_{i}=η(\mathbf{x}_{i})+e_{i}
where y_{i} is the i-th observation's respone, \mathbf{x}_{i}=(x_{i1},…,x_{id}) is the i-th observation's nonparametric predictor vector, η is an unknown smooth function relating the response and predictor, and e_{i}\sim\mathrm{N}(0,σ^{2}) is iid Gaussian error. Function only fits interaction models.
1 2 3  | 
x | 
 Predictor vector or matrix with three or less columns.  | 
y | 
 Response vector. Must be same length as   | 
nknots | 
 Two possible options: (a) scalar giving total number of random knots to sample, or (b) vector indexing which rows of   | 
nvec | 
 Number of eigenvectors (and eigenvalues) to use in approximation. Must be less than or equal to the number of knots and greater than or equal to   | 
rparm | 
 Rounding parameter(s) for   | 
alpha | 
 Manual tuning parameter for GCV score. Using   | 
lambdas | 
 Vector of global smoothing parameters to try. Default estimates smoothing parameter that minimizes GCV score.  | 
se.fit | 
 Logical indicating if the standard errors of fitted values should be estimated.  | 
rseed | 
 Random seed for knot sampling. Input is ignored if   | 
knotcheck | 
 If   | 
To estimate η I minimize the penalized least-squares functional
\frac{1}{n}∑_{i=1}^{n}(y_{i}-η(\mathbf{x}_{i}))^{2}+λ J(η)
 where J(η) is the thin-plate penalty (see Helwig and Ma) and λ≥q0 is a smoothing parameter that controls the trade-off between fitting and smoothing the data. Default use of the function estimates λ by minimizing the GCV score (see bigspline).
Using the rounding parameter input rparm can greatly speed-up and stabilize the fitting for large samples. When rparm is used, the spline is fit to a set of unique data points after rounding; the unique points are determined using the efficient algorithm described in Helwig (2013). Rounding parameter should be on the raw data scale. 
fitted.values | 
 Vector of fitted values corresponding to the original data points in   | 
se.fit | 
 Vector of standard errors of   | 
x | 
 Predictor vector (same as input).  | 
y | 
 Response vector (same as input).  | 
xunique | 
 Unique elements of   | 
yunique | 
 Mean of   | 
funique | 
 Vector giving frequency of each element of   | 
sigma | 
 Estimated error standard deviation, i.e., \hat{σ}.  | 
ndf | 
 Data frame with two elements:   | 
info | 
 Model fit information: vector containing the GCV, multiple R-squared, AIC, and BIC of fit model (assuming Gaussian error).  | 
myknots | 
 Spline knots used for fit.  | 
nvec | 
 Number of eigenvectors used for solution.  | 
rparm | 
 Rounding parameter for   | 
lambda | 
 Optimal smoothing parameter.  | 
coef | 
 Spline basis function coefficients.  | 
coef.csqrt | 
 Matrix square-root of covariace matrix of   | 
Input nvec must be greater than ncol(x)+1.
When using rounding parameters, output fitted.values corresponds to unique rounded predictor scores in output xunique. Use predict.bigtps function to get fitted values for full y vector.
According to thin-plate spline theory, the function η can be approximated as
η(x) = ∑_{k=1}^{M}d_{k}φ_{k}(\mathbf{x}) + ∑_{h=1}^{q}c_{h}ξ(\mathbf{x},\mathbf{x}_{h}^{*})
where the \{φ_{k}\}_{k=1}^{M} are linear functions, ξ is the thin-plate spline semi-kernel, \{\mathbf{x}_{h}^{*}\}_{h=1}^{q} are the knots, and the c_{h} coefficients are constrained to be orthongonal to the \{φ_{k}\}_{k=1}^{M} functions.
This implies that the penalized least-squares functional can be rewritten as
\|\mathbf{y} - \mathbf{K}\mathbf{d} - \mathbf{J}\mathbf{c}\|^{2} + nλ\mathbf{c}'\mathbf{Q}\mathbf{c}
where \mathbf{K}=\{φ(\mathbf{x}_{i})\}_{n \times M} is the null space basis function matrix, \mathbf{J}=\{ξ(\mathbf{x}_{i},\mathbf{x}_{h}^{*})\}_{n \times q} is the contrast space basis funciton matrix, \mathbf{Q}=\{ξ(\mathbf{x}_{g}^{*},\mathbf{x}_{h}^{*})\}_{q \times q} is the penalty matrix, and \mathbf{d}=(d_{0},…,d_{M})' and \mathbf{c}=(c_{1},…,c_{q})' are the unknown basis function coefficients, where \mathbf{c} are constrained to be orthongonal to the \{φ_{k}\}_{k=1}^{M} functions.
See Helwig and Ma for specifics about how the constrained estimation is handled.
The spline is estimated using penalized least-squares, which does not require the Gaussian error assumption. However, the spline inference information (e.g., standard errors and fit information) requires the Gaussian error assumption.
Nathaniel E. Helwig <helwig@umn.edu>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60  | ##########   EXAMPLE 1   ##########
# define relatively smooth function
set.seed(773)
myfun <- function(x){ sin(2*pi*x) }
x <- runif(500)
y <- myfun(x) + rnorm(500)
# fit thin-plate spline (default 1 dim: 30 knots)
tpsmod <- bigtps(x,y)
tpsmod
##########   EXAMPLE 2   ##########
# define more jagged function
set.seed(773)
myfun <- function(x){ 2*x+cos(2*pi*x) }
x <- runif(500)*4
y <- myfun(x) + rnorm(500)
# try different numbers of knots
r1mod <- bigtps(x,y,nknots=20,rparm=0.01)
crossprod( myfun(r1mod$xunique) - r1mod$fitted )/length(r1mod$fitted)
r2mod <- bigtps(x,y,nknots=35,rparm=0.01)
crossprod( myfun(r2mod$xunique) - r2mod$fitted )/length(r2mod$fitted)
r3mod <- bigtps(x,y,nknots=50,rparm=0.01)
crossprod( myfun(r3mod$xunique) - r3mod$fitted )/length(r3mod$fitted)
##########   EXAMPLE 3   ##########
# function with two continuous predictors
set.seed(773)
myfun <- function(x1v,x2v){
  sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v))
}
x <- cbind(runif(500),runif(500))
y <- myfun(x[,1],x[,2]) + rnorm(500)
# fit thin-plate spline with 50 knots (default 2 dim: 100 knots)
tpsmod <- bigtps(x,y,nknots=50)
tpsmod
crossprod( myfun(x[,1],x[,2]) - tpsmod$fitted.values )/500
##########   EXAMPLE 4   ##########
# function with three continuous predictors
set.seed(773)
myfun <- function(x1v,x2v,x3v){
  sin(2*pi*x1v) + log(x2v+.1) + cos(pi*x3v)
  }
x <- cbind(runif(500),runif(500),runif(500))
y <- myfun(x[,1],x[,2],x[,3]) + rnorm(500)
# fit thin-plate spline with 50 knots (default 3 dim: 200 knots)
tpsmod <- bigtps(x,y,nknots=50)
tpsmod
crossprod( myfun(x[,1],x[,2],x[,3]) - tpsmod$fitted.values )/500
 | 
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.