# bigtps: Fits Cubic Thin-Plate Splines In taylerablake/thin-plate-splines: Smoothing Splines for Large Samples

## Description

Given a real-valued response vector \mathbf{y}=\{y_{i}\}_{n\times1}, a thin-plate spline model has the form

y_{i}=η(\mathbf{x}_{i})+e_{i}

where y_{i} is the i-th observation's respone, \mathbf{x}_{i}=(x_{i1},…,x_{id}) is the i-th observation's nonparametric predictor vector, η is an unknown smooth function relating the response and predictor, and e_{i}\sim\mathrm{N}(0,σ^{2}) is iid Gaussian error. Function only fits interaction models.

## Usage

 1 2 3 bigtps(x,y,nknots=NULL,nvec=NULL,rparm=NA, alpha=1,lambdas=NULL,se.fit=FALSE, rseed=1234,knotcheck=TRUE) 

## Arguments

 x Predictor vector or matrix with three or less columns. y Response vector. Must be same length as x has rows. nknots Two possible options: (a) scalar giving total number of random knots to sample, or (b) vector indexing which rows of x to use as knots. nvec Number of eigenvectors (and eigenvalues) to use in approximation. Must be less than or equal to the number of knots and greater than or equal to ncol(x)+2. Default sets nvec<-nknots. Can also input 0

## Details

To estimate η I minimize the penalized least-squares functional

\frac{1}{n}∑_{i=1}^{n}(y_{i}-η(\mathbf{x}_{i}))^{2}+λ J(η)

where J(η) is the thin-plate penalty (see Helwig and Ma) and λ≥q0 is a smoothing parameter that controls the trade-off between fitting and smoothing the data. Default use of the function estimates λ by minimizing the GCV score (see bigspline).

Using the rounding parameter input rparm can greatly speed-up and stabilize the fitting for large samples. When rparm is used, the spline is fit to a set of unique data points after rounding; the unique points are determined using the efficient algorithm described in Helwig (2013). Rounding parameter should be on the raw data scale.

## Value

 fitted.values Vector of fitted values corresponding to the original data points in x (if rparm=NA) or the rounded data points in xunique (if rparm is used). se.fit Vector of standard errors of fitted.values (if input se.fit=TRUE). x Predictor vector (same as input). y Response vector (same as input). xunique Unique elements of x after rounding (if rparm is used). yunique Mean of y for unique elements of x after rounding (if rparm is used). funique Vector giving frequency of each element of xunique (if rparm is used). sigma Estimated error standard deviation, i.e., \hat{σ}. ndf Data frame with two elements: n is total sample size, and df is effective degrees of freedom of fit model (trace of smoothing matrix). info Model fit information: vector containing the GCV, multiple R-squared, AIC, and BIC of fit model (assuming Gaussian error). myknots Spline knots used for fit. nvec Number of eigenvectors used for solution. rparm Rounding parameter for x (same as input). lambda Optimal smoothing parameter. coef Spline basis function coefficients. coef.csqrt Matrix square-root of covariace matrix of coef. Use tcrossprod(coef.csqrt) to get covariance matrix of coef.

## Warnings

Input nvec must be greater than ncol(x)+1.

When using rounding parameters, output fitted.values corresponds to unique rounded predictor scores in output xunique. Use predict.bigtps function to get fitted values for full y vector.

## Computational Details

According to thin-plate spline theory, the function η can be approximated as

η(x) = ∑_{k=1}^{M}d_{k}φ_{k}(\mathbf{x}) + ∑_{h=1}^{q}c_{h}ξ(\mathbf{x},\mathbf{x}_{h}^{*})

where the \{φ_{k}\}_{k=1}^{M} are linear functions, ξ is the thin-plate spline semi-kernel, \{\mathbf{x}_{h}^{*}\}_{h=1}^{q} are the knots, and the c_{h} coefficients are constrained to be orthongonal to the \{φ_{k}\}_{k=1}^{M} functions.

This implies that the penalized least-squares functional can be rewritten as

\|\mathbf{y} - \mathbf{K}\mathbf{d} - \mathbf{J}\mathbf{c}\|^{2} + nλ\mathbf{c}'\mathbf{Q}\mathbf{c}

where \mathbf{K}=\{φ(\mathbf{x}_{i})\}_{n \times M} is the null space basis function matrix, \mathbf{J}=\{ξ(\mathbf{x}_{i},\mathbf{x}_{h}^{*})\}_{n \times q} is the contrast space basis funciton matrix, \mathbf{Q}=\{ξ(\mathbf{x}_{g}^{*},\mathbf{x}_{h}^{*})\}_{q \times q} is the penalty matrix, and \mathbf{d}=(d_{0},…,d_{M})' and \mathbf{c}=(c_{1},…,c_{q})' are the unknown basis function coefficients, where \mathbf{c} are constrained to be orthongonal to the \{φ_{k}\}_{k=1}^{M} functions.

See Helwig and Ma for specifics about how the constrained estimation is handled.

## Note

The spline is estimated using penalized least-squares, which does not require the Gaussian error assumption. However, the spline inference information (e.g., standard errors and fit information) requires the Gaussian error assumption.

## Author(s)

Nathaniel E. Helwig <[email protected]>

## References

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.

Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.

Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.

## Examples

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ########## EXAMPLE 1 ########## # define relatively smooth function set.seed(773) myfun <- function(x){ sin(2*pi*x) } x <- runif(500) y <- myfun(x) + rnorm(500) # fit thin-plate spline (default 1 dim: 30 knots) tpsmod <- bigtps(x,y) tpsmod ########## EXAMPLE 2 ########## # define more jagged function set.seed(773) myfun <- function(x){ 2*x+cos(2*pi*x) } x <- runif(500)*4 y <- myfun(x) + rnorm(500) # try different numbers of knots r1mod <- bigtps(x,y,nknots=20,rparm=0.01) crossprod( myfun(r1mod$xunique) - r1mod$fitted )/length(r1mod$fitted) r2mod <- bigtps(x,y,nknots=35,rparm=0.01) crossprod( myfun(r2mod$xunique) - r2mod$fitted )/length(r2mod$fitted) r3mod <- bigtps(x,y,nknots=50,rparm=0.01) crossprod( myfun(r3mod$xunique) - r3mod$fitted )/length(r3mod$fitted) ########## EXAMPLE 3 ########## # function with two continuous predictors set.seed(773) myfun <- function(x1v,x2v){ sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v)) } x <- cbind(runif(500),runif(500)) y <- myfun(x[,1],x[,2]) + rnorm(500) # fit thin-plate spline with 50 knots (default 2 dim: 100 knots) tpsmod <- bigtps(x,y,nknots=50) tpsmod crossprod( myfun(x[,1],x[,2]) - tpsmod$fitted.values )/500 ########## EXAMPLE 4 ########## # function with three continuous predictors set.seed(773) myfun <- function(x1v,x2v,x3v){ sin(2*pi*x1v) + log(x2v+.1) + cos(pi*x3v) } x <- cbind(runif(500),runif(500),runif(500)) y <- myfun(x[,1],x[,2],x[,3]) + rnorm(500) # fit thin-plate spline with 50 knots (default 3 dim: 200 knots) tpsmod <- bigtps(x,y,nknots=50) tpsmod crossprod( myfun(x[,1],x[,2],x[,3]) - tpsmod\$fitted.values )/500 `

taylerablake/thin-plate-splines documentation built on Sept. 19, 2017, 9:45 a.m.