Categorical Factor Regression Spline Cross-Validation
Description
frscv computes exhaustive cross-validation directed search for a regression spline estimate of a one (1) dimensional dependent variable on an r-dimensional vector of continuous predictors and nominal/ordinal (factor/ordered) predictors.
Usage
frscv(xz,
y,
degree.max = 10,
segments.max = 10,
degree.min = 0,
segments.min = 1,
complexity = c("degree-knots","degree","knots"),
knots = c("quantiles","uniform","auto"),
basis = c("additive","tensor","glp","auto"),
cv.func = c("cv.ls","cv.gcv","cv.aic"),
degree = degree,
segments = segments,
tau = NULL,
weights = NULL,
singular.ok = FALSE)

Arguments
y 
continuous univariate vector 
xz 
continuous and/or nominal/ordinal (factor/ordered) predictors
degree.max 
the maximum degree of the B-spline basis for each of the continuous predictors (default degree.max=10)
segments.max 
the maximum segments of the B-spline basis for each of the continuous predictors (default segments.max=10)
degree.min 
the minimum degree of the B-spline basis for each of the continuous predictors (default degree.min=0)
segments.min 
the minimum segments of the B-spline basis for each of the continuous predictors (default segments.min=1)
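complexity 
a character string (default complexity="degree-knots") indicating whether model complexity is determined by the spline degree, the number of segments (knots), or both
knots 
a character string (default knots="quantiles") specifying where the knots are placed
basis 
a character string (default basis="additive") selecting the basis type for the continuous predictors
cv.func 
a character string (default cv.func="cv.ls") selecting the cross-validation objective function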
degree 
integer/vector specifying the degree of the B-spline basis for each dimension of the continuous predictors
segments 
integer/vector specifying the number of segments of the B-spline basis for each dimension of the continuous predictors
tau 
if non-null, a number in (0,1) denoting the quantile for which a quantile regression spline is to be estimated rather than estimating the conditional mean (default tau=NULL)
weights 
an optional vector of weights to be used in the fitting process. Should be ‘NULL’ or a numeric vector. If non-NULL, weighted least squares is used with weights ‘weights’ (that is, minimizing ‘sum(w*e^2)’); otherwise ordinary least squares is used. 
singular.ok 
a logical value (default singular.ok=FALSE)
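The weighted least squares criterion mentioned for the weights argument, ‘sum(w*e^2)’, can be illustrated with base R alone. The sketch below does not call frscv; it uses lm() on a simple linear model with made-up data and hypothetical names (x, y, w, fit.wls) only to show that supplying weights minimizes the weighted sum of squared residuals, which is checked against the weighted normal equations.

```r
set.seed(1)
n <- 50
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.1)
w <- runif(n, min = 0.5, max = 2)   # illustrative positive weights

## Weighted least squares via lm(): minimizes sum(w * e^2)
fit.wls <- lm(y ~ x, weights = w)

## Same solution from the weighted normal equations (X'WX) b = X'Wy
X <- cbind(1, x)
beta <- solve(t(X) %*% (w * X), t(X) %*% (w * y))

## The two sets of coefficients should agree up to numerical tolerance
all.equal(unname(coef(fit.wls)), as.numeric(beta))
```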
Details
frscv computes exhaustive cross-validation for a regression spline estimate of a one (1) dimensional dependent variable on an r-dimensional vector of continuous and nominal/ordinal (factor/ordered) predictors. The optimal K/I combination (i.e. degree/segments/I) is returned along with other results (see below for return values).
For the continuous predictors the regression spline model employs either the additive or tensor product B-spline basis matrix for a multivariate polynomial spline via the B-spline routines in the GNU Scientific Library (http://www.gnu.org/software/gsl/) and the tensor.prod.model.matrix function.
For the nominal/ordinal (factor/ordered) predictors the regression spline model uses indicator basis functions.
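As a rough illustration of what an indicator basis for a factor looks like, the sketch below builds one with base R's model.matrix(). This is an assumption-laden stand-in, not the construction used inside frscv: the variable names (z, Z, y, coefs) and the data are made up for the example.

```r
## A two-level factor and a small response vector (illustrative values)
z <- factor(c(0, 1, 1, 0, 1))
y <- c(1, 3, 5, 2, 4)

## Indicator (dummy) basis: one 0/1 column per factor level
Z <- model.matrix(~ z - 1)
print(Z)

## Each observation belongs to exactly one level
rowSums(Z)

## Least squares on the indicator basis recovers the level-wise means of y
coefs <- lm.fit(Z, y)$coefficients
coefs
```

With a full set of indicator columns and no intercept, each coefficient is simply the mean of y within that level.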
Value
frscv returns a crscv object. Furthermore, the function summary supports objects of this type. The returned objects have the following components:
K 
scalar/vector containing optimal degree(s) of spline or number of segments 
I 
scalar/vector containing an indicator of whether the predictor is included or not for each dimension of the nominal/ordinal (factor/ordered) predictors
K.mat 
vector/matrix of values of 
cv.func 
objective function value at optimum 
cv.func.vec 
vector of objective function values at each degree of spline or number of segments in K.mat
Author(s)
Jeffrey S. Racine racinej@mcmaster.ca
References
Craven, P. and G. Wahba (1979), “Smoothing Noisy Data With Spline Functions,” Numerische Mathematik, 31, 377-403.
Hurvich, C.M. and J.S. Simonoff and C.L. Tsai (1998), “Smoothing Parameter Selection in Nonparametric Regression Using an Improved Akaike Information Criterion,” Journal of the Royal Statistical Society B, 60, 271-293.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Ma, S. and J.S. Racine and L. Yang (under revision), “Spline Regression in the Presence of Categorical Predictors,” Journal of Applied Econometrics.
Ma, S. and J.S. Racine (2013), “Additive Regression Splines with Irrelevant Categorical and Continuous Regressors,” Statistica Sinica, 23, 515-541.
See Also
loess, npregbw
Examples
library(crs)
set.seed(42)
## Simulated data: one continuous predictor x and one binary factor z
n <- 1000
x <- runif(n)
z <- round(runif(n,min=-0.5,max=1.5))
## Index the unique values of z
z.unique <- uniquecombs(as.matrix(z))
ind <- attr(z.unique,"index")
ind.vals <- sort(unique(ind))
## Data generating process: level shift in z plus a cosine in x
dgp <- numeric(length=n)
for(i in 1:nrow(z.unique)) {
  zz <- ind == ind.vals[i]
  dgp[zz] <- z[zz]+cos(2*pi*x[zz])
}
y <- dgp + rnorm(n,sd=.1)
xdata <- data.frame(x,z=factor(z))
## Compute the optimal K and I, determine optimal number of knots, set
## spline degree for x to 3
cv <- frscv(x=xdata,y=y,complexity="knots",degree=c(3))
summary(cv)
