| crs | R Documentation |
crs computes a regression spline estimate of a
one (1) dimensional dependent variable on an r-dimensional
vector of continuous and categorical
(factor/ordered) predictors (Ma and
Racine (2013), Ma, Racine and Yang (2015)).
crs(...)
## Default S3 method:
crs(xz,
y,
basis = c("auto","additive","tensor","glp"),
complexity = c("degree-knots","degree","knots"),
data.return = FALSE,
degree = NULL,
deriv = 0,
display.nomad.progress = TRUE,
display.warnings = TRUE,
include = NULL,
kernel = TRUE,
knots = c("quantiles","uniform","auto"),
lambda = NULL,
model.return = FALSE,
prune = FALSE,
segments = NULL,
tau = NULL,
weights = NULL,
...)
## S3 method for class 'formula'
crs(formula,
basis = c("auto","additive","tensor","glp"),
complexity = c("degree-knots","degree","knots"),
cv = c("nomad","exhaustive","none"),
cv.df.min = 1,
cv.func = c("cv.ls","cv.gcv","cv.aic"),
cv.threshold = 1000,
data = list(),
data.return = FALSE,
degree = NULL,
degree.max = 10,
degree.min = 0,
deriv = 0,
display.nomad.progress = TRUE,
display.warnings = TRUE,
include = NULL,
initial.mesh.size.integer = "1",
initial.mesh.size.real = "r1.0e-01",
kernel = TRUE,
knots = c("quantiles","uniform","auto"),
lambda = NULL,
lambda.discrete = FALSE,
lambda.discrete.num = 100,
max.bb.eval = NULL,
max.eval = NULL,
min.mesh.size.integer = 1,
min.mesh.size.real = paste(sqrt(.Machine$double.eps)),
min.frame.size.integer = 1,
min.frame.size.real = 1,
model.return = FALSE,
nmulti = 2,
opts=list(),
prune = FALSE,
random.seed = 42,
restarts = 0,
segments = NULL,
segments.max = 10,
segments.min = 1,
singular.ok = FALSE,
tau = NULL,
weights = NULL,
...)
These arguments identify the model formula/data interface and explicit data inputs.
data |
an optional data frame containing the variables in the model |
formula |
a symbolic description of the model to be fit |
xz |
numeric ( |
y |
a numeric vector of responses. |
These arguments control basis type, spline complexity, factor inclusion, and optional kernel smoothing.
basis |
a character string (default |
complexity |
a character string (default
For the continuous predictors the regression spline model employs
either the additive or tensor product B-spline basis matrix for a
multivariate polynomial spline via the B-spline routines in the GNU
Scientific Library (https://www.gnu.org/software/gsl/) and the
|
degree |
integer/vector specifying the polynomial degree of the
B-spline basis for each dimension of the continuous |
degree.max |
the maximum degree of the B-spline basis for
each of the continuous predictors (default |
degree.min |
the minimum degree of the B-spline basis for
each of the continuous predictors (default |
include |
integer/vector specifying whether each of the
nominal/ordinal ( |
kernel |
a logical value (default |
knots |
a character string (default |
lambda |
a vector of bandwidths for each dimension of the
categorical |
lambda.discrete |
if |
lambda.discrete.num |
a positive integer indicating the number of
discrete values that lambda can assume - this parameter will only be
used when |
segments |
integer/vector specifying the number of segments of the
B-spline basis for each dimension of the continuous |
segments.max |
the maximum segments of the B-spline basis for
each of the continuous predictors (default |
segments.min |
the minimum segments of the B-spline basis for
each of the continuous predictors (default |
These arguments control cross-validation objective selection and restart behavior.
cv |
a character string (default |
cv.df.min |
the minimum degrees of freedom to allow when
conducting NOMAD-based cross-validation (default
|
cv.func |
a character string (default |
cv.threshold |
an integer (default |
nmulti |
integer number of times to restart the process of finding extrema of
the cross-validation function from different (random) initial
points (default |
prune |
a logical value (default |
random.seed |
when it is not missing and not equal to 0, the
initial points will be generated using this seed when using
|
restarts |
integer specifying the number of times to restart the process of finding extrema of the cross-validation function (for the bandwidths only) from different (random) initial points |
singular.ok |
a logical value (default |
These arguments control NOMAD mesh settings and optional solver controls.
initial.mesh.size.integer |
argument passed to the NOMAD solver (see |
initial.mesh.size.real |
argument passed to the NOMAD solver (see |
max.bb.eval |
argument passed to the NOMAD solver. The default |
max.eval |
optional NOMAD total point-lookup budget. This is distinct from
|
min.frame.size.integer |
arguments passed to the NOMAD solver (see |
min.frame.size.real |
arguments passed to the NOMAD solver (see |
min.mesh.size.integer |
arguments passed to the NOMAD solver (see |
min.mesh.size.real |
argument passed to the NOMAD solver (see |
opts |
list of optional arguments to be passed to
|
These arguments control derivative extraction, quantile level, and observation weights.
deriv |
an integer |
tau |
if non-null a number in (0,1) denoting the quantile for which a quantile
regression spline is to be estimated rather than estimating the
conditional mean (default |
weights |
an optional vector of weights to be used in the fitting process. Should be ‘NULL’ or a numeric vector. If non-NULL, weighted least squares is used with weights ‘weights’ (that is, minimizing ‘sum(w*e^2)’); otherwise ordinary least squares is used. |
These arguments control whether fitted model state is returned.
data.return |
a logical value indicating whether to return
|
model.return |
a logical value indicating whether to return the
list of |
These arguments control warnings and displayed optimizer progress.
display.nomad.progress |
a logical value indicating whether to
display the progress of the NOMAD solver (default |
display.warnings |
a logical value indicating whether to
display warnings (default |
Further optional arguments are passed through to lower-level routines.
... |
optional arguments |
Typical usages are (see below for a list of options and also the examples at the end of this help file)
## Estimate the model and let the basis type be determined by
## cross-validation (i.e. cross-validation will determine whether to
## use the additive, generalized, or tensor product basis)
model <- crs(y~x1+x2)
## Estimate the model for a specified degree/segment/bandwidth
## combination and do not run cross-validation (will use the
## additive basis by default)
model <- crs(y~x1+factor(x2),cv="none",degree=3,segments=1,lambda=.1)
## Plot the mean and (asymptotic) error bounds
plot(model,errors = "asymptotic")
## Plot the first partial derivative and (asymptotic) error bounds
plot(model,gradients = TRUE,errors = "asymptotic")
crs computes a regression spline estimate of a one (1)
dimensional dependent variable on an r-dimensional vector of
continuous and categorical
(factor/ordered) predictors.
The regression spline model employs the tensor product B-spline basis
matrix for a multivariate polynomial spline via the B-spline routines
in the GNU Scientific Library (https://www.gnu.org/software/gsl/)
and the tensor.prod.model.matrix function.
When basis="additive" the model becomes additive in nature
(i.e. no interaction/tensor terms thus semiparametric not fully
nonparametric).
When basis="tensor" the model uses the multivariate tensor
product basis.
When kernel=FALSE the model uses indicator basis functions for
the nominal/ordinal (factor/ordered)
predictors rather than kernel weighting.
When kernel=TRUE the product kernel function for the discrete
predictors is of the ‘Li-Racine’ type (see Li and Racine (2007)
for details).
When cv="nomad", numerical search is undertaken using Nonsmooth
Optimization by Mesh Adaptive Direct Search (Abramson, Audet, Couture,
Dennis, Jr., and Le Digabel (2011)).
When kernel=TRUE and cv="exhaustive", numerical search
is undertaken using optim and the box-constrained
L-BFGS-B method (see optim for details). The user
may restart the algorithm as many times as desired via the
restarts argument (default restarts=0). The approach
ascends from degree=0 (or segments=0) through
degree.max and for each value of degree (or
segments) searches for the optimal bandwidths. After the most
complex model has been searched then the optimal
degree/segments/lambda combination is
selected. If any element of the optimal degree (or
segments) vector coincides with degree.max (or segments.max) a warning
is produced and the user ought to restart their search with a larger
value of degree.max (or segments.max).
Note that the default plot method for a crs object
displays the fitted conditional mean or conditional quantile surface.
One-dimensional partial fitted curves are drawn by default; for two
continuous predictors, perspective=TRUE requests a
two-dimensional surface and renderer="rgl" requests the rgl
renderer. Surface displays use transparent viridis coloring, NP-style
rotation controls via view, and optional data overlays/rugs via
data_overlay and data_rug. Gradient displays are requested with
gradients=TRUE; higher-order derivatives use
gradient_order=i. Intervals are controlled by the modern
np-style errors, band, alpha,
bootstrap, and B arguments.
Note that setting prune=TRUE produces a final ‘pruning’
of the model via a stepwise cross-validation criterion achieved by
modifying stepAIC and replacing extractAIC
with extractCV throughout the function. This option may be
enabled to remove potentially superfluous bases thereby improving the
finite-sample efficiency of the resulting model. Note that if the
cross-validation score for the pruned model is no better than that for
the original model then the original model is returned with a warning
to this effect. Note also that this option can only be used when
kernel=FALSE.
crs returns a crs object. The generic functions
fitted and residuals extract (or
generate) estimated values and residuals. Furthermore, the functions
summary, predict, and plot
support objects of this type. The plot.crs method follows the
modern np plotting interface, including
errors=c("none","asymptotic","bootstrap"),
gradients=TRUE, gradient_order,
output=c("plot","data","plot-data","both"),
perspective, renderer, view, neval,
band, alpha, bootstrap, B, and the
control helpers documented in plot.crs.
predict.crs() honors explicit deriv= requests for new
evaluation data and, when newdata is omitted, computes explicit
derivative requests at the training data; ordinary predict(object)
calls retain the stored fitted-value behavior. The
returned object has the following components:
fitted.values |
estimates of the regression function (conditional mean) at the sample points or evaluation points |
lwr, upr |
lower/upper bound for a 95% confidence interval for
the |
residuals |
residuals computed at the sample points or evaluation points |
degree |
integer/vector specifying the degree of the B-spline
basis for each dimension of the continuous |
segments |
integer/vector specifying the number of segments of
the B-spline basis for each dimension of the continuous |
include |
integer/vector specifying whether each of the
nominal/ordinal ( |
kernel |
a logical value indicating whether kernel smoothing was
used ( |
lambda |
vector of bandwidths used if |
nomad.summary |
summary of NOMAD blackbox evaluations, cache activity, and effective NOMAD options, present only when NOMAD search was used |
call |
a symbolic description of the model |
r.squared |
coefficient of determination (Doksum and Samarov (1995)) |
model.lm |
an object of ‘ |
deriv.mat |
a matrix of derivatives (or differences in levels
for the categorical |
deriv.mat.lwr |
a matrix of 95% coverage lower bounds for
|
deriv.mat.upr |
a matrix of 95% coverage upper bounds for
|
hatvalues |
the |
P.hat |
the kernel probability estimates corresponding to the categorical predictors in the estimated model |
Note that when kernel=FALSE summary supports the
option sigtest=TRUE that conducts an F-test for significance
for each predictor.
Jeffrey S. Racine racinej@mcmaster.ca
Abramson, M.A. and C. Audet and G. Couture and J.E. Dennis Jr. and and S. Le Digabel (2011), “The NOMAD project”. Software available at https://www.gerad.ca/nomad.
Craven, P. and G. Wahba (1979), “Smoothing Noisy Data With Spline Functions,” Numerische Mathematik, 13, 377-403.
Doksum, K. and A. Samarov (1995), “Nonparametric Estimation of Global Functionals and a Measure of the Explanatory Power of Covariates in Regression,” The Annals of Statistics, 23 1443-1473.
Hurvich, C.M. and J.S. Simonoff and C.L. Tsai (1998), “Smoothing Parameter Selection in Nonparametric Regression Using an Improved Akaike Information Criterion,” Journal of the Royal Statistical Society B, 60, 271-293.
Le Digabel, S. (2011), “Algorithm 909: NOMAD: Nonlinear Optimization With The MADS Algorithm”. ACM Transactions on Mathematical Software, 37(4):44:1-44:15.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Ma, S. and J.S. Racine and L. Yang (2015), “Spline Regression in the Presence of Categorical Predictors,” Journal of Applied Econometrics, Volume 30, 705-717.
Ma, S. and J.S. Racine (2013), “Additive Regression Splines with Irrelevant Categorical and Continuous Regressors,” Statistica Sinica, Volume 23, 515-541.
Racine, J.S. (2011), “Cross-Validated Quantile Regression Splines,” manuscript.
smooth.spline, loess, npreg
set.seed(42)
## Example - simulated data
n <- 1000
num.eval <- 50
x1 <- runif(n)
x2 <- runif(n)
z <- rbinom(n,1,.5)
dgp <- cos(2*pi*x1)+sin(2*pi*x2)+z
z <- factor(z)
y <- dgp + rnorm(n,sd=.5)
## Estimate a model with specified degree, segments, and bandwidth
model <- crs(y~x1+x2+z,degree=c(5,5),
segments=c(1,1),
lambda=0.1,
cv="none",
kernel=TRUE)
summary(model)
## Perspective plot
x1.seq <- seq(min(x1),max(x1),length=num.eval)
x2.seq <- seq(min(x2),max(x2),length=num.eval)
x.grid <- expand.grid(x1.seq,x2.seq)
newdata <- data.frame(x1=x.grid[,1],x2=x.grid[,2],
z=factor(rep(0,num.eval**2),levels=c(0,1)))
z0 <- matrix(predict(model,newdata=newdata),num.eval,num.eval)
newdata <- data.frame(x1=x.grid[,1],x2=x.grid[,2],
z=factor(rep(1,num.eval**2),levels=c(0,1)))
z1 <- matrix(predict(model,newdata=newdata),num.eval,num.eval)
zlim=c(min(z0,z1),max(z0,z1))
persp(x=x1.seq,y=x2.seq,z=z0,
xlab="x1",ylab="x2",zlab="y",zlim=zlim,
col=grDevices::adjustcolor("red",alpha.f=0.35),
border=grDevices::adjustcolor("red",alpha.f=0.60),
ticktype="detailed",
theta=45,phi=45)
par(new=TRUE)
persp(x=x1.seq,y=x2.seq,z=z1,
xlab="x1",ylab="x2",zlab="y",zlim=zlim,
col=grDevices::adjustcolor("blue",alpha.f=0.35),
border=grDevices::adjustcolor("blue",alpha.f=0.60),
theta=45,phi=45,
ticktype="detailed")
## Partial regression surface plot
plot(model,errors = "asymptotic")
## NP-style data overlay and support rug
plot(model,data_rug=TRUE)
## For two continuous predictors, surface intervals are available through the
## same errors interface
## plot(crs(y~x1+x2,cv="none"),perspective=TRUE,errors="bootstrap",B=99)
## plot(crs(y~x1+x2,cv="none"),perspective=TRUE,renderer="rgl",data_rug=TRUE)
## Not run:
## A plot example where we extract the partial surfaces, confidence
## intervals etc. automatically generated by plot(...) but do
## not plot, rather save for separate use.
pdat <- plot(model,errors = "asymptotic",output ="data")
## Column 1 is the (evaluation) predictor ([,1]), 2-4 ([,-1]) the mean,
## lwr, and upr (note the returned value is a 'list' hence pdat[[1]] is
## data for the first predictor, pdat[[2]] the second etc). Note that
## matplot() can plot this nicely.
matplot(pdat[[1]][,1],pdat[[1]][,-1],
xlab=names(pdat[[1]][1]),ylab=names(pdat[[1]][2]),
lty=c(1,2,2),col=c(1,2,2),type="l")
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.