library(knitr) opts_chunk$set(collapse = TRUE, fig.align = "center")
The cpr package provides several tools for finding parsimonious B-spline regression models via control polygon reduction. This vignette is an overview of the tools provided by the package.
This vignette is not complete. The details will be filled in soon. As for now, this is a placeholder for many topics to come. A Journal of Statistical Software paper is under development. That paper will become the foundation for this vignette.
A brief overview of the Control Polygon Reduction and Control Net Reduction method for finding parsimonious B-spline regression models are presented first. The later sections of this vignette provide extended detail on the additional tools provided by the cpr package.
# Needed packages to run the examples in this vignette library(cpr) library(splines)
Coming Soon A full description of the CPR method, including the background on B-splines and assessing knot influence will be provided soon.
The quick overview: for a uni-variable B-spline regression model $$\boldsymbol{y}= f(\boldsymbol{x}) + \boldsymbol{Z}{f} \boldsymbol{\beta} + \boldsymbol{Z}{r} \boldsymbol{b} + \epsilon$$ where $\boldsymbol{y}$ is a function of one variable $x$ with other (optional) fixed effects $\boldsymbol{Z}{f} \beta$ and (optional) random effects $\boldsymbol{Z}{r} \boldsymbol{b},$ we are interested in modeling the function $f\left( \boldsymbol{x} \right)$ with B-splines.
The B-spline function is $$ f \left(\boldsymbol{x}\right) \approx \boldsymbol{B}_{k, \boldsymbol{\xi}} \left( \boldsymbol{x} \right) \boldsymbol{\theta}$$ were $\boldsymbol{B}$ is the basis matrix defined by the polynomial order $k$ and knot sequence $\boldsymbol{\xi},$ and the regression coefficients $\boldsymbol{\theta}.$
The Control Polygon Reduction (CPR) approach to finding parsimonious regression models with a good quality of fit is to start with a high cardinal knot sequence, assess the relative influence of each knot on the spline function, remove the least influential knot, refit the regression model, reassess the relative influence of each knot, and so forth until all knots knots have been removed. The selection of a regression model is made by looking at the regression models of sequentially larger knot sequences until it is determined that the use of an additional knot does not provide any meaningful improvements to the regression model.
There are only four steps that the end user needs to take to use the CPR algorithm within the cpr package.
cp
.cpr
.Here is an example.
# Construct the initial control polygon initial_cp <- cp(log10(pdg) ~ bsplines(day, df = 54), data = spdg) # Run CPR cpr_run <- cpr(initial_cp)
There are two types of diagnostic plots, 1) sequential control polygons, and 2) root mean squared error (RMSE) by model index.
In the plot below, there is no major differences in the shape of the control polygon between model index 4, 5, and 6. As such, we would conclude that model index 4 is sufficient for modeling the data.
# sequential control polygons plot(cpr_run, color = TRUE)
Further, in the plot below, we see that there is very little improvement in the RMSE from model index 4 to 5, and beyond. As with the plot above, we conclude that model index 4 is sufficient for modeling the data.
# RMSE by model index plot(cpr_run, type = 'rmse', to = 10)
Extract the model. To save memory when running CPR, the default is to omit the
regression model objects from the control polygons with in the cpr_run
object.
To get the regression model back, you can extract the knot locations and build
the model yourself, or update the selected_cp
object to retain the fit.
selected_cp <- cpr_run[[4]] selected_cp$iknots
selected_cp <- update(selected_cp, keep_fit = TRUE) selected_fit <- selected_cp$fit coef_matrix <- summary(selected_fit)$coef dimnames(coef_matrix)[[1]] <- paste("Vertex", 1:7) coef_matrix
If you wanted to run this analysis with a better modeling approach, e.g., using
mixed effect models handle the multiple observations within a subject, you can
specify the regression approach via the method
argument in the cp
call.
library(lme4) initial_lmer_cp <- cp(log10(pdg) ~ bsplines(day, df = 54) + (1 | id), data = spdg, method = lmer)
Coming Soon Control Net reduction is the natural extension of CPR from uni-variable functions to multi-variable functions. Details on this work will be added to this vignette soon.
The CPR and CNR algorithms rely on B-splines, control polygons, tensor products of B-splines, and control nets. The following sections provide additional detail on the tools within the cpr package.
Base R includes the splines package and the splines::bs
call for building
B-splines. The cpr package provides the cpr::bsplines
call as there are
certain default behaviors and additional required meta-data storage requirements
needed for the CPR method that the splines::bs
call does not provide. In this
section we compare splines::bs
and cpr::bsplines
so that end users can
translate between the two functions.
| | splines::bs
| cpr::bsplines
|
|-----------|----------------------------------|---------------------|
| Arguments | | |
| | x
| x
|
| | df
| df
|
| | knots
| iknots
|
| | degree = 3
| order = 4L
|
| | Boundary.knots = range(x)
| bknots = range(x)
|
| | intercept = FALSE
| -- |
|Attributes | | |
| | dim
| dim
|
| | degree
| order
|
| | knots
| iknots
|
| | Boundary.knots
| bknots
|
| | intercept
| -- |
| | -- | xi
|
| | -- | xi_star
|
| | class
| class
|
A major difference between the two functions is related to the intercept
argument of bs
. By default, bs
will omit the first column of the basis
whereas bsplines
will return the whole basis. The omission of the first
column of the basis generated by bs
allows for additive bs
calls to be used
on the right-hand-side of a regression formula and generate a full rank design
matrix. If additive bsplines
calls, or additive bs
with intercept = TRUE
,
are on the right-hand-side of the regression equation the resulting design
matrix will be rank deficient. This is a result of the B-splines being a
partition of unity. As the CPR algorithm is based on having the whole basis,
the bsplines
function is provided to make it easy to work with the whole basis
without having to remember to use non-default settings in bs
.
Both functions use x
, a numeric vector, as the first argument. The degrees of
freedom, df
, argument is slightly different between the two functions; the
return object from bs
depends on the values of df
and intercept
whereas
bsplines
always returns the full basis and thus df
will be equal the number
of columns of the returned basis.
bs
uses the polynomial degree
whereas bsplines
uses the
polynomial order
(order = degree + 1) to define the splines. The default
for both bs
and bsplines
is to generate cubic B-splines.
Specifying the internal knots and boundary knots are the same between the two
functions save the name of the arguments. For both bs
and bsplines
only the degrees of freedom or the internal knots need to be specified. If the
end user specifies both, the specified knots take precedence. If only the
degrees of freedom are specified then bs
will generate internal knots via
a call equivalent to
stats::quantile(x, probs = seq(0, 1, length = length(knots) + 2L)[-c(1, length(knots) + 2L)].
The default behavior for bsplines
is nearly the same, only that a call to
`trimmed_quantile`` is made.
bmat <- bsplines(x = seq(0, 6, length = 500), iknots = c(1.0, 1.5, 2.3, 4.0, 4.5)) plot(bmat)
cp
build_tensor
and of B-splines btensor
cn
print(sessionInfo(), local = FALSE)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.