clsd | R Documentation |
clsd computes the logspline density, density derivative, distribution, and smoothed quantiles for a one (1) dimensional continuous variable using the approach of Racine (2013).
clsd(x = NULL,
     beta = NULL,
     xeval = NULL,
     degree = NULL,
     segments = NULL,
     degree.min = 2,
     degree.max = 25,
     segments.min = 1,
     segments.max = 100,
     lbound = NULL,
     ubound = NULL,
     basis = "tensor",
     knots = "quantiles",
     penalty = NULL,
     deriv.index = 1,
     deriv = 1,
     elastic.max = TRUE,
     elastic.diff = 3,
     do.gradient = TRUE,
     er = NULL,
     monotone = TRUE,
     monotone.lb = -250,
     n.integrate = 500,
     nmulti = 1,
     method = c("L-BFGS-B", "Nelder-Mead", "BFGS", "CG", "SANN"),
     verbose = FALSE,
     quantile.seq = seq(.01,.99,by=.01),
     random.seed = 42,
     maxit = 10^5,
     max.attempts = 25,
     NOMAD = FALSE)
x |
a numeric vector of training data |
beta |
a numeric vector of coefficients (default |
xeval |
a numeric vector of evaluation data |
degree |
integer/vector specifying the polynomial degree of the
B-spline basis for each dimension of the continuous |
segments |
integer/vector specifying the number of segments of the
B-spline basis for each dimension of the continuous |
segments.min,segments.max |
when |
degree.min,degree.max |
when |
lbound,ubound |
lower/upper bound for the support of the density. For example, if there is a priori knowledge that the density equals zero to the left of 0, and has a discontinuity at 0, the user could specify lbound = 0. However, if the density is essentially zero near 0, one does not need to specify lbound |
basis |
a character string (default |
knots |
a character string (default |
deriv |
an integer |
deriv.index |
an integer |
nmulti |
integer number of times to restart the process of finding extrema of
the cross-validation function from different (random) initial
points (default |
penalty |
the parameter to be used in the AIC criterion. The
method chooses the number of degrees plus number of segments
(knots-1) that maximizes |
elastic.max,elastic.diff |
a logical value/integer indicating
whether to use ‘elastic’ search bounds such that the optimal
degree/segment must lie |
do.gradient |
a logical value indicating whether or not to use
the analytical gradient during optimization (defaults to |
er |
a scalar indicating the fraction of data range to extend
the tails (default |
monotone |
a logical value indicating whether to modify
the standard B-spline basis function so that it is tailored for
density estimation (default |
monotone.lb |
a negative bound specifying the lower bound on
the linear segment coefficients used when ( |
n.integrate |
the number of evenly spaced integration points on the extended range specified by |
method |
see |
verbose |
a logical value which when |
quantile.seq |
a sequence of numbers lying in [0,1] on which quantiles from the logspline distribution are obtained |
random.seed |
seeds the random number generator for initial
parameter values when |
maxit |
maximum number of iterations used by |
max.attempts |
maximum number of attempts to undertake if |
NOMAD |
a logical value which when |
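The n.integrate argument above refers to evenly spaced integration points on an extended range. As a rough illustration of why such a grid is useful, the sketch below normalizes an unnormalized log-density with the trapezoidal rule and accumulates the distribution on the same grid. It uses base R only; the grid limits and the quadratic log-density are illustrative assumptions, and this is not taken from the clsd source.

```r
# Sketch only: normalize an unnormalized log-density on an evenly spaced grid.
# The grid limits and the log-density are illustrative assumptions.
n.integrate <- 500
xer <- seq(-6, 6, length.out = n.integrate)  # stand-in for an extended range

log.f <- -0.5 * xer^2                        # quadratic log-density (Gaussian shape)
f.unnorm <- exp(log.f)

# Trapezoidal rule: area of each panel between adjacent grid points
h <- diff(xer)
panels <- h * (head(f.unnorm, -1) + tail(f.unnorm, -1)) / 2

f <- f.unnorm / sum(panels)                  # density integrates to one on the grid
cdf <- c(0, cumsum(panels)) / sum(panels)    # distribution at each grid point

# Largest discrepancy from the exact standard normal density
max(abs(f - dnorm(xer)))
```

A finer grid (larger n.integrate) reduces the quadrature error at the cost of more function evaluations.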
Typical usages are (see below for a list of options and also the examples at the end of this help file)
model <- clsd(x)
clsd computes a logspline density estimate of a one (1) dimensional continuous variable.
The spline model employs the tensor product B-spline basis matrix for a multivariate polynomial spline via the B-spline routines in the GNU Scientific Library (https://www.gnu.org/software/gsl/) and the tensor.prod.model.matrix function.
When basis="additive" the model becomes additive in nature (i.e. no interaction/tensor terms, thus semiparametric, not fully nonparametric).
When basis="tensor" the model uses the multivariate tensor product basis.
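To make the distinction between the two basis types concrete, here is a minimal sketch for two hypothetical continuous variables. It assumes splines::bs (shipped with R) as a stand-in for the GSL B-spline routines, and it builds the row-wise tensor product by hand rather than calling tensor.prod.model.matrix; the variable names and dimensions are illustrative only.

```r
library(splines)  # ships with R; a stand-in for the GSL B-spline routines

set.seed(42)
n  <- 100
x1 <- runif(n)    # hypothetical first variable
x2 <- runif(n)    # hypothetical second variable

B1 <- bs(x1, degree = 3, df = 5)   # n x 5 basis for the first variable
B2 <- bs(x2, degree = 3, df = 4)   # n x 4 basis for the second variable

# Additive basis: columns side by side, no interaction terms
B.additive <- cbind(B1, B2)

# Tensor product basis: every column of B1 times every column of B2, row by row
i1 <- rep(seq_len(ncol(B1)), each  = ncol(B2))
i2 <- rep(seq_len(ncol(B2)), times = ncol(B1))
B.tensor <- B1[, i1] * B2[, i2]

dim(B.additive)  # 100 x 9  (5 + 4 columns)
dim(B.tensor)    # 100 x 20 (5 * 4 columns)
```

The additive basis grows with the sum of the per-variable dimensions, while the tensor basis grows with their product, which is why the tensor form can capture interactions but needs more coefficients.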
clsd returns a clsd object. The generic functions coef, fitted, plot and summary support objects of this type. When plotting, er=FALSE plots the density on the sample realizations (the default is the ‘extended range’ data; see er above), and distribution=TRUE plots the distribution. The returned object has the following components:
density |
estimates of the density function at the sample points |
density.er |
the density evaluated on the ‘extended range’ of the data |
density.deriv |
estimates of the derivative of the density function at the sample points |
density.deriv.er |
estimates of the derivative of the density function evaluated on the ‘extended range’ of the data |
distribution |
estimates of the distribution function at the sample points |
distribution.er |
the distribution evaluated on the ‘extended range’ of the data |
xer |
the ‘extended range’ of the data |
degree |
integer/vector specifying the degree of the B-spline
basis for each dimension of the continuous |
segments |
integer/vector specifying the number of segments of
the B-spline basis for each dimension of the continuous |
xq |
vector of quantiles |
tau |
vector generated by |
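The relationship between a distribution evaluated on a grid and a vector of quantiles can be sketched in base R by inverting the distribution with linear interpolation. This is only an illustration under stated assumptions: the grid, the use of pnorm as a stand-in for a fitted distribution, and the interpolation step are not taken from the clsd source, and clsd may compute xq differently.

```r
# Sketch only: read quantiles off a distribution evaluated on a grid.
tau <- seq(.01, .99, by = .01)        # same form as the quantile.seq default
xer <- seq(-6, 6, length.out = 500)   # illustrative grid
cdf <- pnorm(xer)                     # stand-in for a fitted distribution

# Invert the distribution: interpolate the grid points as a function of the cdf
xq <- approx(x = cdf, y = xer, xout = tau)$y

# Largest discrepancy from the exact standard normal quantiles
max(abs(xq - qnorm(tau)))
```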
This function should be considered to be in ‘beta’ status until further notice.
If smoother estimates are desired and degree=degree.min, increase degree.min to, say, degree.min=3.
The use of ‘regression’ B-splines (i.e. when monotone=FALSE) can lead to undesirable behavior at the endpoints of the data. The default ‘density’ B-splines ought to be well-behaved in these regions.
Jeffrey S. Racine racinej@mcmaster.ca
Racine, J.S. (2013), “Logspline Mixed Data Density Estimation,” manuscript.
logspline
## Not run: 
## Old Faithful eruptions data histogram and clsd density

library(MASS)
data(faithful)
attach(faithful)

model <- clsd(eruptions)

ylim <- c(0,max(model$density,hist(eruptions,breaks=20,plot=FALSE)$density))

plot(model,ylim=ylim)

hist(eruptions,breaks=20,freq=FALSE,add=TRUE,lty=2)

rug(eruptions)

summary(model)

coef(model)

## Simulated data

set.seed(42)
require(logspline)

## Example - simulated data

n <- 250
x <- sort(rnorm(n))
f.dgp <- dnorm(x)

model <- clsd(x)

## Standard (cubic) estimate taken from the logspline package
## Compute MSEs

mse.clsd <- mean((fitted(model)-f.dgp)^2)

model.logspline <- logspline(x)

mse.logspline <- mean((dlogspline(x,model.logspline)-f.dgp)^2)

ylim <- c(0,max(fitted(model),dlogspline(x,model.logspline),f.dgp))
plot(model,
     ylim=ylim,
     sub=paste("MSE: logspline = ",format(mse.logspline),", clsd = ",
     format(mse.clsd)),
     lty=3,
     col=3)

xer <- model$xer

lines(xer,dlogspline(xer,model.logspline),col=2,lty=2)
lines(xer,dnorm(xer),col=1,lty=1)

rug(x)

legend("topright",c("DGP",
                    paste("Cubic Logspline Density (package 'logspline', knots = ",
                          model.logspline$nknots,")",sep=""),
                    paste("clsd Density (degree = ", model$degree, ", segments = ",
                          model$segments,", penalty = ",round(model$penalty,2),")",sep="")),
       lty=1:3,
       col=1:3,
       bty="n",
       cex=0.75)

summary(model)

coef(model)

## Simulate data with known bounds

set.seed(42)
n <- 10000
x <- runif(n,0,1)

model <- clsd(x,lbound=0,ubound=1)

plot(model)

## End(Not run)