# Categorical Logspline Density

### Description

`clsd` computes the logspline density, density derivative, distribution, and smoothed quantiles for a one-dimensional continuous variable using the approach of Racine (2013).

### Usage

```
clsd(x = NULL,
     beta = NULL,
     xeval = NULL,
     degree = NULL,
     segments = NULL,
     degree.min = 2,
     degree.max = 25,
     segments.min = 1,
     segments.max = 100,
     lbound = NULL,
     ubound = NULL,
     basis = "tensor",
     knots = "quantiles",
     penalty = NULL,
     deriv.index = 1,
     deriv = 1,
     elastic.max = TRUE,
     elastic.diff = 3,
     do.gradient = TRUE,
     er = NULL,
     monotone = TRUE,
     monotone.lb = -250,
     n.integrate = 500,
     nmulti = 1,
     method = c("L-BFGS-B", "Nelder-Mead", "BFGS", "CG", "SANN"),
     verbose = FALSE,
     quantile.seq = seq(.01,.99,by=.01),
     random.seed = 42,
     maxit = 10^5,
     max.attempts = 25,
     NOMAD = FALSE)
```

### Arguments

`x`: a numeric vector of training data

`beta`: a numeric vector of coefficients (default `NULL`)

`xeval`: a numeric vector of evaluation data

`degree`: integer/vector specifying the polynomial degree of the B-spline basis for each dimension of the continuous `x`

`segments`: integer/vector specifying the number of segments (knots minus one) of the B-spline basis for each dimension of the continuous `x`

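`segments.min,segments.max`: the minimum/maximum number of segments of the B-spline basis considered when searching for the optimal `segments` (defaults `segments.min=1`, `segments.max=100`)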

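`degree.min,degree.max`: the minimum/maximum degree of the B-spline basis considered when searching for the optimal `degree` (defaults `degree.min=2`, `degree.max=25`)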

`lbound,ubound`: lower/upper bound for the support of the density. For example, if there is a priori knowledge that the density equals zero to the left of 0 and has a discontinuity at 0, the user could specify `lbound=0`. However, if the density is essentially zero near 0, one does not need to specify `lbound`.

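`basis`: a character string (default `basis="tensor"`) indicating whether the additive or the tensor-product B-spline basis is used (see Details)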

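`knots`: a character string (default `knots="quantiles"`) specifying where the knots are placed; `"quantiles"` places them at equally spaced sample quantiles, so that roughly the same number of observations falls in each segment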

`deriv`: an integer (default `deriv=1`) specifying the order of the derivative of the density to compute

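`deriv.index`: an integer (default `deriv.index=1`) specifying the index of the variable whose derivative is requested; since `x` is univariate this is 1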

`nmulti`: integer number of times to restart the process of finding extrema of the cross-validation function from different (random) initial points (default `nmulti=1`)

`penalty`: the parameter to be used in the AIC-type criterion. The method chooses the degree and the number of segments (knots minus one) that maximize `2*logl - penalty*(degree+segments)`, where `logl` is the log-likelihood

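`elastic.max,elastic.diff`: a logical value/integer indicating whether to use ‘elastic’ search bounds such that the optimal degree/segment must lie at least `elastic.diff` units from the respective search bounds (defaults `elastic.max=TRUE`, `elastic.diff=3`)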

`do.gradient`: a logical value indicating whether or not to use the analytical gradient during optimization (defaults to `TRUE`)

`er`: a scalar indicating the fraction of the data range by which to extend the tails (default `NULL`)

`monotone`: a logical value indicating whether to modify the standard B-spline basis function so that it is tailored for density estimation (default `TRUE`)

`monotone.lb`: a negative number specifying the lower bound on the linear segment coefficients (default `monotone.lb=-250`)

`n.integrate`: the number of evenly spaced integration points on the extended range specified by `er` (default `n.integrate=500`)

`method`: the optimization method; see `optim` for details

`verbose`: a logical value which, when `TRUE`, produces verbose output during optimization (default `FALSE`)

`quantile.seq`: a sequence of numbers lying in [0,1] at which quantiles of the density estimate are computed (default `seq(.01,.99,by=.01)`)

`random.seed`: seeds the random number generator used to draw initial parameter values when `optim` is called (default `random.seed=42`)

`maxit`: maximum number of iterations used by `optim` (default `maxit=10^5`)

`max.attempts`: maximum number of attempts to undertake if `optim` fails for a given set of initial parameters (default `max.attempts=25`)

`NOMAD`: a logical value which, when `TRUE`, uses the NOMAD optimizer to determine the optimal `degree` and `segments` (default `FALSE`)
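As a worked illustration of the `penalty` argument, the sketch below assumes the selection criterion has the form `2*logl - penalty*(degree + segments)`, where `logl` is the maximized log-likelihood of a candidate fit. The helper `penalized_criterion` is hypothetical, and the log-likelihood values are invented purely to show how the choice of `penalty` can change which (`degree`, `segments`) pair wins; they are not output from `clsd`.

```r
# Hypothetical helper: penalized log-likelihood used to rank candidate
# (degree, segments) pairs; larger values are preferred.
penalized_criterion <- function(logl, degree, segments, penalty) {
  2 * logl - penalty * (degree + segments)
}

# Made-up log-likelihoods for three candidate fits (illustration only)
cand <- data.frame(degree   = c(2, 3, 3),
                   segments = c(1, 1, 4),
                   logl     = c(-140.0, -138.5, -137.9))
n <- 250

# penalty = 2 corresponds to AIC, penalty = log(n) to a BIC-type penalty
cand$aic <- with(cand, penalized_criterion(logl, degree, segments, penalty = 2))
cand$bic <- with(cand, penalized_criterion(logl, degree, segments, penalty = log(n)))

cand[which.max(cand$aic), c("degree", "segments")]  # degree 3, segments 1
cand[which.max(cand$bic), c("degree", "segments")]  # degree 2, segments 1
```

A larger penalty favors simpler fits, which is why the BIC-type choice keeps the lower-degree model here while the AIC choice accepts one extra degree.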

### Details

Typical usage is as follows (see the Arguments above for a list of options and the Examples at the end of this help file):

```
model <- clsd(x)
```

`clsd` computes a logspline density estimate of a one-dimensional continuous variable.
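In a logspline model the logarithm of the density is a spline, so f(x) = exp(s(x))/C with s(x) = sum_j beta_j B_j(x) and C the integral of exp(s) over the support. The sketch below evaluates such a density and its first derivative on a grid using `splines::splineDesign()` from base R. The knots and coefficients are arbitrary, and the sketch omits both the estimation of beta and the `monotone` basis modification, so it should not be read as `clsd`'s internal code. Because log f = s - log C, the derivative follows from f'(x) = f(x) s'(x).

```r
library(splines)

# Hypothetical logspline density on [0, 1]: log f(x) = s(x) - log C, where
# s(x) = sum_j beta_j B_j(x) is a cubic spline with one interior knot.
ord   <- 4                                 # order 4 = degree 3
knots <- c(rep(0, ord), 0.5, rep(1, ord))  # 2 segments -> 5 basis functions
beta  <- c(-2, 0, 1, 0, -2)                # arbitrary coefficients
grid  <- seq(0, 1, length.out = 1001)

B  <- splineDesign(knots, grid, ord = ord)              # B_j(x)
dB <- splineDesign(knots, grid, ord = ord, derivs = 1)  # B_j'(x)
s  <- drop(B %*% beta)                                  # s(x)

# Normalizing constant C by the trapezoidal rule on the grid
trap <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
C <- trap(grid, exp(s))

dens       <- exp(s) / C                  # density f(x)
dens.deriv <- dens * drop(dB %*% beta)    # f'(x) = f(x) * s'(x)
```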

The spline model employs the tensor-product B-spline basis matrix for a multivariate polynomial spline via the B-spline routines in the GNU Scientific Library (http://www.gnu.org/software/gsl/) and the `tensor.prod.model.matrix` function.

When `basis="additive"` the model becomes additive in nature (i.e. no interaction/tensor terms, hence semiparametric rather than fully nonparametric).

When `basis="tensor"` the model uses the multivariate tensor-product basis.
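The following sketch shows how quantile-based knot placement and the `degree`/`segments` arguments translate into a univariate B-spline basis. It uses base R's `splines::bs()` rather than the GSL routines mentioned above, so it illustrates the idea under that substitution only. With `segments` segments there are `segments - 1` interior knots, and a basis that includes an intercept has `degree + segments` columns.

```r
library(splines)

set.seed(42)
x <- rexp(1000)          # a skewed sample
degree   <- 3
segments <- 4

# Knots at equally spaced sample quantiles: segments + 1 knots in total,
# including the two boundary knots at min(x) and max(x)
knots.q  <- quantile(x, probs = seq(0, 1, length.out = segments + 1), names = FALSE)
interior <- knots.q[-c(1, segments + 1)]

# Count the observations falling in each segment
counts <- table(cut(x, breaks = knots.q, include.lowest = TRUE))

# Univariate B-spline basis with an intercept: degree + segments columns,
# and each row sums to one (partition of unity)
B <- bs(x, degree = degree, knots = interior, intercept = TRUE)
```

On a skewed sample like this one, quantile knots crowd into the dense region near zero, which is where a flexible fit needs them most.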

### Value

`clsd` returns a `clsd` object. The generic functions `coef`, `fitted`, `plot` and `summary` support objects of this type. For `plot`, `er=FALSE` plots the density at the sample realizations rather than on the default ‘extended range’ of the data (see `er` above), and `distribution=TRUE` plots the distribution. The returned object has the following components:

`density`: estimates of the density function at the sample points

`density.er`: the density evaluated on the ‘extended range’ of the data

`density.deriv`: estimates of the derivative of the density function at the sample points

`density.deriv.er`: estimates of the derivative of the density function evaluated on the ‘extended range’ of the data

`distribution`: estimates of the distribution function at the sample points

`distribution.er`: the distribution evaluated on the ‘extended range’ of the data

`xer`: the ‘extended range’ of the data

`degree`: integer/vector specifying the degree of the B-spline basis for each dimension of the continuous `x`

`segments`: integer/vector specifying the number of segments of the B-spline basis for each dimension of the continuous `x`

`xq`: vector of quantiles

`tau`: vector of probabilities generated by `quantile.seq`
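The components `xer`, `distribution.er`, `xq` and `tau` are related: the distribution is the integral of the density over the extended range, and the quantiles invert it at the probabilities in `quantile.seq`. The sketch below mimics these steps with base R only. It assumes that `er` acts like the `f` argument of `grDevices::extendrange()` (an assumption, with an illustrative `f = 0.2` that is not claimed to be the `clsd` default), uses a standard normal density as a stand-in for a fitted one, and integrates with the trapezoidal rule; none of this is claimed to be `clsd`'s exact algorithm.

```r
set.seed(42)
x <- rnorm(250)

# Extended range: widen the sample range by a fraction f on each side
r   <- grDevices::extendrange(x, f = 0.2)    # f = 0.2 is illustrative only
xer <- seq(r[1], r[2], length.out = 500)     # cf. n.integrate = 500

# Stand-in density on the extended range (standard normal, for illustration)
dens <- dnorm(xer)

# Distribution function by cumulative trapezoidal integration,
# rescaled so that it ends at exactly 1 on the extended range
Fx <- c(0, cumsum(diff(xer) * (head(dens, -1) + tail(dens, -1)) / 2))
Fx <- Fx / Fx[length(Fx)]

# Smoothed quantiles: invert the distribution function at tau
tau <- seq(0.01, 0.99, by = 0.01)            # same as the default quantile.seq
xq  <- approx(x = Fx, y = xer, xout = tau)$y
```

Rescaling `Fx` to end at 1 treats the density as living on the extended range only, so quantiles far in the tails are slightly pulled inward relative to the untruncated normal.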

### Usage Issues

This function should be considered to be in ‘beta’ status until further notice.

If smoother estimates are desired and `degree=degree.min`, increase `degree.min` to, say, `degree.min=3`.

The use of ‘regression’ B-splines (i.e. when `monotone=FALSE`) can lead to undesirable behavior at the endpoints of the data. The default ‘density’ B-splines ought to be well behaved in these regions.

### Author(s)

Jeffrey S. Racine racinej@mcmaster.ca

### References

Racine, J.S. (2013), “Logspline Mixed Data Density Estimation,” manuscript.

### See Also

`logspline`

### Examples

```
## Not run:
## Old Faithful eruptions data: histogram and clsd density
library(crs)
data(faithful)  # from the datasets package; MASS is not needed
eruptions <- faithful$eruptions
model <- clsd(eruptions)
ylim <- c(0, max(model$density, hist(eruptions, breaks = 20, plot = FALSE)$density))
plot(model, ylim = ylim)
hist(eruptions, breaks = 20, freq = FALSE, add = TRUE, lty = 2)
rug(eruptions)
summary(model)
coef(model)

## Simulated data
library(logspline)
set.seed(42)
n <- 250
x <- sort(rnorm(n))
f.dgp <- dnorm(x)
model <- clsd(x)
## Standard (cubic) estimate taken from the logspline package
model.logspline <- logspline(x)
## Compute MSEs against the true (DGP) density
mse.clsd <- mean((fitted(model) - f.dgp)^2)
mse.logspline <- mean((dlogspline(x, model.logspline) - f.dgp)^2)
ylim <- c(0, max(fitted(model), dlogspline(x, model.logspline), f.dgp))
plot(model,
     ylim = ylim,
     sub = paste("MSE: logspline = ", format(mse.logspline), ", clsd = ",
                 format(mse.clsd)),
     lty = 3,
     col = 3)
xer <- model$xer
lines(xer, dlogspline(xer, model.logspline), col = 2, lty = 2)
lines(xer, dnorm(xer), col = 1, lty = 1)
rug(x)
legend("topright",
       c("DGP",
         paste("Cubic Logspline Density (package `logspline', knots = ",
               model.logspline$nknots, ")", sep = ""),
         paste("clsd Density (degree = ", model$degree, ", segments = ",
               model$segments, ", penalty = ", round(model$penalty, 2), ")",
               sep = "")),
       lty = 1:3,
       col = 1:3,
       bty = "n",
       cex = 0.75)
summary(model)
coef(model)

## Simulated data with known bounds
set.seed(42)
n <- 10000
x <- runif(n, 0, 1)
model <- clsd(x, lbound = 0, ubound = 1)
plot(model)
## End(Not run)
```