R package for univariate kernel density estimation with parametric
starts and asymmetric kernels.
kdensity is an implementation of univariate kernel density estimation
with support for parametric starts and asymmetric kernels. Its main
function is kdensity, which has approximately the same syntax as
stats::density. Its new functionality is:
kdensity has built-in support for many parametric starts, such as
gamma, but you can also supply your own.
It supports several asymmetric kernels, such as gcopula and gamma kernels, as well as the common symmetric ones. You can also supply your own kernels.
It offers a selection of choices for the bandwidth function bw, again including an option to specify your own.
A reason to use
kdensity is to avoid boundary bias when estimating
densities on the unit interval or the positive half-line. Asymmetric
kernels such as
gcopula are designed for this purpose. The
support for parametric starts allows you to easily use a method that is
often superior to ordinary kernel density estimation.
Several R packages deal with kernel estimation. For an overview see
Deng & Hadley Wickham (2011). While no other
R package handles density estimation with parametric starts,
several packages support methods that handle boundary bias.
One of them provides a variety of boundary bias correction methods.
kde1d corrects for boundary bias
using transformed univariate local polynomial kernel density estimation.
logKDE corrects for
boundary bias on the half line using a logarithmic transform.
ks supports boundary correction
through the kde.boundary function, while another package corrects
for boundary bias using tailored kernel functions.
From inside R, use one of the following commands:
    # For the CRAN release
    install.packages("kdensity")
    # For the development version from GitHub:
    # install.packages("devtools")
    devtools::install_github("JonasMoss/kdensity")
Load the package with the library function and use kdensity just like
stats::density, with optional additional arguments.
    library("kdensity")
    plot(kdensity(mtcars$mpg, start = "normal"))
Kernel density estimation with a parametric start was introduced by Hjort and Glad in Nonparametric Density Estimation with a Parametric Start (1995). The idea is to start out with a parametric density before you do your kernel density estimation, so that your actual kernel density estimation will be a correction to the original parametric estimate. The resulting estimator will outperform the ordinary kernel density estimator in terms of asymptotic integrated mean squared error whenever the true density is close to your suggestion; and the estimator can be superior to the ordinary kernel density estimator even when the suggestion is pretty far off.
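The construction can be written out by hand. The following base-R sketch multiplies a fitted normal start by a kernel-smoothed correction factor, which is the form of the estimator as we understand it from Hjort and Glad. It is not the package's internal code, and the simulated data, the Gaussian kernel, the bw.nrd0 bandwidth and all function names are illustrative assumptions.

```r
# Hand-written parametric-start estimator (illustration only, not kdensity internals).
set.seed(1)
x <- rnorm(200, mean = 3, sd = 2)  # simulated data, assumed for the example

# Step 1: fit the parametric start (a normal density) by maximum likelihood.
mu_hat <- mean(x)
sd_hat <- sqrt(mean((x - mu_hat)^2))  # the ML estimate divides by n, not n - 1
start_density <- function(y) dnorm(y, mean = mu_hat, sd = sd_hat)

# Step 2: smooth the correction factor 1 / start_density(x_i) with a Gaussian
# kernel and multiply the result by the start density.
h <- bw.nrd0(x)  # rule-of-thumb bandwidth from the stats package
hjort_glad <- function(y) {
  sapply(y, function(y0) {
    kernel_weights <- dnorm((y0 - x) / h) / h
    start_density(y0) * mean(kernel_weights / start_density(x))
  })
}

hjort_glad(c(0, 3, 6))
```

When the data are close to the start family, the smoothed correction factor stays near 1 and the estimate stays near the parametric fit; otherwise the kernel part pulls the estimate toward the data.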
In addition to parametric starts, the package implements some asymmetric kernels. These kernels are useful when modelling data with sharp boundaries, such as data supported on the positive half-line or the unit interval. Currently we support the following asymmetric kernels:
Jones and Henderson’s Gaussian copula KDE, from Kernel-Type
Density Estimation on the Unit Interval.
This is used for data on the unit interval. The bandwidth selection
mechanism described in that paper is implemented as well. This
kernel is called gcopula.
Chen’s two beta kernels, from Beta kernel estimators for density
functions. These are used for data supported on the unit interval.
Chen’s two gamma kernels, from Probability Density Function
Estimation Using Gamma Kernels.
These are used for data supported on the positive half-line.
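To show why an asymmetric kernel avoids boundary bias, here is a minimal base-R sketch of a gamma kernel estimator in the form we understand Chen's first gamma kernel to take: at each evaluation point y the kernel is a gamma density with shape y / b + 1 and scale b, evaluated at the data. The function name gamma_kde and the bandwidth b = 0.5 are assumptions made for illustration, and the package's own implementation may differ.

```r
# Gamma kernel density estimate on the positive half-line (illustrative sketch).
x <- airquality$Wind  # positive data shipped with base R
b <- 0.5              # smoothing parameter, picked by hand for this example

gamma_kde <- function(y, data, b) {
  # The kernel's shape changes with the evaluation point y0, so no probability
  # mass is ever placed below zero.
  sapply(y, function(y0) mean(dgamma(data, shape = y0 / b + 1, scale = b)))
}

gamma_kde(c(0, 5, 10), data = x, b = b)
```

Because the kernel lives entirely on the positive half-line, the estimate stays finite and non-negative at zero without any reflection or truncation step.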
These features can be combined to make asymmetric kernel density
estimators with parametric starts; see the example below. The package
contains only one function,
kdensity, in addition to a few generics.
kdensity takes some
data, a kernel
kernel and a parametric start
start. You can optionally specify a further
parameter, which is used to find the normalizing constant.
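As a rough picture of what finding a normalizing constant can involve, the sketch below restricts an ordinary Gaussian kernel estimate to an interval and rescales it to integrate to one there. This is one plausible reading rather than a description of the package's code; the names raw_estimate, support and normalized are hypothetical.

```r
# Renormalizing a density estimate over a chosen support (illustrative sketch).
x <- airquality$Wind
h <- bw.nrd0(x)  # rule-of-thumb bandwidth from the stats package

# Ordinary Gaussian kernel estimate; a little of its mass falls below zero.
raw_estimate <- function(y) sapply(y, function(y0) mean(dnorm(y0, mean = x, sd = h)))

support <- c(0, Inf)  # hypothetical support: the positive half-line
const <- integrate(raw_estimate, lower = support[1], upper = support[2])$value

# Divide by the constant inside the support and return zero outside it.
normalized <- function(y) {
  ifelse(y >= support[1] & y <= support[2], raw_estimate(y) / const, 0)
}

normalized(c(-1, 5, 10))
```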
The following example uses the airquality data set. The black curve is a
gamma-kernel density estimate with a gamma start, the red curve a fully
parametric gamma density and the blue curve an ordinary density
estimate. Notice the boundary bias of the ordinary density estimator.
The underlying parameter estimates are always maximum likelihood.
    library("kdensity")
    kde = kdensity(airquality$Wind, start = "gamma", kernel = "gamma")
    plot(kde, main = "Wind speed (mph)")
    lines(kde, plot_start = TRUE, col = "red")
    lines(density(airquality$Wind, adjust = 2), col = "blue")
    rug(airquality$Wind)
Since the return value of
kdensity is a function,
kde is callable
and can be used like any density function in
R (such as stats::dnorm).
For example, you can do:
    kde(10)
    #> [1] 0.09980471
    integrate(kde, lower = 0, upper = 1) # The cumulative distribution up to 1.
    #> 1.27532e-05 with absolute error < 2.2e-19
You can access the parameter estimates by using
coef. You can also
access the log likelihood (
logLik), AIC and BIC of the parametric start:
    coef(kde)
    #>     shape      rate
    #> 7.1872898 0.7217954
    logLik(kde)
    #> 'log Lik.' 12.33787 (df=2)
    AIC(kde)
    #> [1] -20.67574
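For intuition about where such estimates come from, a gamma distribution can be fitted to the same data by maximum likelihood in base R. This is a sketch with stats::optim, not the routine the package uses; the log-scale parameterization and the names negloglik, fit and estimates are assumptions for the example. If the package fits the same model, the result should land close to the coef(kde) values shown above.

```r
# Maximum likelihood fit of a gamma distribution using base R only.
x <- airquality$Wind

# Negative log-likelihood, parameterized on the log scale so that shape and
# rate stay positive during the search.
negloglik <- function(par) {
  -sum(dgamma(x, shape = exp(par[1]), rate = exp(par[2]), log = TRUE))
}

fit <- optim(c(0, 0), negloglik)  # Nelder-Mead is the default method
estimates <- c(shape = exp(fit$par[1]), rate = exp(fit$par[2]))
estimates
```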