
Kernel smoothing for data from 1- to 6-dimensions.

There are three main types of functions in this package:

computing kernel estimators - these function names begin with ‘k’

computing bandwidth selectors - these begin with ‘h’ (1-d) or ‘H’ (>1-d)

displaying kernel estimators - these begin with ‘plot’.

The kernel used throughout is the normal (Gaussian) kernel *K*.
For 1-d data, the bandwidth *h* is the standard deviation of
the normal kernel, whereas for multivariate data, the bandwidth matrix
*H* is the variance matrix.

–For kernel density estimation, `kde` computes

*hat(f)(x) = n^(-1) sum_i K_H (x - X_i).*

The bandwidth matrix *H* is a matrix of smoothing parameters and its choice is crucial for the performance of kernel estimators. For display, its `plot` method calls `plot.kde`.
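As a rough illustration of the formula above in one dimension, the following base-R sketch averages normal kernels centred at the data points, with the bandwidth *h* used as the kernel standard deviation. The helper name `kde_1d`, the simulated data and the value of `h` are assumptions made for this example; this is not the code used by `kde`.

```r
# Minimal base-R sketch of a 1-d kernel density estimate (illustration only).
kde_1d <- function(x, X, h) {
  # For each evaluation point, average the normal kernels centred at the data
  sapply(x, function(x0) mean(dnorm(x0, mean = X, sd = h)))
}

set.seed(1)
X <- rnorm(200)                          # simulated sample (assumption)
grid <- seq(-6, 6, length.out = 601)     # evaluation grid
fhat <- kde_1d(grid, X, h = 0.4)         # h = 0.4 chosen arbitrarily

# Riemann-sum check that the estimate integrates to about 1
area <- sum(fhat) * diff(grid)[1]
```

A larger `h` gives a smoother estimate and a smaller `h` a rougher one, which is why the choice of bandwidth matters so much.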

–For kernel density estimation, there are several varieties of bandwidth selectors:

plug-in: `hpi` (1-d); `Hpi`, `Hpi.diag` (2- to 6-d)

least squares (or unbiased) cross validation (LSCV or UCV): `hlscv` (1-d); `Hlscv`, `Hlscv.diag` (2- to 6-d)

biased cross validation (BCV): `Hbcv`, `Hbcv.diag` (2- to 6-d)

smoothed cross validation (SCV): `hscv` (1-d); `Hscv`, `Hscv.diag` (2- to 6-d)

normal scale: `hns` (1-d); `Hns` (2- to 6-d).
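To give a feel for what the simplest of these selectors does, here is a base-R sketch of the textbook normal scale rule for a normal kernel in one dimension, *h = (4/(3n))^(1/5) sigma*. The function name `h_normal_scale` is hypothetical, and it is an assumption that this textbook form matches what `hns` computes.

```r
# Textbook 1-d normal scale bandwidth for a normal kernel (illustration only).
h_normal_scale <- function(X) {
  n <- length(X)
  (4 / (3 * n))^(1 / 5) * sd(X)   # shrinks at rate n^(-1/5), scales with sd
}

set.seed(1)
X <- rnorm(500)
h_ns <- h_normal_scale(X)
```

The rule is scale equivariant: doubling the data doubles the bandwidth.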

–For kernel density support estimation, the main function is `ksupp` which computes (the convex hull of)

*{x: hat(f)(x) > tau}*

for a suitable level *tau*. This is closely related to the *tau*-level set of *hat(f)*.

–For truncated kernel density estimation, the main function is
`kde.truncate`

*hat(f)(x) 1{x in Omega}/int hat(f) 1{x in Omega}*

for a bounded data support *Omega*. The standard density estimate *hat(f)* is truncated and rescaled to give unit integral over *Omega*. Its `plot` method calls `plot.kde`.
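The truncate-and-rescale step can be sketched on a grid in base R. The support *Omega = [0, 1]*, the simulated uniform data and the bandwidth are assumptions for this example, and the integral is approximated by a Riemann sum; this is not the `kde.truncate` implementation.

```r
# Truncate a 1-d density estimate to Omega = [0, 1] and rescale (illustration only).
set.seed(2)
X <- runif(300)                          # data supported on [0, 1] (assumption)
h <- 0.1                                 # arbitrary bandwidth
grid <- seq(-1, 2, length.out = 1501)
delta <- diff(grid)[1]

# Standard estimate, which leaks mass outside [0, 1]
fhat <- sapply(grid, function(x0) mean(dnorm(x0, mean = X, sd = h)))

inside <- grid >= 0 & grid <= 1          # indicator 1{x in Omega}
ftrunc <- fhat * inside / (sum(fhat * inside) * delta)
```

After rescaling, `ftrunc` is zero outside the support and has unit integral over it, up to the grid approximation.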

–For boundary kernel density estimation where the kernel function is
modified explicitly in the boundary region, the main function is
`kde.boundary`

*hat(f)(x) = n^(-1) sum_i K*_H (x - X_i)*

for a boundary kernel *K**. Its `plot` method calls `plot.kde`.

–For variable kernel density estimation where the bandwidth is not a constant matrix, the main functions are `kde.balloon`

*hat(f)_ball(x) = n^(-1) sum_i K_H(x) (x - X_i)*

and `kde.sp`

*hat(f)_SP(x) = n^(-1) sum_i K_H(X_i) (x - X_i).*

For the balloon estimation *hat(f)_ball* the bandwidth varies with the estimation point *x*, whereas for the sample point estimation *hat(f)_SP* the bandwidth varies with the data point *X_i, i=1, ..., n*. Their `plot` methods call `plot.kde`. The bandwidth selectors for `kde.balloon` are based on the normal scale bandwidth `Hns(,deriv.order=2)` via the MSE minimal formula, and for `kde.sp` on `Hns(,deriv.order=4)` via the Abramson formula.

–For kernel density derivative estimation, the main function is `kdde`

*hat(f)^(r)(x) = n^(-1) sum_i D^r K_H (x - X_i).*

The bandwidth selectors are a modified subset of those for `kde`, i.e. `Hlscv`, `Hns`, `Hpi`, `Hscv` with `deriv.order>0`. Its `plot` method is `plot.kdde` for plotting each partial derivative singly.
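For the first derivative in one dimension, differentiating the normal kernel gives *D K_h(u) = -(u/h^2) K_h(u)*, so the estimator can be sketched directly in base R. The helper names `kde_1d` and `kdde_1d` and the simulated data are assumptions for this example, not functions from the package.

```r
# 1-d kernel estimate of the first density derivative (illustration only).
kde_1d <- function(x, X, h) {
  sapply(x, function(x0) mean(dnorm(x0, mean = X, sd = h)))
}
kdde_1d <- function(x, X, h) {
  # Derivative of the normal kernel: -(u / h^2) * K_h(u), with u = x - X_i
  sapply(x, function(x0) mean(-(x0 - X) / h^2 * dnorm(x0, mean = X, sd = h)))
}

set.seed(5)
X <- rnorm(100)
x0 <- c(-1, 0, 0.5, 2)
d_exact <- kdde_1d(x0, X, h = 0.4)

# Central finite difference of the density estimate, as a cross-check
eps <- 1e-4
d_fd <- (kde_1d(x0 + eps, X, 0.4) - kde_1d(x0 - eps, X, 0.4)) / (2 * eps)
```

The two vectors `d_exact` and `d_fd` should agree closely, since the derivative estimate is simply the derivative of the density estimate.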

–For kernel summary curvature estimation, the main function is `kcurv`

*hat(s)(x) = -1{D^2 hat(f)(x) < 0} abs(det(D^2 hat(f)(x)))*

where *D^2 hat(f)(x)* is the kernel Hessian matrix estimate. It has the same structure as a kernel density estimate so its `plot` method calls `plot.kde`.

–For kernel discriminant analysis, the main function is `kda` which computes density estimates for each of the groups in the training data, and the discriminant surface. Its `plot` method is `plot.kda`. The wrapper functions `hkda`, `Hkda` compute bandwidths for each group in the training data for `kde`, e.g. `hpi`, `Hpi`.

–For kernel functional estimation, the main function is `kfe` which computes the *r*-th order integrated density functional

*hat(psi)_r = n^(-2) sum_i sum_j D^r K_H (X_i - X_j).*

The plug-in selectors are `hpi.kfe` (1-d), `Hpi.kfe` (2- to 6-d). Kernel functional estimates are usually not required to be computed directly by the user, but only within other functions in the package.
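For *r = 0* in one dimension the double sum is simply the average of the kernel evaluated at all pairwise differences of the data. A base-R sketch follows; the function name `psi0_1d`, the data and the bandwidth are assumptions for this example, not the `kfe` implementation.

```r
# Zero-order kernel functional estimate in 1-d (illustration only).
psi0_1d <- function(X, h) {
  # outer() forms all n^2 differences X_i - X_j; mean() supplies the n^(-2) factor
  mean(dnorm(outer(X, X, "-"), sd = h))
}

set.seed(6)
X <- rnorm(100)
psi0 <- psi0_1d(X, h = 0.5)
```

The `outer` call needs memory of order *n^2*, so this direct form is only practical for moderate sample sizes.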

–For kernel-based 2-sample testing, the main function is `kde.test` which computes the integrated *L2* distance between the two density estimates as the test statistic, comprising a linear combination of 0-th order kernel functional estimates:

*hat(T) = hat(psi)_0,1 + hat(psi)_0,2 - (hat(psi)_0,12 + hat(psi)_0,21),*

and the corresponding p-value. The *psi* are zero order kernel functional estimates with the subscripts indicating that 1 = sample 1 only, 2 = sample 2 only, and 12, 21 = samples 1 and 2. The bandwidth selectors are `hpi.kfe`, `Hpi.kfe` with `deriv.order=0`.
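The structure of the test statistic can be sketched in base R for one-dimensional samples. For simplicity the sketch uses one common bandwidth `h` for all four functional estimates, which is an assumption of this example; the names `psi0` and `l2_stat` are hypothetical, and no p-value is computed.

```r
# L2-type two-sample statistic built from zero-order functionals (illustration only).
psi0 <- function(A, B, h) {
  # Average kernel value over all pairs (A_i, B_j)
  mean(dnorm(outer(A, B, "-"), sd = h))
}
l2_stat <- function(X1, X2, h) {
  psi0(X1, X1, h) + psi0(X2, X2, h) - (psi0(X1, X2, h) + psi0(X2, X1, h))
}

set.seed(4)
X1 <- rnorm(150)
X2 <- rnorm(150, mean = 2)
T_diff <- l2_stat(X1, X2, h = 0.5)   # samples from different distributions
T_same <- l2_stat(X1, X1, h = 0.5)   # identical samples give zero
```

Larger values of the statistic indicate that the two density estimates are further apart.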

–For kernel-based local 2-sample testing, the main function is `kde.local.test` which computes the squared distance between the two density estimates as the test statistic

*hat(U)(x) = [hat(f)_1(x) - hat(f)_2(x)]^2*

and the corresponding local p-values. The bandwidth selectors are those used with `kde`, e.g. `hpi`, `Hpi`.

–For kernel cumulative distribution function estimation, the main
function is `kcde`

*hat(F)(x) = n^(-1) sum_i intK_H (x - X_i)*

where *intK* is the integrated kernel.
The bandwidth selectors are `hpi.kcde`, `Hpi.kcde`. Its `plot` method is `plot.kcde`. There exist analogous functions for the survival function *hat(bar(F))*.
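In one dimension with the normal kernel, the integrated kernel is the normal distribution function, so the estimator is an average of `pnorm` terms. A base-R sketch follows; the helper name `kcde_1d`, the data and the bandwidth are assumptions for this example, not the `kcde` implementation.

```r
# 1-d kernel estimate of the cumulative distribution function (illustration only).
kcde_1d <- function(x, X, h) {
  # The integrated normal kernel is the normal cdf centred at each data point
  sapply(x, function(x0) mean(pnorm(x0, mean = X, sd = h)))
}

set.seed(7)
X <- rnorm(200)
grid <- seq(-8, 8, length.out = 321)
Fhat <- kcde_1d(grid, X, h = 0.3)
Shat <- 1 - Fhat                     # corresponding survival function estimate
```

Unlike the empirical distribution function, `Fhat` is smooth as well as non-decreasing.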

–For kernel estimation of a ROC (receiver operating characteristic) curve to compare two samples from *hat(F)_1, hat(F)_2*, the main function is `kroc`

*{hat(F)_hat(Y1)(z), hat(F)_hat(Y2)(z)}*

based on the cumulative distribution functions of *hat(Yj) = hat(bar(F))_1(X_j), j=1,2*.

The bandwidth selectors are those used with `kcde`, e.g. `hpi.kcde`, `Hpi.kcde` for *hat(F)_hat(Yj), hat(bar(F))_1*. Its `plot` method is `plot.kroc`.

–For kernel estimation of a copula, the main function is `kcopula`

*hat(C)(z) = hat(F)(hat(F)_1^(-1)(z_1),..., hat(F)_d^(-1)(z_d))*

where *hat(F)_j^(-1)(z_j)* is the *z_j*-th quantile of the *j*-th marginal distribution *hat(F)_j*. The bandwidth selectors are those used with `kcde` for *hat(F), hat(F)_j*. Its `plot` method is `plot.kcde`.

–For kernel mean shift clustering, the main function is `kms`. The mean shift recurrence relation of the candidate point *x*

*x_(j+1) = x_j + H D hat(f)(x_j)/hat(f)(x_j),*

where *j>=0* and *x_0 = x*, is iterated until *x* converges to its local mode in the density estimate *hat(f)* by following the density gradient ascent paths. This mode determines the cluster label for *x*. The bandwidth selectors are those used with `kdde(,deriv.order=1)`.
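In one dimension with the normal kernel, *H = h^2* and the mean shift update reduces to replacing the current point by a kernel-weighted mean of the data. The base-R sketch below iterates this update; the function name `mean_shift_1d`, the tolerance, the iteration cap and the simulated two-cluster data are assumptions for this example, not the `kms` implementation.

```r
# 1-d mean shift iteration towards a local mode (illustration only).
mean_shift_1d <- function(x, X, h, tol = 1e-8, max_iter = 500) {
  for (j in seq_len(max_iter)) {
    w <- dnorm(x, mean = X, sd = h)    # kernel weights K_h(x - X_i)
    x_new <- sum(w * X) / sum(w)       # equals x + h^2 * fhat'(x) / fhat(x)
    if (abs(x_new - x) < tol) break
    x <- x_new
  }
  x_new
}

set.seed(3)
X <- c(rnorm(100, mean = -3, sd = 0.5), rnorm(100, mean = 3, sd = 0.5))
modes <- sapply(c(-2, 2.5), mean_shift_1d, X = X, h = 0.5)
```

Starting points that converge to the same mode would receive the same cluster label; here the two starting points climb to different modes.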

–For kernel density ridge estimation, the main function is `kdr`. The kernel density ridge recurrence relation of the candidate point *x*

*x_(j+1) = x_j + U_(d-1)(x_j) U_(d-1)(x_j)^T H D hat(f)(x_j)/hat(f)(x_j),*

where *j>=0*, *x_0 = x* and *U_(d-1)* is the 1-dimensional projected density gradient, is iterated until *x* converges to the ridge in the density estimate. The bandwidth selectors are those used with `kdde(,deriv.order=2)`.

–For kernel feature significance, the main function is `kfs`. The hypothesis test at a point *x* is *H0(x): H f(x) < 0*, i.e. the density Hessian matrix *H f(x)* is negative definite. The test statistic is

*W(x) = ||S(x)^(-1/2) vech H hat(f)(x)||^2*

where *H hat(f)* is the Hessian estimate, vech is the vector-half operator, and *S* is an estimate of the null variance. *W(x)* is approximately *chi-squared* distributed with *d(d+1)/2* degrees of freedom. If *H0(x)* is rejected, then *x* belongs to a significant modal region. The bandwidth selectors are those used with `kdde(,deriv.order=2)`. Its `plot` method is `plot.kfs`.

–For deconvolution density estimation, the main function is `kdcde`. A weighted kernel density estimation with the contaminated data *W_1, ..., W_n*,

*hat(f)(x) = n^(-1) sum_i alpha_i K_H (x - W_i),*

is utilised, where the weights *alpha_1, ..., alpha_n* are chosen via a quadratic optimisation involving the error variance and the regularisation parameter. The bandwidth selectors are those used with `kde`.

–Binned kernel estimation is an approximation to the exact kernel estimation and is available for d=1, 2, 3, 4. This makes kernel estimators feasible for large samples.

–For an overview of this package with 2-d density estimation, see `vignette("kde")`.

–For ks *>=* 1.11.1, the misc3d and rgl (3-d plot), OceanView (quiver plot) and oz (Australian map) packages have been moved from Depends to Suggests. This was done to allow ks to be installed on systems where these graphics packages can't be installed.

Tarn Duong for most of the package. M.P. Wand for the binned estimation, univariate plug-in selector and univariate density derivative estimator code. J. E. Chacon for the unconstrained pilot functional estimation and fast implementation of derivative-based estimation code. A. and J. Gramacki for the binned estimation for unconstrained bandwidth matrices.

Bowman, A. & Azzalini, A. (1997) *Applied Smoothing Techniques
for Data Analysis*. Oxford University Press, Oxford.

Chacon, J.E. & Duong, T. (2018) *Multivariate Kernel Smoothing
and Its Applications*. Chapman & Hall/CRC. To appear.

Duong, T. (2004) *Bandwidth Matrices for Multivariate Kernel Density
Estimation.* Ph.D. Thesis, University of Western Australia.

Scott, D.W. (1992) *Multivariate Density Estimation: Theory,
Practice, and Visualization*. John Wiley & Sons, New York.

Silverman, B. (1986) *Density Estimation for Statistics and
Data Analysis*. Chapman & Hall/CRC, London.

Simonoff, J. S. (1996) *Smoothing Methods in Statistics*.
Springer-Verlag, New York.

Wand, M.P. & Jones, M.C. (1995) *Kernel Smoothing*. Chapman &
Hall/CRC, London.

feature, sm, KernSmooth
