Kernel smoothing for data from 1 to 6 dimensions.

There are three main types of functions in this package:

computing kernel estimators - these function names begin with ‘k’

computing bandwidth selectors - these begin with ‘h’ (1-d) or ‘H’ (>1-d)

displaying kernel estimators - these begin with ‘plot’.

The kernel used throughout is the normal (Gaussian) kernel *K*.
For 1-d data, the bandwidth *h* is the standard deviation of
the normal kernel, whereas for multivariate data, the bandwidth matrix
*H* is the variance matrix.

–For kernel density estimation, `kde` computes

*hat(f)(x) = n^(-1) sum_i K_H (x - X_i).*

The bandwidth matrix *H* is a matrix of smoothing
parameters and its choice is crucial for the performance of kernel
estimators. For display, its `plot` method calls `plot.kde`.
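The estimator formula can be illustrated for the 1-d case, where the normal kernel has standard deviation *h*. The following is a minimal plain-Python sketch of the formula only, not the package's R implementation; the names `normal_kernel` and `kde_1d`, the sample values and the bandwidth are all made up for illustration.

```python
import math

def normal_kernel(u, h):
    """Normal density with mean 0 and standard deviation h, evaluated at u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def kde_1d(x, data, h):
    """hat(f)(x) = n^(-1) sum_i K_h(x - X_i) for 1-d data."""
    return sum(normal_kernel(x - xi, h) for xi in data) / len(data)

data = [-1.2, -0.4, 0.1, 0.3, 1.5]  # hypothetical sample
h = 0.5                              # arbitrary bandwidth, not a selected one
print(kde_1d(0.0, data, h))
```

In practice the bandwidth would come from one of the selectors rather than being fixed by hand, since the estimate is sensitive to its value.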

–For kernel density estimators, there are several varieties of bandwidth selectors:

plug-in - `hpi` (1-d); `Hpi`, `Hpi.diag` (2- to 6-d)

least squares (or unbiased) cross validation (LSCV or UCV) - `hlscv` (1-d); `Hlscv`, `Hlscv.diag` (2- to 6-d)

biased cross validation (BCV) - `Hbcv`, `Hbcv.diag` (2- to 6-d)

smoothed cross validation (SCV) - `hscv` (1-d); `Hscv`, `Hscv.diag` (2- to 6-d)

normal scale - `hns` (1-d); `Hns` (2- to 6-d).

–For kernel density derivative estimation, the main function is `kdde`, which computes

*hat(f)^(r)(x) = n^(-1) sum_i D^r K_H (x - X_i).*

The bandwidth selectors are a modified subset of those for
`kde`, i.e. `Hlscv`, `Hns`, `Hpi`, `Hscv` with `deriv.order>0`.
Its `plot` method is `plot.kdde` for plotting each
partial derivative singly.
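For 1-d data and *r* = 1, the derivative of the normal kernel has the closed form *K_h'(u) = -(u / h^2) K_h(u)*, so the derivative estimator can be sketched directly. This is an illustrative plain-Python version of the formula, not the package's code; `kdde_1d_first` and the other names are hypothetical.

```python
import math

def normal_kernel(u, h):
    """Normal density with mean 0 and standard deviation h, evaluated at u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def kde_1d(x, data, h):
    """Density estimate, kept here only to cross-check the derivative."""
    return sum(normal_kernel(x - xi, h) for xi in data) / len(data)

def kdde_1d_first(x, data, h):
    """hat(f)^(1)(x) = n^(-1) sum_i K_h'(x - X_i), with K_h'(u) = -(u / h^2) K_h(u)."""
    return sum(-(x - xi) / h ** 2 * normal_kernel(x - xi, h) for xi in data) / len(data)

data = [-1.2, -0.4, 0.1, 0.3, 1.5]  # hypothetical sample
h = 0.5                              # arbitrary bandwidth for illustration
print(kdde_1d_first(0.0, data, h))
```

A central finite difference of `kde_1d` gives a quick sanity check on the analytic derivative.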

–For kernel discriminant analysis, the main function is
`kda`, which computes density estimates for each of the
groups in the training data, and the discriminant surface.
Its `plot` method is `plot.kda`. The wrapper functions
`hkda`, `Hkda` compute
bandwidths for each group in the training data using the selectors for `kde`,
e.g. `hpi`, `Hpi`.

–For kernel functional estimation, the main function is
`kfe`, which computes the *r*-th order integrated density functional

*hat(psi)_r = n^(-2) sum_i sum_j D^r K_H (X_i - X_j).*

The plug-in selectors are `hpi.kfe` (1-d), `Hpi.kfe` (2- to 6-d).
Kernel functional estimates are usually not required to be computed
directly by the user, but only within other functions in the package.
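The double sum is easy to write out for 1-d data. The sketch below covers only *r* = 0 and *r* = 2, using the closed form *K_h''(u) = ((u^2 - h^2) / h^4) K_h(u)* for the normal kernel. It is a plain-Python illustration of the formula, not the package's implementation, and the function name `kfe_1d` is hypothetical.

```python
import math

def normal_kernel(u, h):
    """Normal density with mean 0 and standard deviation h, evaluated at u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def kernel_deriv(u, h, r):
    """r-th derivative of the normal kernel, for r = 0 or r = 2 only."""
    if r == 0:
        return normal_kernel(u, h)
    if r == 2:
        return (u ** 2 - h ** 2) / h ** 4 * normal_kernel(u, h)
    raise ValueError("this sketch only handles r = 0 and r = 2")

def kfe_1d(data, h, r):
    """hat(psi)_r = n^(-2) sum_i sum_j D^r K_h(X_i - X_j) for 1-d data."""
    n = len(data)
    return sum(kernel_deriv(xi - xj, h, r) for xi in data for xj in data) / n ** 2

data = [-1.2, -0.4, 0.1, 0.3, 1.5]  # hypothetical sample
print(kfe_1d(data, 0.5, 0), kfe_1d(data, 0.5, 2))
```

The double loop costs O(n^2) kernel evaluations, which is fine for a small illustration but not for large samples.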

–For kernel-based 2-sample testing, the main function is
`kde.test`, which computes the integrated
*L2* distance between the two density estimates as the test
statistic, comprising a linear combination of 0-th order kernel
functional estimates:

*hat(T) = hat(psi)_0,1 + hat(psi)_0,2 - (hat(psi)_0,12 + hat(psi)_0,21),*

and the corresponding p-value. The *hat(psi)* are
zero order kernel functional estimates, with the subscripts indicating
that 1 = sample 1 only, 2 = sample 2 only, and 12, 21 =
samples 1 and 2. The bandwidth selectors are `hpi.kfe`, `Hpi.kfe` with `deriv.order=0`.
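The structure of the test statistic can be sketched for 1-d samples. The version below makes two simplifying assumptions that are not taken from the package: a single shared bandwidth `h` is used for all four terms, and the cross terms are averaged over all *n_1 n_2* pairs. It computes only the statistic, not the p-value, and the names `psi0` and `l2_test_statistic` are hypothetical.

```python
import math

def normal_kernel(u, h):
    """Normal density with mean 0 and standard deviation h, evaluated at u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def psi0(a, b, h):
    """Zero order functional: mean of K_h(a_i - b_j) over all pairs (i, j)."""
    total = sum(normal_kernel(ai - bj, h) for ai in a for bj in b)
    return total / (len(a) * len(b))

def l2_test_statistic(x1, x2, h):
    """hat(T) = psi_0,1 + psi_0,2 - (psi_0,12 + psi_0,21), with a shared bandwidth (assumption)."""
    return psi0(x1, x1, h) + psi0(x2, x2, h) - (psi0(x1, x2, h) + psi0(x2, x1, h))

x1 = [-1.2, -0.4, 0.1, 0.3, 1.5]   # hypothetical sample 1
x2 = [v + 2.0 for v in x1]         # hypothetical sample 2, shifted by 2
print(l2_test_statistic(x1, x1, 0.5), l2_test_statistic(x1, x2, 0.5))
```

Under these assumptions the statistic is zero when both samples are identical and grows as the two samples separate.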

–For kernel-based local 2-sample testing, the main function is
`kde.local.test`, which computes the squared distance
between the two density estimates as the test
statistic

*hat(U)(x) = [hat(f)_1(x) - hat(f)_2(x)]^2*

and the corresponding local
p-values. The bandwidth selectors are those used with `kde`,
e.g. `hpi`, `Hpi`.

–For kernel cumulative distribution function estimation, the main
function is `kcde`, which computes

*hat(F)(x) = n^(-1) sum_i intK_H (x - X_i)*

where *intK* is the integrated kernel.
The bandwidth selectors are `hpi.kcde`, `Hpi.kcde`. Its `plot` method is
`plot.kcde`.
There exist analogous functions for the survival function *hat(bar(F))*.
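For 1-d data the integrated normal kernel is the normal distribution function, so the estimator reduces to an average of normal CDFs centred at the data points. The following plain-Python sketch illustrates this; it is not the package's code, and `kcde_1d` and `ksurv_1d` are hypothetical names. The survival estimate is taken here as one minus the CDF estimate.

```python
import math

def normal_cdf(u, h):
    """Distribution function of a normal with mean 0 and standard deviation h."""
    return 0.5 * (1.0 + math.erf(u / (h * math.sqrt(2.0))))

def kcde_1d(x, data, h):
    """hat(F)(x) = n^(-1) sum_i intK_h(x - X_i) for 1-d data."""
    return sum(normal_cdf(x - xi, h) for xi in data) / len(data)

def ksurv_1d(x, data, h):
    """Survival function estimate, 1 - hat(F)(x)."""
    return 1.0 - kcde_1d(x, data, h)

data = [-1.0, 0.0, 1.0]  # hypothetical sample, symmetric about 0
h = 0.5                  # arbitrary bandwidth for illustration
print(kcde_1d(0.0, data, h), ksurv_1d(0.0, data, h))
```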

–For kernel estimation of a ROC (receiver operating characteristic)
curve to compare two samples from *hat(F)_1, hat(F)_2*, the main function is `kroc`, which computes

*(hat(F)_hat(Y1)(z), hat(F)_hat(Y2)(z))*

based on the cumulative distribution functions of
*hat(Yj)=hat(bar(F))_1(X_j), j=1,2*.

The bandwidth selectors are those used with `kcde`,
e.g. `hpi.kcde`, `Hpi.kcde`, for
*hat(F)_hat(Yj), hat(bar(F))_1*. Its `plot` method
is `plot.kroc`.

–For kernel estimation of a copula, the
main function is `kcopula`, which computes

*hat(C)(z) = hat(F)(hat(F)_1^(-1)(z_1),..., hat(F)_d^(-1)(z_d))*

where *hat(F)_j^(-1)(z_j)* is
the *z_j*-th quantile of the *j*-th marginal
distribution *hat(F)_j*.
The bandwidth selectors are those used with `kcde` for
*hat(F), hat(F)_j*.
Its `plot` method is `plot.kcde`.

–For kernel estimation of a copula density, the
main function is `kcopula.de`, which computes

*hat(c)(z) = hat(f)(z) = n^(-1) sum_i K_H (z - hat(Z)_i)*

where *hat(Z)_i = (hat(F)_1(X_i1), …, hat(F)_d(X_id))*.
The bandwidth selectors are those used with `kde` for
*hat(c)* and `kcde` for *hat(F)_j*.
Its `plot` method is `plot.kde`.

–Binned kernel estimation is available for d = 1, 2, 3, 4. This makes kernel estimators feasible for large samples.

–For an overview of this package with 2-d density estimation, see
`vignette("kde")`.

Tarn Duong for most of the package. M.P. Wand for the binned estimation, univariate plug-in selector and univariate density derivative estimator code. Jose E. Chacon for the unconstrained pilot functional estimation and fast implementation of derivative-based estimation code. Artur and Jaroslaw Gramacki for the binned estimation for unconstrained bandwidth matrices.

