kepdf: Kernel estimate of a probability density function.
In pdfCluster: Cluster Analysis via Nonparametric Density Estimation

kepdf

R Documentation

Description

Estimates the density of univariate and multivariate data by the kernel method.

Usage

kepdf(x, eval.points = x, kernel = "gaussian", 
      bwtype = "fixed", h = h.norm(x), hx = NULL, alpha = 1/2)

Arguments

`x`	A vector, a matrix or a data frame of data whose density should be estimated.
`eval.points`	A vector, a matrix or a data frame of data points at which the density estimate should be evaluated.
`kernel`	Either 'gaussian' or 't7'; defines the kernel function to be used. See details below.
`bwtype`	Either 'fixed' or 'adaptive', corresponding to a kernel estimator with fixed or adaptive bandwidths respectively. See details below.
`h`	A vector of length `NCOL(x)`, defining the smoothing parameters used either to estimate the density (kernel estimation with fixed bandwidth) or to estimate the pilot density (kernel estimation with adaptive bandwidths). Default value is the result of `h.norm` applied to `x`.
`hx`	A matrix with the same number of rows and columns as `x`, where each row defines the vector of smoothing parameters specific to the corresponding sample point. To be used when `bwtype = "adaptive"`; if left at its default `NULL` in that case, it is set to the result of `hprop2f` applied to `x`. Set to `NULL` when `bwtype = "fixed"`.
`alpha`	Sensitivity parameter to be given to `hprop2f` when `bwtype= "adaptive"` and the vectors of smoothing parameters are computed according to Silverman's (1986) approach.

Details

The current version of the pdfCluster package computes estimates with a product kernel estimator of the form:

\hat{f}(y) = \sum_{i=1}^n \frac{1}{n h_{i,1} \cdots h_{i,d}} \prod_{j=1}^d K\left(\frac{y_{j} - x_{i,j}}{h_{i,j}}\right).

The kernel function K can be either a Gaussian density (when kernel = "gaussian") or a t_ν density with ν = 7 degrees of freedom (when kernel = "t7"). Although uncommon, the t kernel is offered for reasons of computational efficiency. Its use is therefore suggested when either x or eval.points has a very large number of rows.
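The product kernel formula above can be sketched in base R as follows. This is an illustrative reimplementation, not the package's own code: the function name `product_kde` and its argument layout are hypothetical, and the "t7" branch assumes a standard Student t density with 7 degrees of freedom (`dt(u, df = 7)`), which may differ from the scaling used internally by `kepdf`.

```r
# Minimal sketch of a product kernel density estimator (hypothetical helper).
# x:  n x d matrix of data
# y:  m x d matrix of evaluation points
# hx: n x d matrix of bandwidths, one row per observation
product_kde <- function(x, y, hx, kernel = c("gaussian", "t7")) {
  kernel <- match.arg(kernel)
  x <- as.matrix(x); y <- as.matrix(y); hx <- as.matrix(hx)
  stopifnot(ncol(x) == ncol(y), all(dim(hx) == dim(x)))
  # Assumption: "t7" is an unscaled Student t density with 7 degrees of freedom
  K <- if (kernel == "gaussian") dnorm else function(u) dt(u, df = 7)
  n <- nrow(x)
  apply(y, 1, function(yk) {
    # u[i, j] = (y_j - x_{i,j}) / h_{i,j}
    u <- (matrix(yk, n, ncol(x), byrow = TRUE) - x) / hx
    # Observation i contributes prod_j K(u[i, j]) / h_{i,j}
    contrib <- apply(K(u) / hx, 1, prod)
    sum(contrib) / n
  })
}

# Usage: a fixed bandwidth repeats the same row of hx for every observation
set.seed(1)
x  <- matrix(rnorm(200), ncol = 2)
hx <- matrix(c(0.5, 0.5), nrow(x), 2, byrow = TRUE)
est <- product_kde(x, x[1:5, , drop = FALSE], hx)
```

Passing a matrix `hx` with one row per observation keeps the sketch aligned with the formula, where the fixed-bandwidth case is simply the special case of identical rows.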

The vectors of bandwidths h_{i} = (h_{i,1}, \ldots, h_{i,d})' are defined as follows:

Fixed bandwidth: When bwtype='fixed', h_{i} = h; that is, a constant smoothing vector is used for all observations x_i. Default values are asymptotically optimal for a multivariate Normal distribution (e.g., Bowman and Azzalini, 1997). See h.norm for further details.
Adaptive bandwidth: When bwtype='adaptive', a vector of bandwidths h_i is specified for each observation x_i. Default values are selected according to Silverman (1986, Section 5.3.1). See hprop2f.
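The two bandwidth choices can be illustrated with a short base R sketch. Both helper names (`normal_bandwidth`, `adaptive_bandwidths`) are hypothetical. The first assumes the usual normal-reference rule, sd_j (4 / ((d + 2) n))^{1/(d + 4)}; the second follows the general recipe of Silverman (1986, Section 5.3.1), h_i = h (pilot(x_i) / g)^{-alpha}, with g the geometric mean of the pilot density at the sample points. Whether `h.norm` and `hprop2f` match these formulas exactly should be checked against the package source.

```r
# Normal-reference bandwidths, one per column.
# Assumption: this is the rule behind h.norm; verify before relying on it.
normal_bandwidth <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x); d <- ncol(x)
  apply(x, 2, sd) * (4 / ((d + 2) * n))^(1 / (d + 4))
}

# Silverman-style adaptive bandwidths: row i of the result is lambda_i * h,
# with lambda_i = (pilot_i / g)^(-alpha).
adaptive_bandwidths <- function(x, h = normal_bandwidth(x), alpha = 1/2) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Pilot: fixed-bandwidth Gaussian product kernel estimate at each x_k
  pilot <- vapply(seq_len(n), function(k) {
    u <- sweep(sweep(x, 2, x[k, ], "-"), 2, h, "/")
    mean(exp(rowSums(dnorm(u, log = TRUE))) / prod(h))
  }, numeric(1))
  g <- exp(mean(log(pilot)))        # geometric mean of the pilot density
  lambda <- (pilot / g)^(-alpha)    # local scaling factors
  outer(lambda, h)                  # n x d matrix of bandwidths
}

# Usage
set.seed(42)
x  <- matrix(rnorm(100), ncol = 2)
h  <- normal_bandwidth(x)
hx <- adaptive_bandwidths(x, h)
```

With alpha = 0 every scaling factor equals 1 and the fixed-bandwidth case is recovered; larger values of alpha widen the bandwidths more strongly where the pilot density is low.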

Value

An S4 object of kepdf-class with slots:

`call`	The matched call.
`x`	The data input, coerced to be a matrix.
`eval.points`	The data points at which the density is evaluated.
`estimate`	The values of the density estimate at the evaluation points.
`kernel`	The selected kernel.
`bwtype`	The type of estimator.
`par`	A list of parameters used to estimate the density, with elements: `h` the smoothing parameters used to estimate either the density or the pilot density; `hx` the matrix of sample smoothing parameters, when `bwtype='adaptive'`; `alpha` sensitivity parameter used if `bwtype='adaptive'`.
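As a brief illustration of the returned object, the slots listed above can be read with the `@` operator. The sketch assumes that pdfCluster is installed and uses the `wine` data shipped with it; the object name `fit` is arbitrary.

```r
# Guarded so the snippet does nothing when pdfCluster is not installed
if (requireNamespace("pdfCluster", quietly = TRUE)) {
  library(pdfCluster)
  data(wine, package = "pdfCluster")
  fit <- kepdf(wine[, 3])

  print(slotNames(fit))       # names of the slots of the kepdf-class object
  print(head(fit@estimate))   # density values at the evaluation points
  print(fit@par$h)            # smoothing parameter(s) used
  print(fit@bwtype)           # "fixed", the default
}
```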

References

Bowman, A.W. and Azzalini, A. (1997). Applied smoothing techniques for data analysis: the kernel approach with S-Plus illustrations. Oxford University Press, Oxford.

Silverman, B. (1986). Density estimation for statistics and data analysis. Chapman and Hall, London.

Examples

## A 1-dimensional example
library(pdfCluster)
data(wine)
x <- wine[,3] 
pdf <- kepdf(x, eval.points=seq(0,7,by=.1))
plot(pdf, n.grid= 100, main="wine data")

## A 2-dimensional example
x <- wine[,c(2,8)] 
pdf <- kepdf(x)
plot(pdf, main="wine data", props=c(5,50,90), ylim=c(0,4))
plot(pdf, main="wine data", method="perspective", phi=30, theta=60)

## A 3-dimensional example
x <- wine[,c(2,3,8)] 
pdf <- kepdf(x)
plot(pdf, main="wine data", props=c(10,50,70), gap=0.2)
plot(pdf, main="wine data", method="perspective", gap=0.2, phi=30, theta=10)

## A 6-dimensional example
## An adaptive kernel density estimate is preferable in high dimensions
x <- wine[,c(2,3,5,7,8,10)]
pdf <- kepdf(x, bwtype="adaptive")
plot(pdf, main="wine data", props=c(10,50,70), gap=0.2)
plot(pdf, main="wine data", method="perspective", gap=0.2, phi=30, theta=10)

pdfCluster documentation built on Dec. 2, 2022, 5:14 p.m.