kepdf: Kernel estimate of a probability density function. In pdfCluster: Cluster Analysis via Nonparametric Density Estimation

Description

Estimates the density of univariate and multivariate data by the kernel method.

Usage

kepdf(x, eval.points = x, kernel = "gaussian", bwtype = "fixed",
      h = h.norm(x), hx = NULL, alpha = 1/2)

Arguments

x  A vector, a matrix or a data-frame of data whose density should be estimated.

eval.points  A vector, a matrix or a data-frame of data points at which the density estimate should be evaluated.

kernel  Either 'gaussian' or 't7'; it defines the kernel function to be used. See details below.

bwtype  Either 'fixed' or 'adaptive', corresponding to a kernel estimator with fixed or adaptive bandwidths respectively. See details below.

h  A vector of length NCOL(x), defining the smoothing parameters to be used either to estimate the density in kernel estimation with fixed bandwidth or to estimate the pilot density in kernel estimation with adaptive bandwidths. Default value is the result of h.norm applied to x.

hx  A matrix with the same number of rows and columns as x, where each row defines the vector of smoothing parameters specific for each sample point. To be used when bwtype = "adaptive". Default value is the result of hprop2f applied to x. Set to NULL when bwtype = "fixed".

alpha  Sensitivity parameter to be given to hprop2f when bwtype = "adaptive" and the vectors of smoothing parameters are computed according to Silverman's (1986) approach.

Details

The current version of pdfCluster-package allows for computing estimates by a kernel product estimator of the form:

\hat{f}(y) = \sum_{i=1}^n \frac{1}{n h_{i,1} \cdots h_{i,d}} \prod_{j=1}^d K\left(\frac{y_{j} - x_{i,j}}{h_{i,j}}\right).

The kernel function K can be either a Gaussian density (if kernel = "gaussian") or a t_ν density with ν = 7 degrees of freedom (if kernel = "t7"). Although uncommon, the t kernel is offered for reasons of computational efficiency. Its use is therefore suggested when either x or eval.points has a very large number of rows.
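As an illustration of the product-kernel formula above, the following is a minimal base-R sketch for the Gaussian kernel. It is not the package implementation: the function name product_kernel_density and its argument layout are assumptions made for this example only.

```r
# Illustrative only: a direct translation of the product-kernel formula
# with a Gaussian kernel. Not the pdfCluster source code.
# x:  n x d matrix of data
# y:  m x d matrix of evaluation points
# hx: n x d matrix of bandwidths, row i holding (h_{i,1}, ..., h_{i,d})
product_kernel_density <- function(x, y, hx) {
  x <- as.matrix(x); y <- as.matrix(y); hx <- as.matrix(hx)
  n <- nrow(x)
  apply(y, 1, function(pt) {
    # z[i, j] = (y_j - x_{i,j}) / h_{i,j}
    z <- (matrix(pt, n, ncol(x), byrow = TRUE) - x) / hx
    # for each observation, the product over dimensions of K(z) / h
    contrib <- apply(dnorm(z) / hx, 1, prod)
    sum(contrib) / n
  })
}

# Fixed bandwidths are the special case where every row of hx is the same.
set.seed(1)
x  <- matrix(rnorm(200), ncol = 2)
h  <- c(0.5, 0.5)
hx <- matrix(h, nrow(x), ncol(x), byrow = TRUE)
fhat <- product_kernel_density(x, x[1:5, , drop = FALSE], hx)
```

Passing one bandwidth row per observation keeps the same function usable for both the fixed and the adaptive case.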

The vectors of bandwidths h_{i} = (h_{i,1}, \ldots, h_{i,d})' are defined as follows:

Fixed bandwidth

When bwtype='fixed', h_{i} = h, that is, a constant smoothing vector is used for all the observations x_i. Default values are set as asymptotically optimal for a multivariate Normal distribution (e.g., Bowman and Azzalini, 1997). See h.norm for further details.
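For orientation, the normal-reference rule described by Bowman and Azzalini (1997) sets the bandwidth of component j to σ_j (4 / ((d + 2) n))^(1 / (d + 4)). The sketch below assumes that h.norm follows this rule; the helper name h_normal_ref is hypothetical and the code is not taken from the package.

```r
# Hypothetical helper: normal-reference bandwidths, one per column of x.
# Assumes the rule sigma_j * (4 / ((d + 2) * n))^(1 / (d + 4)).
h_normal_ref <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)              # sample size
  d <- ncol(x)              # dimension
  s <- apply(x, 2, sd)      # marginal standard deviations
  s * (4 / ((d + 2) * n))^(1 / (d + 4))
}

set.seed(1)
x <- matrix(rnorm(300), ncol = 3)
h <- h_normal_ref(x)        # vector of length NCOL(x)
```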

Adaptive bandwidth

When bwtype='adaptive', a vector of bandwidths h_i is specified for each observation x_i. Default values are selected according to Silverman (1986, Section 5.3.1). See hprop2f.
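Silverman's approach scales a global bandwidth vector h by a local factor derived from a pilot density estimate: h_i = h (f(x_i) / g)^(-alpha), where f is the pilot estimate and g is the geometric mean of the values f(x_i). The following base-R sketch assumes that hprop2f works along these lines with a Gaussian pilot kernel; the function name adaptive_bandwidths is hypothetical and the code is not the package source.

```r
# Illustrative only: adaptive bandwidths in the style of Silverman (1986).
# x: n x d data matrix; h: pilot bandwidth vector of length d;
# alpha: sensitivity parameter (alpha = 0 gives back fixed bandwidths).
adaptive_bandwidths <- function(x, h, alpha = 1/2) {
  x <- as.matrix(x)
  n <- nrow(x)
  # pilot density at each sample point: fixed-bandwidth Gaussian product kernel
  pilot <- vapply(seq_len(n), function(i) {
    z <- sweep(sweep(x, 2, x[i, ], "-"), 2, h, "/")
    mean(apply(dnorm(z), 1, prod)) / prod(h)
  }, numeric(1))
  g <- exp(mean(log(pilot)))       # geometric mean of the pilot values
  lambda <- (pilot / g)^(-alpha)   # local factors: > 1 where the pilot density is low
  outer(lambda, h)                 # n x d matrix, row i equal to lambda_i * h
}

set.seed(1)
x  <- matrix(rnorm(100), ncol = 2)
hx <- adaptive_bandwidths(x, h = c(0.5, 0.5))
```

Observations in low-density regions receive larger bandwidths, which smooths the tails more heavily than the centre of the data.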

Value

An S4 object of kepdf-class with slots:

call  The matched call.

x  The data input, coerced to be a matrix.

eval.points  The data points at which the density is evaluated.

estimate  The values of the density estimate at the evaluation points.

kernel  The selected kernel.

bwtype  The type of estimator.

par  A list of parameters used to estimate the density, with elements:
  h  the smoothing parameters used to estimate either the density or the pilot density;
  hx  the matrix of sample smoothing parameters, when bwtype='adaptive';
  alpha  the sensitivity parameter used if bwtype='adaptive'.

References

Bowman, A.W. and Azzalini, A. (1997). Applied smoothing techniques for data analysis: the kernel approach with S-Plus illustrations. Oxford University Press, Oxford.

Silverman, B. (1986). Density estimation for statistics and data analysis. Chapman and Hall, London.

See Also

h.norm, hprop2f, kepdf-class.

Examples

library(pdfCluster)

## A 1-dimensional example
data(wine)
x <- wine[,3]
pdf <- kepdf(x, eval.points=seq(0,7,by=.1))
plot(pdf, n.grid=100, main="wine data")

## A 2-dimensional example
x <- wine[,c(2,8)]
pdf <- kepdf(x)
plot(pdf, main="wine data", props=c(5,50,90), ylim=c(0,4))
plot(pdf, main="wine data", method="perspective", phi=30, theta=60)

## A 3-dimensional example
x <- wine[,c(2,3,8)]
pdf <- kepdf(x)
plot(pdf, main="wine data", props=c(10,50,70), gap=0.2)
plot(pdf, main="wine data", method="perspective", gap=0.2, phi=30, theta=10)

## A 6-dimensional example
## adaptive kernel density estimate is preferable in high dimensions
x <- wine[,c(2,3,5,7,8,10)]
pdf <- kepdf(x, bwtype="adaptive")
plot(pdf, main="wine data", props=c(10,50,70), gap=0.2)
plot(pdf, main="wine data", method="perspective", gap=0.2, phi=30, theta=10)