Smoothing techniques and bandwidth selectors for the r'th derivative of a probability density, for one-dimensional data.
Package: kedd
Type: Package
Version: 1.0.3
Date: 2015-10-30
License: GPL (>= 2)
There are four main types of functions in this package:
Computing the derivatives and convolutions of a kernel function (1-d).
Computing the kernel estimators for a density and its derivatives (1-d).
Computing the bandwidth selectors (1-d).
Displaying kernel estimators.
Convolutions and derivatives of kernel functions:
In non-parametric statistics, a kernel is a weighting function used in non-parametric estimation techniques.
The kernel functions K(x) are used in the derivatives of the kernel density estimator to estimate
hat(f)(x;r), and must satisfy the following three requirements:
int K(x) dx = 1
int x K(x) dx = 0
mu_2(K) = int x^2 K(x) dx < inf
Several types of kernel functions K(x) are available in this package: Gaussian, Epanechnikov, uniform (rectangular), triangular,
triweight, tricube, biweight (quartic), and cosine.
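As a quick numerical check (a minimal NumPy sketch, independent of the kedd package), the Gaussian kernel satisfies all three requirements:

```python
import numpy as np

# Gaussian kernel K(x) = exp(-x^2 / 2) / sqrt(2*pi)
def K(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# Riemann-sum quadrature on a wide grid; the Gaussian tails beyond
# |x| = 10 are negligible, so the truncation error is tiny.
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

total_mass = np.sum(K(x)) * dx            # int K(x) dx      -> 1
first_moment = np.sum(x * K(x)) * dx      # int x K(x) dx    -> 0
second_moment = np.sum(x**2 * K(x)) * dx  # int x^2 K(x) dx  -> 1 (finite)

print(total_mass, first_moment, second_moment)
```

The same check applies to any of the kernels listed above, with mu_2(K) taking a different finite value for each.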
The function kernel.fun computes the kernel derivative K(x;r), and kernel.conv the
kernel convolution K(x;r) * K(x;r), written formally as:
K(x;r) = d^r/dx^r K(x)
K(x;r) * K(x;r) = int K(y;r) K(x-y;r) dy
for r = 0, 1, 2, ...
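For the Gaussian kernel these quantities have a convenient closed form: K(x;r) = (-1)^r He_r(x) K(x), with He_r the r'th probabilists' Hermite polynomial. The following NumPy sketch illustrates both definitions (the function names here are illustrative, not the package's kernel.fun / kernel.conv):

```python
import numpy as np
from numpy.polynomial import hermite_e  # probabilists' Hermite polynomials He_r

def K(x):
    # Gaussian kernel
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def K_deriv(x, r):
    # r'th derivative of the Gaussian kernel:
    #   K^(r)(x) = (-1)^r * He_r(x) * K(x)
    coeffs = np.zeros(r + 1)
    coeffs[r] = 1.0
    return (-1.0) ** r * hermite_e.hermeval(x, coeffs) * K(x)

def K_conv(x, r, grid=np.linspace(-10.0, 10.0, 4001)):
    # Convolution (K^(r) * K^(r))(x) = int K^(r)(y) K^(r)(x - y) dy,
    # approximated by a Riemann sum on a truncated grid.
    dy = grid[1] - grid[0]
    return np.sum(K_deriv(grid, r) * K_deriv(x - grid, r)) * dy

# Sanity check: for r = 0 the self-convolution of the Gaussian kernel is
# the N(0, 2) density, so (K * K)(0) = 1 / (2 * sqrt(pi)).
print(K_conv(0.0, 0))
```

For r = 1 this reduces to K'(x) = -x K(x), which is easy to verify against the Hermite formula.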
Estimators of the r'th derivative of a density function:
A natural estimator of the r'th derivative of a density function f(x) is:
hat(f)(x;r) = n^(-1) h^(-(r+1)) sum_{i=1}^{n} K((x - X_i)/h; r)
Here, X_1, X_2, ..., X_n is an i.i.d. sample of size n from the distribution with density
f(x), K(x) is the kernel function, which we take to be a symmetric probability density with
at least r non-zero derivatives when estimating f(x;r), and h is the bandwidth, a crucial
parameter that controls the degree of smoothing applied to the data.
The case r = 0 is the standard kernel density estimator (e.g. Silverman 1986, Härdle 1991, Scott 1992,
Wand and Jones 1995, Simonoff 1996, Bowman and Azzalini 1997, Tsybakov 2009), and the properties of such
estimators are well known (e.g. Sheather and Jones 1991, Jones and Kappenman 1991). The
case r > 0 gives the derivative of the kernel density estimator (e.g. Bhattacharya 1967, Schuster 1969, Alekseev 1972,
Härdle et al. 1990, Jones 1992, Stoker 1993); applications that require the estimation of density derivatives can
be found in Singh (1977).
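The estimator above can be sketched directly in NumPy for the Gaussian kernel, whose r'th derivative is K(u;r) = (-1)^r He_r(u) K(u) with He_r the probabilists' Hermite polynomial. This is an illustrative re-implementation under a made-up name, not the package's dkde function:

```python
import numpy as np
from numpy.polynomial import hermite_e

def gauss_kernel_deriv(u, r):
    # r'th derivative of the Gaussian kernel via Hermite polynomials
    coeffs = np.zeros(r + 1)
    coeffs[r] = 1.0
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return (-1.0) ** r * hermite_e.hermeval(u, coeffs) * phi

def dkde_sketch(x, data, h, r=0):
    # hat f^(r)(x) = n^(-1) h^(-(r+1)) sum_i K^(r)((x - X_i)/h)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - data[None, :]) / h  # shape (n_eval, n_obs)
    return gauss_kernel_deriv(u, r).sum(axis=1) / (len(data) * h ** (r + 1))

rng = np.random.default_rng(0)
data = rng.normal(size=200)  # synthetic stand-in for a data set like trimodal

grid = np.linspace(-6.0, 6.0, 1201)
fhat = dkde_sketch(grid, data, h=0.4, r=0)
# The r = 0 estimate is itself a density, so it integrates to (about) 1.
print(np.sum(fhat) * (grid[1] - grid[0]))
```

For symmetric data the r = 1 estimate vanishes at the centre of symmetry, since K'(u) is odd.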
For the one-dimensional r'th derivative of the kernel density estimator, the main function is dkde.
For display, its plot method calls plot.dkde; to add to an existing plot, use lines.dkde.
R> data(trimodal)
R> dkde(x = trimodal, deriv.order = 0, kernel = "gaussian")
Data: trimodal (200 obs.); Kernel: gaussian
Derivative order: 0; Bandwidth 'h' = 0.1007
eval.points est.fx
Min. :-2.91274 Min. :0.0000066
1st Qu.:-1.46519 1st Qu.:0.0669750
Median :-0.01765 Median :0.1682045
Mean :-0.01765 Mean :0.1723692
3rd Qu.: 1.42989 3rd Qu.:0.2484626
Max. : 2.87743 Max. :0.4157340
R> dkde(x = trimodal, deriv.order = 1, kernel = "gaussian")
Data: trimodal (200 obs.); Kernel: gaussian
Derivative order: 1; Bandwidth 'h' = 0.09094
eval.points est.fx
Min. :-2.87358 Min. :-1.740447
1st Qu.:-1.44562 1st Qu.:-0.343952
Median :-0.01765 Median : 0.009057
Mean :-0.01765 Mean : 0.000000
3rd Qu.: 1.41031 3rd Qu.: 0.415343
Max. : 2.83828 Max. : 1.256891
Bandwidth selectors:
The most important factor in the r'th derivative kernel density estimate is the choice of the bandwidth
h for one-dimensional observations. Because of its role in controlling both the amount and
the direction of smoothing, this choice is particularly important. This package implements the following
popular bandwidth selection methods (see the references for more details):
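For orientation, the simplest selector for r = 0 is the normal-reference rule of thumb (Silverman 1986), which approximates the AMISE-optimal bandwidth when the data are roughly Gaussian. This is a generic Python sketch, independent of the kedd package:

```python
import numpy as np

def h_normal_reference(x):
    # Silverman's normal-reference rule for r = 0:
    #   h = 1.06 * min(sd, IQR / 1.349) * n^(-1/5)
    # The IQR-based scale guards against heavy tails and outliers.
    x = np.asarray(x, dtype=float)
    n = x.size
    sd = x.std(ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    scale = min(sd, iqr / 1.349)
    return 1.06 * scale * n ** (-0.2)

rng = np.random.default_rng(1)
h = h_normal_reference(rng.normal(size=200))
print(h)  # roughly 0.37 for 200 standard-normal observations
```

The cross-validation selectors below replace this parametric reference with data-driven criteria.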
Optimal bandwidth (AMISE); with deriv.order >= 0. The function is h.amise.
For display, its plot method calls plot.h.amise; to add to an existing plot, use lines.h.amise.
Maximum-likelihood cross-validation (MLCV); with deriv.order = 0. The function is h.mlcv.
For display, its plot method calls plot.h.mlcv; to add to an existing plot, use lines.h.mlcv.
Unbiased cross-validation (UCV); with deriv.order >= 0. The function is h.ucv.
For display, its plot method calls plot.h.ucv; to add to an existing plot, use lines.h.ucv.
Biased cross-validation (BCV); with deriv.order >= 0. The function is h.bcv.
For display, its plot method calls plot.h.bcv; to add to an existing plot, use lines.h.bcv.
Complete cross-validation (CCV); with deriv.order >= 0. The function is h.ccv.
For display, its plot method calls plot.h.ccv; to add to an existing plot, use lines.h.ccv.
Modified cross-validation (MCV); with deriv.order >= 0. The function is h.mcv.
For display, its plot method calls plot.h.mcv; to add to an existing plot, use lines.h.mcv.
Trimmed cross-validation (TCV); with deriv.order >= 0. The function is h.tcv.
For display, its plot method calls plot.h.tcv; to add to an existing plot, use lines.h.tcv.
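To make the cross-validation idea concrete, the unbiased (least-squares) criterion for r = 0 with a Gaussian kernel has a closed form and can be minimised over a grid. This NumPy sketch is illustrative only (the name ucv is made up, not the package's h.ucv):

```python
import numpy as np

def K(u):
    # Gaussian kernel
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def K2(u):
    # Self-convolution of the Gaussian kernel: the N(0, 2) density
    return np.exp(-0.25 * u**2) / (2.0 * np.sqrt(np.pi))

def ucv(x, h):
    # Unbiased (least-squares) cross-validation criterion for r = 0:
    #   UCV(h) = int hat(f)(x)^2 dx - (2/n) sum_i hat(f)_{-i}(X_i)
    # where hat(f)_{-i} is the leave-one-out estimate. Both terms have
    # closed forms for the Gaussian kernel.
    x = np.asarray(x, dtype=float)
    n = x.size
    d = (x[:, None] - x[None, :]) / h      # pairwise scaled differences
    term1 = K2(d).sum() / (n**2 * h)       # int hat(f)(x)^2 dx
    off = K(d).sum() - n * K(0.0)          # sum over i != j of K(d_ij)
    term2 = 2.0 * off / (n * (n - 1) * h)  # (2/n) sum_i hat(f)_{-i}(X_i)
    return term1 - term2

# Minimise UCV(h) over a bandwidth grid.
rng = np.random.default_rng(2)
x = rng.normal(size=200)
hs = np.linspace(0.05, 1.0, 96)
scores = np.array([ucv(x, h) for h in hs])
h_ucv = hs[np.argmin(scores)]
print(h_ucv)
```

The other selectors differ in the criterion minimised, not in this overall grid-search structure.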
R> data(trimodal)
R> h.bcv(x = trimodal, whichbcv = 1, deriv.order = 0, kernel = "gaussian")
Call: Biased Cross-Validation 1
Derivative order = 0
Data: trimodal (200 obs.); Kernel: gaussian
Min BCV = 0.004511636; Bandwidth 'h' = 0.4357812
R> h.ccv(x = trimodal, deriv.order = 1, kernel = "gaussian")
Call: Complete Cross-Validation
Derivative order = 1
Data: trimodal (200 obs.); Kernel: gaussian
Min CCV = 0.01985078; Bandwidth 'h' = 0.5828336
R> h.tcv(x = trimodal, deriv.order = 2, kernel = "gaussian")
Call: Trimmed Cross-Validation
Derivative order = 2
Data: trimodal (200 obs.); Kernel: gaussian
Min TCV = -295.563; Bandwidth 'h' = 0.08908582
R> h.ucv(x = trimodal, deriv.order = 3, kernel = "gaussian")
Call: Unbiased Cross-Validation
Derivative order = 3
Data: trimodal (200 obs.); Kernel: gaussian
Min UCV = -63165.18; Bandwidth 'h' = 0.1067236
For an overview of this package, see vignette("kedd").
Requires R version >= 2.15.0.
This package and its documentation are usable under the terms of the "GNU General Public License", a copy of which is distributed with the package.
Arsalane Chouaib Guidoum acguidoum@usthb.dz (Dept. Probability and Statistics, USTHB, Algeria).
Please send comments, bug reports, etc. to the author at the email address mentioned above.
Alekseev, V. G. (1972). Estimation of a probability density function and its derivatives. Mathematical notes of the Academy of Sciences of the USSR. 12(5), 808–811.
Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer-Verlag, New York.
Bowman, A. W. (1984). An alternative method of cross-validation for the smoothing of kernel density estimates. Biometrika, 71, 353–360.
Bowman, A. W. and Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis: the Kernel Approach with S-Plus Illustrations. Oxford University Press, Oxford.
Bowman, A.W. and Azzalini, A. (2003). Computational aspects of nonparametric smoothing with illustrations from the sm library. Computational Statistics and Data Analysis, 42, 545–560.
Bowman, A.W. and Azzalini, A. (2013). sm: Smoothing methods for nonparametric regression and density estimation. R package version 2.2-5.3. Ported to R by B. D. Ripley.
Bhattacharya, P. K. (1967). Estimation of a probability density function and Its derivatives. Sankhya: The Indian Journal of Statistics, Series A, 29, 373–382.
Duin, R. P. W. (1976). On the choice of smoothing parameters of Parzen estimators of probability density functions. IEEE Transactions on Computers, C-25, 1175–1179.
Feluch, W. and Koronacki, J. (1992). A note on modified cross-validation in density estimation. Computational Statistics and Data Analysis, 13, 143–151.
Terrell, G. R. (1990). The maximal smoothing principle in density estimation. Journal of the American Statistical Association, 85, 470–477.
Terrell, G. R. and Scott, D. W. (1985). Oversmoothed nonparametric density estimates. Journal of the American Statistical Association, 80, 209–214.
Habbema, J. D. F., Hermans, J., and Van den Broek, K. (1974). A stepwise discrimination analysis program using density estimation. Compstat 1974: Proceedings in Computational Statistics. Physica Verlag, Vienna.
Heidenreich, N. B., Schindler, A. and Sperlich, S. (2013). Bandwidth selection for kernel density estimation: a review of fully automatic selectors. Advances in Statistical Analysis.
Simonoff, J. S. (1996). Smoothing Methods in Statistics. Springer-Verlag, New York.
Jones, M. C. (1992). Differences and derivatives in kernel estimation. Metrika, 39, 335–340.
Jones, M. C., Marron, J. S. and Sheather, S. J. (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91, 401–407.
Jones, M. C. and Kappenman, R. F. (1991). On a class of kernel density estimate bandwidth selectors. Scandinavian Journal of Statistics, 19, 337–349.
Loader, C. (1999). Local Regression and Likelihood. Springer, New York.
Olver, F. W., Lozier, D. W., Boisvert, R. F. and Clark, C. W. (2010). NIST Handbook of Mathematical Functions. Cambridge University Press, New York, USA.
Hall, P. and Marron, J. S. (1987). Estimation of integrated squared density derivatives. Statistics and Probability Letters, 6, 109–115.
Hall, P. and Marron, J. S. (1991). Local minima in cross-validation functions. Journal of the Royal Statistical Society, Series B, 53, 245–252.
Singh, R. S. (1987). MISE of kernel estimates of a density and its derivatives. Statistics and Probability Letters, 5, 153–159.
Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9, 65–78.
Scott, D. W. (1992). Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.
Scott, D. W. and Terrell, G. R. (1987). Biased and unbiased cross-validation in density estimation. Journal of the American Statistical Association, 82, 1131–1146.
Schuster, E. F. (1969). Estimation of a probability density function and its derivatives. The Annals of Mathematical Statistics, 40(4), 1187–1195.
Sheather, S. J. (2004). Density estimation. Statistical Science, 19, 588–597.
Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B, 53, 683–690.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC. London.
Singh, R. S. (1977). Applications of estimators of a density and its derivatives to certain statistical problems. Journal of the Royal Statistical Society, Series B, 39(3), 357–363.
Stoker, T. M. (1993). Smoothing bias in density derivative estimation. Journal of the American Statistical Association, 88, 855–863.
Stute, W. (1992). Modified cross validation in density estimation. Journal of Statistical Planning and Inference, 30, 293–305.
Duong, T. (2007). ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. Journal of Statistical Software, 21(7), 1–16.
Hayfield, T. and Racine, J. S. (2008). Nonparametric Econometrics: The np Package. Journal of Statistical Software, 27(5).
Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. New York: Springer.
Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman and Hall, London.
Wand, M.P. and Ripley, B. D. (2013). KernSmooth: Functions for Kernel Smoothing for Wand and Jones (1995). R package version 2.23-10.
Härdle, W. (1991). Smoothing Techniques: With Implementation in S. Springer-Verlag, New York.
Härdle, W., Müller, M., Sperlich, S. and Werwatz, A. (2004). Nonparametric and Semiparametric Models. Springer-Verlag, Berlin Heidelberg.
Härdle, W., Marron, J. S. and Wand, M. P. (1990). Bandwidth choice for density derivatives. Journal of the Royal Statistical Society, Series B, 52, 223–232.
See also: ks, KernSmooth, sm, np, locfit, feature, GenKern.