Description Usage Arguments Details Value Acknowledgments Note Author(s) References See Also Examples
Density, cumulative distribution function, quantile function and
random number generation for the kernel density estimation using the kernel
specified by kernel, with a constant bandwidth specified by either
lambda or bw.
1 2 3 4 5 6 7 8 9 10 11 | dkden(x, kerncentres, lambda = NULL, bw = NULL, kernel = "gaussian",
log = FALSE)
pkden(q, kerncentres, lambda = NULL, bw = NULL, kernel = "gaussian",
lower.tail = TRUE)
qkden(p, kerncentres, lambda = NULL, bw = NULL, kernel = "gaussian",
lower.tail = TRUE)
rkden(n = 1, kerncentres, lambda = NULL, bw = NULL,
kernel = "gaussian")
|
x |
quantiles |
kerncentres |
kernel centres (typically sample data vector or scalar) |
lambda |
bandwidth for kernel (as half-width of kernel) or |
bw |
bandwidth for kernel (as standard deviations of kernel) or |
kernel |
kernel name ( |
log |
logical, if TRUE then log density |
q |
quantiles |
lower.tail |
logical, if FALSE then upper tail probabilities |
p |
cumulative probabilities |
n |
sample size (positive integer) |
Kernel density estimation using one of many possible kernels with a constant bandwidth.
The alternate bandwidth definitions are discussed in the
kernels, with the lambda as the default.
The bw specification is the same as used in the
density function.
The possible kernels are also defined in kernels help
documentation with the "gaussian" as the default choice.
The density function dkden produces exactly the
same density estimate as density when a sequence
of x values are provided, see examples. The latter function is far
more efficient in this situation as it takes advantage of the computational
savings from doing the kernel smoothing in the spectral domain (using the FFT),
where the convolution becomes a multiplication. So even after accounting for applying
the (Fast) Fourier Transform (FFT) and its inverse it is much more efficient
especially for a large sample size or large number of evaluation points.
However, this KDE function applies the less efficient convolution using the standard definition:
\hat{f}_(x) = \frac{1}{n} ∑_{j=1}^{n} K(\frac{x - x_j}{λ})
where K(.) is the density function for the standard
kernel. Thus are no restriction on the values x can take. For example, in the
"gaussian" kernel case for a particular x the density is evaluated as
mean(dnorm(x, kerncentres, lambda)) for the density and
mean(pnorm(x, kerncentres, lambda)) for cumulative distribution
function which is slower than the FFT but is more adaptable.
An inversion sampler is used for random number generation which also rather inefficient, as it can be carried out more efficiently using a mixture representation.
The quantile function is rather complicated as there is no closed form solution,
so is obtained by numerical approximation of the inverse cumulative distribution function
P(X ≤ q) = p to find q. The quantile function
qkden evaluates the KDE cumulative distribution
function over the range from c(max(kerncentre) - lambda, max(kerncentre) + lambda),
or c(max(kerncentre) - 5*lambda, max(kerncentre) + 5*lambda) for normal kernel.
Outside of this range the quantiles are set to -Inf for lower tail and Inf
for upper tail. A sequence of values
of length fifty times the number of kernels (with minimum of 1000) is first
calculated. Spline based interpolation using splinefun,
with default monoH.FC method, is then used to approximate the quantile
function. This is a similar approach to that taken
by Matt Wand in the qkde in the ks package.
If no bandwidth is provided lambda=NULL and bw=NULL then the normal
reference rule is used, using the bw.nrd0 function, which is
consistent with the density function. At least two kernel
centres must be provided as the variance needs to be estimated.
dkden gives the density,
pkden gives the cumulative distribution function,
qkden gives the quantile function and
rkden gives a random sample.
Based on code by Anna MacDonald produced for MATLAB.
Unlike most of the other extreme value mixture model functions the
kden functions have not been vectorised as
this is not appropriate. The main inputs (x, p or q)
must be either a scalar or a vector, which also define the output length.
The kernel centres kerncentres can either be a single datapoint or a vector
of data. The kernel centres (kerncentres) and locations to evaluate density (x)
and cumulative distribution function (q) would usually be different.
Default values are provided for all inputs, except for the fundamentals
kerncentres, x, q and p. The default sample size for
rkden is 1.
Missing (NA) and Not-a-Number (NaN) values in x,
p and q are passed through as is and infinite values are set to
NA. None of these are not permitted for the parameters.
Error checking of the inputs (e.g. invalid probabilities) is carried out and will either stop or give warning message as appropriate.
Yang Hu and Carl Scarrott carl.scarrott@canterbury.ac.nz.
http://en.wikipedia.org/wiki/Kernel_density_estimation
http://en.wikipedia.org/wiki/Cross-validation_(statistics)
Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf
Hu Y. and Scarrott, C.J. (2018). evmix: An R Package for Extreme Value Mixture Modeling, Threshold Estimation and Boundary Corrected Kernel Density Estimation. Journal of Statistical Software 84(5), 1-27. doi: 10.18637/jss.v084.i05.
Bowman, A.W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2), 353-360.
Duin, R.P.W. (1976). On the choice of smoothing parameters for Parzen estimators of probability density functions. IEEE Transactions on Computers C25(11), 1175-1179.
MacDonald, A., Scarrott, C.J., Lee, D., Darlow, B., Reale, M. and Russell, G. (2011). A flexible extreme value mixture model. Computational Statistics and Data Analysis 55(6), 2137-2157.
Wand, M. and Jones, M.C. (1995). Kernel Smoothing. Chapman && Hall.
kernels, kfun,
density, bw.nrd0
and dkde in ks package.
Other kden: bckden, fbckden,
fgkgcon, fgkg,
fkdengpdcon, fkdengpd,
fkden, kdengpdcon,
kdengpd
Other kdengpd: bckdengpd,
fbckdengpd, fgkg,
fkdengpdcon, fkdengpd,
fkden, gkg,
kdengpdcon, kdengpd
Other gkg: fgkgcon, fgkg,
fkdengpd, gkgcon,
gkg, kdengpd
Other bckden: bckdengpdcon,
bckdengpd, bckden,
fbckdengpdcon, fbckdengpd,
fbckden, fkden
Other bckdengpd: bckdengpdcon,
bckdengpd, bckden,
fbckdengpdcon, fbckdengpd,
fbckden, fkdengpd,
gkg, kdengpd
Other fkden: fkden
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | ## Not run:
set.seed(1)
par(mfrow = c(2, 2))
nk=50
x = rnorm(nk)
xx = seq(-5, 5, 0.01)
plot(xx, dnorm(xx))
rug(x)
for (i in 1:nk) lines(xx, dnorm(xx, x[i], sd = bw.nrd0(x))*0.05)
lines(xx, dkden(xx, x), lwd = 2, col = "red")
lines(density(x), lty = 2, lwd = 2, col = "green")
legend("topright", c("True Density", "KDE Using evmix", "KDE Using density function"),
lty = c(1, 1, 2), lwd = c(1, 2, 2), col = c("black", "red", "green"))
# Estimate bandwidth using cross-validation likelihood
x = rnorm(nk)
fit = fkden(x)
hist(x, nk/5, freq = FALSE, xlim = c(-5, 5), ylim = c(0, 0.6))
rug(x)
for (i in 1:nk) lines(xx, dnorm(xx, x[i], sd = fit$bw)*0.05)
lines(xx,dnorm(xx), col = "black")
lines(xx, dkden(xx, x, lambda = fit$lambda), lwd = 2, col = "red")
lines(density(x), lty = 2, lwd = 2, col = "green")
lines(density(x, bw = fit$bw), lwd = 2, lty = 2, col = "blue")
legend("topright", c("True Density", "KDE fitted evmix",
"KDE Using density, default bandwidth", "KDE Using density, c-v likelihood bandwidth"),
lty = c(1, 1, 2, 2), lwd = c(1, 2, 2, 2), col = c("black", "red", "green", "blue"))
plot(xx, pnorm(xx), type = "l")
rug(x)
lines(xx, pkden(xx, x), lwd = 2, col = "red")
lines(xx, pkden(xx, x, lambda = fit$lambda), lwd = 2, col = "green")
# green and blue (quantile) function should be same
p = seq(0, 1, 0.001)
lines(qkden(p, x, lambda = fit$lambda), p, lwd = 2, lty = 2, col = "blue")
legend("topleft", c("True Density", "KDE using evmix, normal reference rule",
"KDE using evmix, c-v likelihood","KDE quantile function, c-v likelihood"),
lty = c(1, 1, 1, 2), lwd = c(1, 2, 2, 2), col = c("black", "red", "green", "blue"))
xnew = rkden(10000, x, lambda = fit$lambda)
hist(xnew, breaks = 100, freq = FALSE, xlim = c(-5, 5))
rug(xnew)
lines(xx,dnorm(xx), col = "black")
lines(xx, dkden(xx, x), lwd = 2, col = "red")
legend("topright", c("True Density", "KDE Using evmix"),
lty = c(1, 2), lwd = c(1, 2), col = c("black", "red"))
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.