Density, cumulative distribution function, quantile function and
random number generation for boundary corrected kernel density estimators
using a variety of approaches (and different kernels) with a constant
bandwidth lambda.
dbckden(x, kerncentres, lambda = NULL, bw = NULL,
kernel = "gaussian", bcmethod = "simple", proper = TRUE,
nn = "jf96", offset = NULL, xmax = NULL, log = FALSE)
pbckden(q, kerncentres, lambda = NULL, bw = NULL,
kernel = "gaussian", bcmethod = "simple", proper = TRUE,
nn = "jf96", offset = NULL, xmax = NULL, lower.tail = TRUE)
qbckden(p, kerncentres, lambda = NULL, bw = NULL,
kernel = "gaussian", bcmethod = "simple", proper = TRUE,
nn = "jf96", offset = NULL, xmax = NULL, lower.tail = TRUE)
rbckden(n = 1, kerncentres, lambda = NULL, bw = NULL,
kernel = "gaussian", bcmethod = "simple", proper = TRUE,
nn = "jf96", offset = NULL, xmax = NULL)
x | quantiles
kerncentres | kernel centres (typically sample data vector or scalar)
lambda | bandwidth for kernel (as half-width of kernel) or NULL
bw | bandwidth for kernel (as standard deviations of kernel) or NULL
kernel | kernel name (default = "gaussian")
bcmethod | boundary correction method
proper | logical, whether density is renormalised to integrate to unity (where needed)
nn | non-negativity correction method (simple boundary correction only)
offset | offset added to kernel centres (logtrans only) or NULL
xmax | upper bound on support (copula and beta kernels only) or NULL
log | logical, if TRUE then log density
q | quantiles
lower.tail | logical, if FALSE then upper tail probabilities
p | cumulative probabilities
n | sample size (positive integer)
Boundary corrected kernel density estimation (BCKDE) with improved
bias properties near the boundary compared to the standard KDE available in
the kden functions. The user chooses from a wide range
of boundary correction methods designed to cope with a lower bound at zero
and potentially also both upper and lower bounds.
Some boundary correction methods require a secondary correction for negative density estimates of which two methods are implemented. Further, some methods don't necessarily give a density which integrates to one, so an option is provided to renormalise to be proper.
It assumes there is a lower bound at zero, so prior transformation of the data is required for an alternative lower bound (possibly including negation to allow for an upper bound).
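The transformation step might be sketched in base R as follows. The estimator `bckde0` is a hypothetical stand-in (a Gaussian reflection estimator on [0, Inf)), not the package's implementation, and the bounds `lower` and `upper` and the bandwidth `h` are illustrative values. Both transformations have unit Jacobian, so the density on the original scale is simply the estimate evaluated at the transformed point.

```r
# Hypothetical stand-in for any estimator with a lower bound at zero:
# Gaussian reflection estimator (not the evmix implementation)
bckde0 <- function(y, centres, h) {
  dens <- vapply(y, function(v) {
    mean(dnorm(v, centres, h) + dnorm(v, -centres, h))
  }, numeric(1))
  ifelse(y >= 0, dens, 0)
}

set.seed(1)

# Data with a lower bound at 3: shift so that the bound sits at zero
lower <- 3
x <- lower + rexp(200, rate = 1)
dens_lower <- function(v) bckde0(v - lower, x - lower, h = 0.3)

# Data with an upper bound at 10: negate and shift so the bound sits at zero
upper <- 10
z <- upper - rexp(200, rate = 1)
dens_upper <- function(v) bckde0(upper - v, upper - z, h = 0.3)

dens_lower(c(2.9, 3.5))   # zero below the lower bound
dens_upper(c(9.5, 10.1))  # zero above the upper bound
```

With the package itself, the same idea would apply by passing the transformed evaluation points and kernel centres to dbckden.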
The alternate bandwidth definitions are discussed in the kernels help
documentation, with lambda as the default. The bw specification is the
same as that used in the density function.
Certain boundary correction methods use the standard kernels, which are defined
in the kernels help documentation, with "gaussian" as the default choice.
The quantile function is rather complicated as there is no closed form solution,
so it is obtained by numerical approximation of the inverse cumulative distribution
function P(X ≤ q) = p to find q. The quantile function qbckden evaluates the KDE
cumulative distribution function over the range c(0, max(kerncentres) + lambda),
or c(0, max(kerncentres) + 5*lambda) for the normal kernel. Outside of this
range the quantiles are set to 0 for the lower tail and Inf (or xmax where
appropriate) for the upper tail. A sequence of values of length fifty times the
number of kernels (up to a maximum of 1000) is first calculated. Spline based
interpolation using splinefun, with the default monoH.FC method, is then used to
approximate the quantile function. This is a similar approach to that taken
by Matt Wand in the qkde function in the ks package.
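The grid-then-interpolate idea described above can be illustrated with a rough base R sketch. It is not the qbckden source: the CDF used here is that of a Gaussian reflection estimator chosen only because it has a closed form, and `centres`, `h`, `cdf` and `qfun` are illustrative names.

```r
set.seed(1)
centres <- rgamma(50, shape = 2, scale = 1)  # illustrative kernel centres
h <- 0.4                                     # illustrative Gaussian bandwidth

# CDF of a Gaussian reflection estimator on [0, Inf)
# (a simple stand-in, not the evmix implementation)
cdf <- function(q) {
  vapply(q, function(v) {
    mean(pnorm(v, centres, h) + pnorm(v, -centres, h)) - 1
  }, numeric(1))
}

# Grid over c(0, max(centres) + 5 * h), with fifty points per kernel
# up to a maximum of 1000
ngrid <- min(50 * length(centres), 1000)
qgrid <- seq(0, max(centres) + 5 * h, length.out = ngrid)
pgrid <- cdf(qgrid)

# Drop tied probabilities so the interpolation knots are distinct
keep <- !duplicated(pgrid)

# A monotone spline through the (p, q) pairs approximates the inverse CDF
qfun <- splinefun(pgrid[keep], qgrid[keep], method = "monoH.FC")

qfun(c(0.1, 0.5, 0.9))
```

Interpolating q as a function of p, rather than inverting the CDF separately for every probability, means the expensive CDF evaluations are done once on the grid.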
Unlike the standard KDE, there is no general rule-of-thumb bandwidth for all these
estimators, with only certain methods having a guideline in the literature, so none
have been implemented. Hence, a bandwidth must always be specified and you should
consider using the fbckden function for cross-validation MLE of the bandwidth.
Random number generation is slow as inversion sampling using the (numerically evaluated) quantile function is implemented. Users may want to consider alternative approaches instead, like rejection sampling.
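As a rough illustration of why inversion sampling is slow, the sketch below solves cdf(q) = u numerically for every uniform draw. It uses a Gaussian reflection estimator as a stand-in with a closed-form CDF and root finding via uniroot; this is an assumption made for the example rather than the approach in rbckden, and `centres`, `h`, `cdf` and `rinv` are illustrative names.

```r
set.seed(1)
centres <- rgamma(50, shape = 2, scale = 1)  # illustrative kernel centres
h <- 0.4                                     # illustrative Gaussian bandwidth

# CDF of a Gaussian reflection estimator at a single point q
# (a simple stand-in, not the evmix implementation)
cdf <- function(q) mean(pnorm(q, centres, h) + pnorm(q, -centres, h)) - 1

# Inversion sampling: draw u ~ U(0, 1) and solve cdf(q) = u numerically
upper <- max(centres) + 10 * h  # the CDF is numerically one here
rinv <- function(n) {
  u <- runif(n)
  vapply(u, function(ui) {
    uniroot(function(q) cdf(q) - ui, interval = c(0, upper))$root
  }, numeric(1))
}

samp <- rinv(200)
summary(samp)
```

Each draw costs one root search, and each step of that search evaluates a sum over all kernel centres, which is where the cost accumulates for large samples.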
dbckden gives the density,
pbckden gives the cumulative distribution function,
qbckden gives the quantile function and
rbckden gives a random sample.
Renormalisation to a proper density is assumed by default (proper=TRUE).
This correction is needed for bcmethod="renorm", "simple", "beta1", "beta2",
"gamma1" and "gamma2", which all require numerical integration. Renormalisation
will not be carried out for other methods, even when proper=TRUE.
Non-negativity correction is only relevant for the bcmethod="simple" approach.
The Jones and Foster (1996) method is applied by default (nn="jf96"). This method
can occasionally give an extra boundary bias for certain populations (e.g. Gamma(2, 1)),
see the paper for details. Alternatively, negative values can simply be zeroed
(nn="zero"). Non-negativity correction will not be carried out for other methods,
even when requested by the user. Renormalisation should always be applied after
non-negativity correction, so when both are requested the non-negativity correction
is applied first.
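The order of the two corrections can be illustrated with a small base R sketch. The function `raw_density` is a toy stand-in that dips below zero near the boundary; it is hypothetical and is not the Jones (1993) estimator. Only the zeroing option is shown, not the Jones and Foster (1996) method.

```r
# Toy stand-in for an estimate that is negative near the boundary
# (hypothetical; not the Jones (1993) estimator itself)
raw_density <- function(x) {
  dgamma(x, shape = 2, rate = 1) - 0.1 * dnorm(x, mean = 0, sd = 0.2)
}

# Step 1: non-negativity correction by zeroing negative values
nonneg_density <- function(x) pmax(raw_density(x), 0)

# Step 2: renormalise by numerical integration so the result is proper
# (upper limit of 30 is an illustrative cut-off where the tail is negligible)
area <- integrate(nonneg_density, lower = 0, upper = 30)$value
proper_density <- function(x) nonneg_density(x) / area

raw_density(0)     # negative before correction
proper_density(0)  # zero after correction
```

Renormalising first and zeroing afterwards would leave a function that no longer integrates to one, which is why the zeroing step comes first.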
The boundary correction methods implemented are listed below. The first set can use
any type of kernel (see the kernels help documentation):
bcmethod="simple" is the default and applies the simple boundary correction method
in equation (3.4) of Jones (1993), which is equivalent to kernel weighted local linear
fitting at the boundary. Renormalisation and non-negativity correction may be required.
bcmethod="cutnorm" applies the cut and normalisation method of
Gasser and Muller (1979), where the kernels themselves are individually truncated at
the boundary and renormalised to unity.
bcmethod="renorm" applies the first order correction method discussed in
Diggle (1985), where the kernel density estimate is locally renormalised near the boundary.
Renormalisation may be required.
bcmethod="reflect" applies the reflection method of Boneva, Kendall and Stefanov
(1971), which is equivalent to the dataset being supplemented by the same dataset negated.
This method implicitly assumes f'(0)=0, so can cause extra artefacts at the boundary.
bcmethod="logtrans" applies KDE on the log-scale and then back-transforms (with
explicit normalisation) following Marron and Ruppert (1994). This is the approach
implemented in the ks package. As the KDE is applied on
the log scale, the effective bandwidth on the original scale is non-constant. The
offset option is only used for this method and is commonly used to offset
zero kernel centres in the log transform to prevent log(0).
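Two of the methods listed above have simple closed forms with a Gaussian kernel, sketched here in base R. This is a minimal illustration of the ideas as described, not the package code; `centres`, `h`, `d_reflect` and `d_cutnorm` are illustrative names and `h` is taken to be the kernel standard deviation.

```r
set.seed(1)
centres <- rgamma(100, shape = 1, scale = 2)  # illustrative sample
h <- 0.5  # Gaussian kernel standard deviation (illustrative)

# Reflection: supplement the data with its negation, keep the estimate on [0, Inf)
d_reflect <- function(x) {
  dens <- vapply(x, function(v) {
    mean(dnorm(v, centres, h) + dnorm(v, -centres, h))
  }, numeric(1))
  ifelse(x >= 0, dens, 0)
}

# Cut and normalise: truncate each kernel at zero and rescale it to unit mass
d_cutnorm <- function(x) {
  mass <- pnorm(centres / h)  # mass of each kernel on [0, Inf)
  dens <- vapply(x, function(v) mean(dnorm(v, centres, h) / mass), numeric(1))
  ifelse(x >= 0, dens, 0)
}

xx <- seq(0, 3, by = 0.5)
cbind(x = xx, reflect = d_reflect(xx), cutnorm = d_cutnorm(xx),
      true = dexp(xx, rate = 1 / 2))
```

Both estimates integrate to one by construction, which is consistent with neither method appearing in the list of methods that need renormalisation.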
The following boundary correction methods do not use kernels in their
usual sense, so ignore the kernel input:
bcmethod="beta1" and "beta2" use the beta and modified beta kernels
of Chen (1999) respectively. The xmax input rescales the beta kernels to be
defined on the support [0, xmax] rather than the unscaled [0, 1]. Renormalisation
will be required.
bcmethod="gamma1" and "gamma2" use the gamma and modified gamma kernels
of Chen (2000) respectively. Renormalisation will be required.
bcmethod="copula" uses the bivariate normal copula based kernels of
Jones and Henderson (2007). As with the bcmethod="beta1" and "beta2"
methods, the xmax input rescales the copula kernels to be defined on the support
[0, xmax] rather than [0, 1]. In this case the bandwidth is defined as lambda=1-ρ^2,
so the bandwidth is limited to (0, 1).
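To show what "not a kernel in the usual sense" means, here is a rough base R sketch of a beta kernel estimate in the form given by Chen (1999), rescaled to [0, xmax] and then renormalised. It is only an illustration: `b` is the smoothing parameter on the unit scale, and how the package's lambda maps to `b` is not assumed here; `d_beta1` and `d_beta1_proper` are illustrative names.

```r
set.seed(1)
xmax <- 5
centres <- rbeta(200, shape1 = 3, shape2 = 2) * xmax  # illustrative sample
b <- 0.05  # smoothing parameter on the unit scale (illustrative)

# Beta kernel estimate in the form of Chen (1999), rescaled to [0, xmax]:
# the shape parameters of the beta density depend on the evaluation point,
# and the density is evaluated at the (rescaled) kernel centres
d_beta1 <- function(x) {
  u <- x / xmax
  inside <- u >= 0 & u <= 1
  dens <- numeric(length(x))
  dens[inside] <- vapply(u[inside], function(ui) {
    mean(dbeta(centres / xmax, ui / b + 1, (1 - ui) / b + 1))
  }, numeric(1)) / xmax
  dens
}

# The raw estimate need not integrate to one, so renormalise numerically
area <- integrate(d_beta1, lower = 0, upper = xmax)$value
d_beta1_proper <- function(x) d_beta1(x) / area

area
```

Because the data enter as the argument of the beta density rather than as its location, the estimate places no mass outside [0, xmax], at the cost of needing the numerical renormalisation.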
The "simple", "renorm", "beta1", "beta2", "gamma1" and "gamma2"
boundary correction methods may require renormalisation using
numerical integration, which can be very slow. In particular, the numerical integration
is extremely slow for kernel="uniform", due to the adaptive quadrature in
the integrate function being particularly slow for functions with step-like behaviour.
Based on code by Anna MacDonald produced for MATLAB.
Unlike most of the other extreme value mixture model functions, the
bckden functions have not been vectorised as this is not appropriate.
The main inputs (x, p or q) must be either a scalar or a vector, which also
define the output length. The kernel centres kerncentres can either be a single
datapoint or a vector of data. The kernel centres (kerncentres) and locations to
evaluate the density (x) and cumulative distribution function (q) would usually
be different.
Default values are provided for all inputs, except for the fundamentals
lambda, kerncentres, x, q and p.
The default sample size for rbckden is 1.
The xmax option is only relevant for the beta and copula methods, so a
warning is produced if this is not NULL for other methods.
The offset option is only relevant for the "logtrans" method, so a
warning is produced if this is not NULL for other methods.
Missing (NA) and Not-a-Number (NaN) values in x, p and q are passed through
as is and infinite values are set to NA. None of these are permitted for the parameters.
Error checking of the inputs (e.g. invalid probabilities) is carried out and will either stop or give a warning message as appropriate.
Yang Hu and Carl Scarrott carl.scarrott@canterbury.ac.nz.
http://en.wikipedia.org/wiki/Kernel_density_estimation
http://en.wikipedia.org/wiki/Cross-validation_(statistics)
Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf
Bowman, A.W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2), 353-360.
Duin, R.P.W. (1976). On the choice of smoothing parameters for Parzen estimators of probability density functions. IEEE Transactions on Computers C25(11), 1175-1179.
MacDonald, A., Scarrott, C.J., Lee, D., Darlow, B., Reale, M. and Russell, G. (2011). A flexible extreme value mixture model. Computational Statistics and Data Analysis 55(6), 2137-2157.
Chen, S.X. (1999). Beta kernel estimators for density functions. Computational Statistics and Data Analysis 31, 131-145.
Gasser, T. and Muller, H. (1979). Kernel estimation of regression functions. In Lecture Notes in Mathematics 757, edited by Gasser and Rosenblatt, Springer.
Chen, S.X. (2000). Probability density function estimation using gamma kernels. Annals of the Institute of Statistical Mathematics 52(3), 471-480.
Boneva, L.I., Kendall, D.G. and Stefanov, I. (1971). Spline transformations: Three new diagnostic aids for the statistical data analyst (with discussion). Journal of the Royal Statistical Society B, 33, 1-70.
Diggle, P.J. (1985). A kernel method for smoothing point process data. Applied Statistics 34, 138-147.
Marron, J.S. and Ruppert, D. (1994). Transformations to reduce boundary bias in kernel density estimation. Journal of the Royal Statistical Society, Series B 56(4), 653-671.
Jones, M.C. and Henderson, D.A. (2007). Kernel-type density estimation on the unit interval. Biometrika 94(4), 977-984.
kernels, kfun, density, bw.nrd0 and dkde in ks package.
Other kden: fbckden, fgkgcon, fgkg, fkdengpdcon, fkdengpd, fkden, kdengpdcon, kdengpd, kden
Other bckden: bckdengpdcon, bckdengpd, fbckdengpdcon, fbckdengpd, fbckden, fkden, kden
Other bckdengpd: bckdengpdcon, bckdengpd, fbckdengpdcon, fbckdengpd, fbckden, fkdengpd, gkg, kdengpd, kden
Other bckdengpdcon: bckdengpdcon, bckdengpd, fbckdengpdcon, fbckdengpd, fbckden, fkdengpdcon, gkgcon, kdengpdcon
Other fbckden: fbckden
## Not run:
set.seed(1)
par(mfrow = c(1, 1))
n=100
x = rgamma(n, shape = 1, scale = 2)
xx = seq(-0.5, 12, 0.01)
plot(xx, dgamma(xx, shape = 1, scale = 2), type = "l")
rug(x)
lines(xx, dbckden(xx, x, lambda = 1), lwd = 2, col = "red")
lines(density(x), lty = 2, lwd = 2, col = "green")
legend("topright", c("True Density", "Simple boundary correction",
"KDE using density function"),
lty = c(1, 1, 2), lwd = c(1, 2, 2), col = c("black", "red", "green"))
n=100
x = rbeta(n, shape1 = 3, shape2 = 2)*5
xx = seq(-0.5, 5.5, 0.01)
plot(xx, dbeta(xx/5, shape1 = 3, shape2 = 2)/5, type = "l", ylim = c(0, 0.8))
rug(x)
lines(xx, dbckden(xx, x, lambda = 0.1, bcmethod = "beta2", proper = TRUE, xmax = 5),
lwd = 2, col = "red")
lines(density(x), lty = 2, lwd = 2, col = "green")
legend("topright", c("True Density", "Modified Beta KDE Using evmix",
"KDE using density function"),
lty = c(1, 1, 2), lwd = c(1, 2, 2), col = c("black", "red", "green"))
# Demonstrate renormalisation (usually small difference)
n=1000
x = rgamma(n, shape = 1, scale = 2)
xx = seq(-0.5, 15, 0.01)
plot(xx, dgamma(xx, shape = 1, scale = 2), type = "l")
rug(x)
lines(xx, dbckden(xx, x, lambda = 0.5, bcmethod = "simple", proper = TRUE),
lwd = 2, col = "purple")
lines(xx, dbckden(xx, x, lambda = 0.5, bcmethod = "simple", proper = FALSE),
lwd = 2, col = "red", lty = 2)
legend("topright", c("True Density", "Simple BC with renormalisation",
"Simple BC without renormalisation"),
lty = 1, lwd = c(1, 2, 2), col = c("black", "purple", "red"))
## End(Not run)