densityBC | R Documentation |
Fixed-bandwidth kernel density estimation on the real line, or the positive real half-line, including optional corrections for a boundary at zero.
densityBC(x, kernel = "epanechnikov", bw=NULL,
...,
h=NULL,
adjust = 1,
weights = rep(1, length(x))/length(x), from, to = max(x), n = 256,
zerocor = c("none", "weighted", "convolution", "reflection",
"bdrykern", "JonesFoster"),
fast=FALSE,
internal=list())
x |
Numeric vector of observed values. |
kernel |
String specifying kernel.
Options are
|
bw , h |
Alternative specifications of the scale factor for the kernel.
The bandwidth |
adjust |
Numeric value used to rescale the bandwidth |
weights |
Numeric vector of weights associated with |
from , to |
Lower and upper limits of interval on which density should be
computed.
The default value of |
n |
Number of |
zerocor |
String (partially matched) specifying a correction for the boundary effect
bias at |
fast |
Logical value specifying whether to perform the calculation rapidly
using the Fast Fourier Transform ( |
internal |
Internal use only. |
... |
Additional arguments are ignored. |
This function computes a fixed-bandwidth kernel estimate of a probability density on the real line, or the positive half-line, including optional boundary corrections for truncation of the density onto the positive half line.
Weighted estimates are supported, including negative weights. Weights are not renormalised to sum to 1. The resulting probability density estimate is not renormalised to integrate to 1.
Options for the smoothing kernel
are described in the help for density.default
.
The default is the Epanechnikov (truncated quadratic) kernel.
If zerocor
is missing, or given as "none"
,
this function computes the fixed-bandwidth kernel estimator of the
probability density on the real line,
using density.default
.
The estimated probability density (unnormalised) is
\widehat f(x) = \sum_{i=1}^n w_i \; \kappa(x - x_i)
where x_1,\ldots,x_n
are the data values,
w_1,\ldots,w_n
are the weights (defaulting
to w_i = 1/n
),
and \kappa
is the kernel, a probability density on the
real line.
If zerocor
is given, the probability density is assumed to be
confined to the positive half-line; the numerical values in x
must all be non-negative; and a boundary correction is
applied to compensate for bias arising due to truncation at the origin:
zerocor="weighted"
:The contribution from each data point x_i
is weighted by the factor 1/m(x_i)
where m(x) = 1 - F(-x)
is the total mass of the kernel centred on
x
that lies in the positive half-line, and F(x)
is the
cumulative distribution function of the kernel.
The corrected estimate is
\widehat f_W(x) = \sum_{i=1}^n w_i \; \frac{\kappa(x - x_i)}{1-F(-x_i)}
This is the “cut-and-normalization” method of Gasser and Mueller (1979). Effectively the kernel is renormalized so that it integrates to 1, and the adjusted kernel conserves mass.
zerocor="convolution"
:The estimate of the density f(x)
is
weighted by the factor 1/m(x)
where m(r) = 1 - F(-x)
is given above.
The corrected estimate is
\widehat f_C(x) = \sum_{i=1}^n w_i \; \frac{\kappa(x - x_i)}{1-F(-x)}
This is the “convolution”, “uniform” or “zero-order” boundary correction method often attributed to Diggle (1985). This correction does not conserve mass. It is faster to compute than the weighted correction.
zerocor="reflection"
:if the kernel centred at data point x_i
has a tail that lies on the negative half-line, this tail is
reflected onto the positive half-line.
The corrected estimate is
\widehat f_R(x) = \sum_{i=1}^n w_i \;
[ \kappa(x - x_i) + \kappa(-x - x_i) ]
This is the “reflection” method first proposed by
Boneva et al (1971). This correction conserves mass.
The estimated density always has zero derivative at the origin,
\widehat f_R^\prime(0) = 0
, which may or
may not be desirable.
zerocor="bdrykern"
:The density estimate is computed using the Linear Boundary Kernel associated with the chosen kernel (Wand and Jones, 1995, page 47). The estimated (unnormalised) probability density is
\widehat f_B(x) = \sum_{i=1}^n w_i \;
[ A(x) + (x-x_i) B(x)] \kappa(x - x_i)
where A(x) = a_2(x)/D(x)
and
B(x) = -a_1(x)/D(x)
with
D(x) = a_0(x) a_2(x) - a_1(x)^2
where
a_k(x) = \int_{-\infty}^x t^k \kappa(t) dt.
That is, when estimating the density f(x)
for values of
x
close to zero (defined as x < h
for all kernels
except the Gaussian), the kernel contribution
k_h(x - x_i)
is multiplied by a
term that is a linear function of x - x_i
,
with coefficients depending on x
.
This correction does not conserve mass and may result in
negative values, but is asymptotically optimal.
Computation time for this estimate is greater than for
the options above.
zerocor="JonesFoster"
:The modification of the Boundary Kernel estimate proposed by Jones and Foster (1996) is computed:
\widehat f_{JF}(x) =
\widehat f_C(x)
\exp\left( \frac{\widehat f_B(x)}{\widehat f_C(r)} - 1 \right)
where \widehat f_C(r)
is the convolution estimator
and \widehat f_B(r)
is the linear boundary kernel estimator.
This ensures that the estimate is always nonnegative
and retains the asymptotic optimality of the linear boundary
kernel.
Computation time for this estimate
is greater than for all the options above.
If fast=TRUE
, the calculations are performed rapidly using
density.default
which employs the Fast Fourier
Transform. If fast=FALSE
(the default), the calculations are
performed exactly using slower C code.
An object of class "density"
as described in the help file
for density.default
. It contains at least the entries
x |
Vector of |
y |
Vector of density values |
and \martinH.
Boneva, L.I., Kendall, D.G. and Stefanov, I. (1971) Spline transformations: three new diagnostic aids for the statistical data-analyst (with discussion). Journal of the Royal Statistical Society, Series B, 33, 1–70.
Diggle, P.J. (1985) A kernel method for smoothing point process data. Journal of the Royal Statistical Society, Series C (Applied Statistics), 34 138–147.
Gasser, Th. and Mueller, H.-G. (1979). Kernel estimation of regression functions. In Th. Gasser and M. Rosenblatt (editors) Smoothing Techniques for Curve Estimation, pages 23–68. Springer, Berlin.
Jones, M.C. and Foster, P.J. (1996) A simple nonnegative boundary correction method for kernel density estimation. Statistica Sinica, 6 (4) 1005–1013.
Wand, M.P. and Jones, M.C. (1995) Kernel Smoothing. Chapman and Hall.
density.default
.
dkernel
for the kernel itself.
densityAdaptiveKernel.default
for adaptive
(variable-bandwidth) estimation.
sim.dat <- rexp(500)
fhatN <- densityBC(sim.dat, "biweight", h=0.4)
fhatB <- densityBC(sim.dat, "biweight", h=0.4, zerocor="bdrykern")
plot(fhatN, ylim=c(0,1.1), main="density estimates")
lines(fhatB, col=2)
curve(dexp(x), add=TRUE, from=0, col=3)
legend(2, 0.8,
legend=c("fixed bandwidth", "boundary kernel", "true density"),
col=1:3, lty=rep(1,3))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.