densityBC: Kernel Density Estimation with Optional Boundary Correction

View source: R/densityBC.R

densityBCR Documentation

Kernel Density Estimation with Optional Boundary Correction

Description

Fixed-bandwidth kernel density estimation on the real line, or the positive real half-line, including optional corrections for a boundary at zero.

Usage

  densityBC(x, kernel = "epanechnikov", bw=NULL,
      ...,
      h=NULL,
      adjust = 1,
      weights = rep(1, length(x))/length(x), from, to = max(x), n = 256,
      zerocor = c("none", "weighted", "convolution", "reflection",
                  "bdrykern", "JonesFoster"),
      fast=FALSE,
      internal=list())

Arguments

x

Numeric vector of observed values.

kernel

String specifying kernel. Options are "gaussian", "rectangular", "triangular", "epanechnikov", "biweight", "cosine" and "optcosine". (Partial matching is used). Options are described in the help for density.default.

bw, h

Alternative specifications of the scale factor for the kernel. The bandwidth bw is the standard deviation of the kernel (this agrees with the argument bw in density.default). The rescale factor h is the factor by which the ‘standard form’ of the kernel is rescaled. For the Epanechnikov kernel, h = bw * sqrt(5) is the half-width of the support, while for the Gaussian kernel, h = bw is the standard deviation. Either bw or h should be given, and should be a single numeric value, or a character string indicating a bandwidth selection rule as described in density.default.

adjust

Numeric value used to rescale the bandwidth bw and halfwidth h. The bandwidth used is adjust * bw. This makes it easy to specify values like ‘half the default’ bandwidth.

weights

Numeric vector of weights associated with x. The weights are not required to sum to 1, and will not be normalised to sum to 1. The weights may include negative values.

from, to

Lower and upper limits of interval on which density should be computed. The default value of from is from=min(x) if zerocor="none", and from=0 otherwise.

n

Number of r values for which density should be computed.

zerocor

String (partially matched) specifying a correction for the boundary effect bias at r=0 when estimating a density on the positive half line. Possible values are "none", "weighted", "convolution", "reflection", "bdrykern" and "JonesFoster".

fast

Logical value specifying whether to perform the calculation rapidly using the Fast Fourier Transform (fast=TRUE) or to use slower, exact code (fast=FALSE, the default).

internal

Internal use only.

...

Additional arguments are ignored.

Details

This function computes a fixed-bandwidth kernel estimate of a probability density on the real line, or the positive half-line, including optional boundary corrections for truncation of the density onto the positive half line.

Weighted estimates are supported, including negative weights. Weights are not renormalised to sum to 1. The resulting probability density estimate is not renormalised to integrate to 1.

Options for the smoothing kernel are described in the help for density.default. The default is the Epanechnikov (truncated quadratic) kernel.

If zerocor is missing, or given as "none", this function computes the fixed-bandwidth kernel estimator of the probability density on the real line, using density.default. The estimated probability density (unnormalised) is

\widehat f(x) = \sum_{i=1}^n w_i \; \kappa(x - x_i)

where x_1,\ldots,x_n are the data values, w_1,\ldots,w_n are the weights (defaulting to w_i = 1/n), and \kappa is the kernel, a probability density on the real line.

If zerocor is given, the probability density is assumed to be confined to the positive half-line; the numerical values in x must all be non-negative; and a boundary correction is applied to compensate for bias arising due to truncation at the origin:

zerocor="weighted":

The contribution from each data point x_i is weighted by the factor 1/m(x_i) where m(x) = 1 - F(-x) is the total mass of the kernel centred on x that lies in the positive half-line, and F(x) is the cumulative distribution function of the kernel. The corrected estimate is

\widehat f_W(x) = \sum_{i=1}^n w_i \; \frac{\kappa(x - x_i)}{1-F(-x_i)}

This is the “cut-and-normalization” method of Gasser and Mueller (1979). Effectively the kernel is renormalized so that it integrates to 1, and the adjusted kernel conserves mass.

zerocor="convolution":

The estimate of the density f(x) is weighted by the factor 1/m(x) where m(r) = 1 - F(-x) is given above. The corrected estimate is

\widehat f_C(x) = \sum_{i=1}^n w_i \; \frac{\kappa(x - x_i)}{1-F(-x)}

This is the “convolution”, “uniform” or “zero-order” boundary correction method often attributed to Diggle (1985). This correction does not conserve mass. It is faster to compute than the weighted correction.

zerocor="reflection":

if the kernel centred at data point x_i has a tail that lies on the negative half-line, this tail is reflected onto the positive half-line. The corrected estimate is

\widehat f_R(x) = \sum_{i=1}^n w_i \; [ \kappa(x - x_i) + \kappa(-x - x_i) ]

This is the “reflection” method first proposed by Boneva et al (1971). This correction conserves mass. The estimated density always has zero derivative at the origin, \widehat f_R^\prime(0) = 0, which may or may not be desirable.

zerocor="bdrykern":

The density estimate is computed using the Linear Boundary Kernel associated with the chosen kernel (Wand and Jones, 1995, page 47). The estimated (unnormalised) probability density is

\widehat f_B(x) = \sum_{i=1}^n w_i \; [ A(x) + (x-x_i) B(x)] \kappa(x - x_i)

where A(x) = a_2(x)/D(x) and B(x) = -a_1(x)/D(x) with D(x) = a_0(x) a_2(x) - a_1(x)^2 where a_k(x) = \int_{-\infty}^x t^k \kappa(t) dt. That is, when estimating the density f(x) for values of x close to zero (defined as x < h for all kernels except the Gaussian), the kernel contribution k_h(x - x_i) is multiplied by a term that is a linear function of x - x_i, with coefficients depending on x. This correction does not conserve mass and may result in negative values, but is asymptotically optimal. Computation time for this estimate is greater than for the options above.

zerocor="JonesFoster":

The modification of the Boundary Kernel estimate proposed by Jones and Foster (1996) is computed:

\widehat f_{JF}(x) = \widehat f_C(x) \exp\left( \frac{\widehat f_B(x)}{\widehat f_C(r)} - 1 \right)

where \widehat f_C(r) is the convolution estimator and \widehat f_B(r) is the linear boundary kernel estimator. This ensures that the estimate is always nonnegative and retains the asymptotic optimality of the linear boundary kernel. Computation time for this estimate is greater than for all the options above.

If fast=TRUE, the calculations are performed rapidly using density.default which employs the Fast Fourier Transform. If fast=FALSE (the default), the calculations are performed exactly using slower C code.

Value

An object of class "density" as described in the help file for density.default. It contains at least the entries

x

Vector of x values

y

Vector of density values y= f(x)

Author(s)

\adrian

and \martinH.

References

\ournewpaper

Boneva, L.I., Kendall, D.G. and Stefanov, I. (1971) Spline transformations: three new diagnostic aids for the statistical data-analyst (with discussion). Journal of the Royal Statistical Society, Series B, 33, 1–70.

Diggle, P.J. (1985) A kernel method for smoothing point process data. Journal of the Royal Statistical Society, Series C (Applied Statistics), 34 138–147.

Gasser, Th. and Mueller, H.-G. (1979). Kernel estimation of regression functions. In Th. Gasser and M. Rosenblatt (editors) Smoothing Techniques for Curve Estimation, pages 23–68. Springer, Berlin.

Jones, M.C. and Foster, P.J. (1996) A simple nonnegative boundary correction method for kernel density estimation. Statistica Sinica, 6 (4) 1005–1013.

Wand, M.P. and Jones, M.C. (1995) Kernel Smoothing. Chapman and Hall.

See Also

density.default.

dkernel for the kernel itself.

densityAdaptiveKernel.default for adaptive (variable-bandwidth) estimation.

Examples

  sim.dat <- rexp(500)
  fhatN <- densityBC(sim.dat, "biweight", h=0.4)
  fhatB <- densityBC(sim.dat, "biweight", h=0.4, zerocor="bdrykern")
  plot(fhatN, ylim=c(0,1.1), main="density estimates")
  lines(fhatB, col=2)
  curve(dexp(x), add=TRUE, from=0, col=3)
  legend(2, 0.8,
     legend=c("fixed bandwidth", "boundary kernel", "true density"),
     col=1:3, lty=rep(1,3))

spatstat.univar documentation built on June 8, 2025, 12:52 p.m.