kernelWeights: Kernel-based weights

Description

Computes a matrix of kernel-based weights for use in kernel density estimation and kernel regression.

Usage

kernelWeights(
  x,
  xout = NULL,
  bw = NULL,
  kernel = c("gaussian", "uniform", "triangular", "epanechnikov", "quartic"),
  order = 2,
  convolution = FALSE,
  sparse = FALSE,
  PIT = FALSE,
  deduplicate.x = FALSE,
  deduplicate.xout = FALSE,
  no.dedup = FALSE
)

Arguments

x

A numeric vector, matrix, or data frame containing observations. For density estimation, these are the points at which the density is computed; for kernel regression, these are the values of the explanatory variables.

xout

A vector or a matrix of data points with ncol(xout) = ncol(x) at which the weights, density, or predictions are to be computed, i.e. the requested evaluation grid. If NULL, x itself is used as the grid.

bw

Bandwidth for the kernel: a scalar or a vector of the same length as ncol(x). Since the bandwidth is the crucial parameter in many applications, a warning is thrown if it is not supplied, in which case Silverman's rule of thumb (via bw.rot()) is applied to *every dimension* of x.
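
For intuition, here is a minimal sketch of Silverman's rule of thumb for one column; it assumes the textbook 0.9 * min(sd, IQR/1.34) * n^(-1/5) formula, which is not necessarily the exact rule implemented internally.

silverman <- function(v) 0.9 * min(sd(v), IQR(v) / 1.34) * length(v)^(-1/5)
set.seed(1)
silverman(rnorm(500))  # Roughly 0.9 * 500^(-1/5) ~ 0.26 for standard normal data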

kernel

Character string describing the desired kernel type. NB: due to limited machine precision, even the Gaussian kernel effectively has finite support (see the technical remark below).

Technical remark: if the kernel is Gaussian, then the ratio of the tail density to the maximum value (at 0) is less than mach.eps/2 when abs(x) > sqrt(106*log(2)) ~ 8.572. This has implications for the relative error of the calculation: even a kernel with (theoretically) full support may fail to produce numerically distinct values if the argument is more than ~8.5 standard deviations away from the mean.
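
This cutoff is easy to verify in base R; the check below assumes the kernel is the standard normal density, so the tail-to-peak ratio is exp(-x^2/2).

sqrt(106 * log(2))                         # ~8.5716: the cutoff derived above
exp(-8.5^2 / 2) > .Machine$double.eps / 2  # TRUE: still distinguishable from the peak
exp(-8.58^2 / 2) < .Machine$double.eps / 2 # TRUE: beyond the cutoff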

order

An integer: 2, 4, or 6. Order-2 kernels are the standard kernels that are non-negative everywhere. Orders 4 and 6 take negative values in places, which reduces bias but may hamper density estimation (the resulting estimate can be negative).
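
A quick way to see the negative lobes, assuming kernelFun() with the (x, kernel, order, convolution) signature used in the Examples below:

u <- seq(-1.5, 1.5, 0.01)
any(kernelFun(u, "epanechnikov", 2, FALSE) < 0)  # FALSE: order 2 is non-negative
any(kernelFun(u, "epanechnikov", 4, FALSE) < 0)  # TRUE: order 4 dips below zero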

convolution

Logical: if FALSE, returns the usual kernel. If TRUE, returns the convolution kernel that is used in density cross-validation.
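
As a sanity check (assuming the Gaussian kernel is the standard normal density and kernelFun() behaves as in the Examples below): the convolution of two standard normal densities is the N(0, 2) density, so the order-2 Gaussian convolution kernel should match dnorm(., sd = sqrt(2)).

u <- seq(-4, 4, 0.1)
all.equal(kernelFun(u, "gaussian", 2, TRUE), dnorm(u, sd = sqrt(2)))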

sparse

Logical: if TRUE, the weight matrix is returned as a sparse matrix to save memory, which is most useful when many weights are exactly zero (e.g. a compactly supported kernel evaluated on a wide grid); see the object-size comparison in the Examples.

PIT

If TRUE, the Probability Integral Transform (PIT) is applied to all columns of x via ecdf in order to map all values into the [0, 1] range. May be an integer vector of indices of the columns to which the PIT should be applied.

Note that if PIT = TRUE, the kernel-based weights become nearest-neighbour weights (i.e. not much different from the ones used internally by the built-in loess function) since the distances now depend on the ordering of the data, not on the values per se.
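
The transform itself is just the empirical CDF applied column-wise; a minimal base-R illustration:

set.seed(1)
v  <- rnorm(100)
pv <- ecdf(v)(v)  # Ranks rescaled to (0, 1]; ties share a value
range(pv)         # 0.01 1.00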

deduplicate.x

Logical: if TRUE, full duplicates in the input x are counted and transformed into weights; subsetting indices to reconstruct the duplicated data set from the unique one are also returned.

deduplicate.xout

Logical: if TRUE, full duplicates in the input xout are counted; subsetting indices to reconstruct the duplicated data set from the unique one are returned.

no.dedup

Logical: if TRUE, sets deduplicate.x and deduplicate.xout to FALSE (shorthand).
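
Conceptually, deduplication replaces repeated rows by one representative plus a multiplicity that enters the weights; a rough base-R sketch of the idea (illustrative only, not the internal implementation):

xdup <- c(1, 2, 2, 3, 3, 3)
tab  <- table(xdup)
ux   <- as.numeric(names(tab))  # Unique values: 1 2 3
cnt  <- as.integer(tab)         # Multiplicities usable as weights: 1 2 3
idx  <- match(xdup, ux)         # Subsetting indices: ux[idx] reconstructs xdup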

Value

A matrix of weights of dimensions nrow(xout) x nrow(x) (a sparse matrix if sparse = TRUE).

Examples

set.seed(1)
x   <- sort(rnorm(1000)) # Observed values
g   <- seq(-10, 10, 0.1) # Grid for evaluation
w   <- kernelWeights(x, g, bw = 2, kernel = "triangular")
wsp <- kernelWeights(x, g, bw = 2, kernel = "triangular", sparse = TRUE)
print(c(object.size(w), object.size(wsp)) / 1024) # Kilobytes used
image(g, x, w)
all.equal(w[, 1],  # Internal calculation for one column
            kernelFun((g - x[1])/2, "triangular", 2, FALSE))

# Bare-bones interface to the C++ functions
# Example: 4th-order convolution kernels
x <- seq(-3, 5, length.out = 301)
ks <- c("uniform", "triangular", "epanechnikov", "quartic", "gaussian")
kmat <- sapply(ks, function(k) kernelFun(x, k, 4, TRUE))
matplot(x, kmat, type = "l", lty = 1, bty = "n", lwd = 2)
legend("topright", ks, col = 1:5, lwd = 2)
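
# A small demonstration of the PIT remark above (illustrative): with
# PIT = TRUE, distances depend on ranks, so the outlier 10 is treated
# as an ordinary next-door neighbour of 2.
x2 <- c(1, 2, 10)
round(kernelWeights(x2, bw = 0.5), 3)
round(kernelWeights(x2, bw = 0.5, PIT = TRUE), 3)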
