kernelSmooth: Local kernel smoother

Description

Local kernel smoother

Usage

kernelSmooth(
  x,
  y,
  xout = NULL,
  weights = NULL,
  bw = NULL,
  kernel = c("gaussian", "uniform", "triangular", "epanechnikov", "quartic"),
  order = 2,
  convolution = FALSE,
  chunks = 0,
  PIT = FALSE,
  LOO = FALSE,
  degree = 0,
  trim = function(x) 0.01/length(x),
  robust.iterations = 0,
  robust = c("bisquare", "huber"),
  deduplicate.x = TRUE,
  deduplicate.xout = TRUE,
  no.dedup = FALSE,
  return.grid = FALSE
)

Arguments

x

A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables.

y

A numeric vector of dependent variable values.

xout

A vector or a matrix of data points with ncol(xout) = ncol(x) at which the user desires to compute the weights, density, or predictions. In other words, this is the requested evaluation grid. If NULL, then x itself is used as the grid.

weights

A numeric vector of observation weights (typically counts) to perform weighted operations. If NULL, rep(1, NROW(x)) is used. In all calculations, the total number of observations is assumed to be the sum of the weights.

bw

Bandwidth for the kernel: a scalar or a vector of length ncol(x). Since it is the crucial parameter in many applications, a warning is thrown if the bandwidth is not supplied; Silverman's rule of thumb (via bw.rot()) is then applied to *every dimension* of x (see the sketch after this argument list).

kernel

Character describing the desired kernel type. NB: due to limited machine precision, even the Gaussian kernel effectively has finite support.

order

An integer: 2, 4, or 6. Order-2 kernels are the standard kernels that are positive everywhere. Orders 4 and 6 produce some negative values, which reduces bias but may hamper density estimation.

convolution

Logical: if FALSE, returns the usual kernel. If TRUE, returns the convolution kernel that is used in density cross-validation.

chunks

Integer: the number of chunks to split the task into (limits RAM usage but increases overhead). 0 = auto-select (making sure that no matrix has more than 2^27 elements).

PIT

If TRUE, the Probability Integral Transform (PIT) is applied to all columns of x via ecdf in order to map all values into the [0, 1] range. May be an integer vector of indices of columns to which the PIT should be applied.

LOO

Logical: if TRUE, the leave-one-out estimator is returned.

degree

Integer: 0 for the locally constant estimator (Nadaraya–Watson), 1 for locally linear (Cleveland's LOESS), 2 for locally quadratic (use with care: it is less stable and requires larger bandwidths).

trim

Trimming function for small weights to speed up locally weighted regression (if degree is 1 or 2).

robust.iterations

The number of robustifying iterations (due to Cleveland, 1979). If greater than 0, xout is ignored.

robust

Character: "huber" for Huber's local regression weights, "bisquare" for more robust bi-square ones

deduplicate.x

Logical: if TRUE, full duplicates in the input x and y are counted and transformed into weights; subsetting indices to reconstruct the duplicated data set from the unique one are also returned.

deduplicate.xout

Logical: if TRUE, full duplicates in the input xout are counted; subsetting indices to reconstruct the duplicated data set from the unique one are returned.

no.dedup

Logical: if TRUE, sets deduplicate.x and deduplicate.xout to FALSE (shorthand).

return.grid

If TRUE, prepends xout to the return results.
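
To make the default bandwidth concrete, here is a minimal sketch of Silverman's rule of thumb applied separately to every column of x, mimicking what happens when bw is NULL. The function silvermanBW() below is a hypothetical stand-in for illustration: its constant matches stats::bw.nrd0, and the package's actual bw.rot() may differ in its details.

# Hedged sketch: normal-reference (Silverman) rule of thumb per column,
# 0.9 * min(sd, IQR/1.34) * n^(-1/5); bw.rot() may use different constants.
silvermanBW <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  apply(x, 2, function(col) 0.9 * min(sd(col), IQR(col) / 1.34) * n^(-1/5))
}
silvermanBW(cbind(rnorm(100), runif(100)))  # One bandwidth per dimension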

Details

Standardisation of the data is recommended for numerical stability (lm() may choke when the dependent variable takes very large absolute values and its square is used).

If requested, the robust iterations are carried out as in Cleveland (1979). Huber weights are never zero; bisquare weights create a more robust re-descending estimator.
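
For illustration, here is a hedged sketch of the two robustness weight families in the spirit of Cleveland (1979); the residual scaling (and hence the exact constants) used internally by kernelSmooth may differ.

# Hypothetical weight functions on standardised residuals u (illustration only)
bisquareWeight <- function(u) ifelse(abs(u) < 1, (1 - u^2)^2, 0)  # Re-descending: exactly 0 for |u| >= 1
huberWeight <- function(u) pmin(1, 1 / abs(u))                    # Bounded but never exactly zero
curve(bisquareWeight(x), -2, 2, ylab = "Robustness weight")
curve(huberWeight(x), -2, 2, add = TRUE, lty = 2)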

Note: if x and xout are different but robust iterations are requested, the robustification can take longer. TODO: do not estimate on (x, grid); do the calculation with K.full straight away.

Note: if LOO is used, it makes sense to de-duplicate the observations first. By default, this behaviour is not enforced in this function, but when it is called from cross-validation routines, de-duplication is forced. It makes no sense to zero out one observation out of many repeated ones.
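
A minimal sketch of how LOO = TRUE might be used to compare candidate bandwidths by a leave-one-out criterion, similar in spirit to what cross-validation routines do; the bandwidth grid below is arbitrary and purely illustrative.

set.seed(1)
xs <- rnorm(200); ys <- sin(xs) + rnorm(200, sd = 0.3)
sapply(c(0.2, 0.4, 0.8), function(b)
  mean((ys - kernelSmooth(xs, ys, bw = b, LOO = TRUE))^2))  # LOO mean squared error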

Value

A vector of predicted values or, if return.grid is TRUE, a matrix with the predicted values in the last column.

Examples

set.seed(1)
n <- 300
x <- sort(rt(n, df = 6)) # Observed values
g <- seq(-4, 5, 0.1) # Grid for evaluation
f <- function(x) 1 + x + sin(x) # True E(Y | X) = f(X)
y <- f(x) + rt(n, df = 4)
# 3 estimators: locally constant + 2nd-order kernel,
# locally constant + 4th-order kernel, locally linear robust
b2lc <- suppressWarnings(bw.CV(x, y = y, kernel = "quartic") + 0.8)
b4lc <- suppressWarnings(bw.CV(x, y = y, kernel = "quartic", order = 4,
              try.grid = FALSE, start.bw = 3) + 1)
b2ll <- bw.CV(x, y = y, kernel = "quartic", degree = 1, robust.iterations = 1,
              try.grid = FALSE, start.bw = 3, verbose = TRUE)
m2lc <- kernelSmooth(x, y, g, bw = b2lc, kernel = "quartic", no.dedup = TRUE)
m4lc <- kernelSmooth(x, y, g, bw = b4lc, kernel = "quartic", order = 4, no.dedup = TRUE)
m2ll <- kernelSmooth(x, y, g, bw = b2ll, kernel = "quartic",
                     degree = 1, robust.iterations = 1, no.dedup = TRUE)
plot(x, y, xlim = c(-6, 7), col = "#00000088", bty = "n")
lines(g, f(g), col = "white", lwd = 5); lines(g, f(g))
lines(g, m2lc, col = 2); lines(g, m4lc, col = 3); lines(g, m2ll, col = 4)
# De-duplication facilities for faster operations
set.seed(1)  # Creating a data set with many duplicates
n.uniq <- 1000
n <- 4000
inds <- sort(ceiling(runif(n, 0, n.uniq)))
x.uniq <- sort(rnorm(n.uniq))
y.uniq <- 1 + x.uniq + sin(x.uniq*2) + rnorm(n.uniq)
x <- x.uniq[inds]
y <- y.uniq[inds]
xout <- x.uniq[sort(ceiling(runif(n.uniq*3, 0, n.uniq)))]
w <- runif(n)
data.table::setDTthreads(1) # For measuring the pure gains and overhead
RcppParallel::setThreadOptions(numThreads = 1)
kr1 <- kernelSmooth(x, y, xout, w, bw = 0.5)
kr2 <- kernelSmooth(x, y, xout, w, bw = 0.5, no.dedup = TRUE)  # Same bandwidth for comparability
stat1 <- attr(kr1, "duplicate.stats")
stat2 <- attr(kr2, "duplicate.stats")
print(stat1[3:5]) # De-duplication time -- worth it
print(stat2[3:5]) # Without de-duplication, slower
unname(prod(1 - stat1[1:2]) / (stat1[5] / stat2[5]))
# > 1 = better time savings than expected, < 1 = worse than expected
all.equal(as.numeric(kr1), as.numeric(kr2))
max(abs(kr1 - kr2)) # Should be around machine epsilon or less

# Example in 2 dimensions
# TODO
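# A hedged 2-D sketch (not from the original documentation): per the
# Arguments section, x may be a matrix and bw a vector of per-column bandwidths.
set.seed(1)
n2 <- 500
X <- cbind(runif(n2), runif(n2))
y2 <- sin(4*X[, 1]) + cos(4*X[, 2]) + rnorm(n2, sd = 0.3)
gs <- seq(0, 1, 0.05)
g2 <- as.matrix(expand.grid(gs, gs))  # Evaluation grid, Var1 varying fastest
m2d <- kernelSmooth(X, y2, g2, bw = c(0.15, 0.15), kernel = "quartic")
persp(gs, gs, matrix(m2d, nrow = length(gs)), theta = 30, phi = 25,
      xlab = "x1", ylab = "x2", zlab = "Fitted values")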
