prepareKernel: Check the data for kernel estimation

prepareKernel {smoothemplik}    R Documentation

Check the data for kernel estimation

Description

Checks whether the order is 2, 4, or 6, transforms the input objects into matrices, checks the dimensions, provides a bandwidth if none is supplied, creates default arguments to pass to the C++ functions, and carries out de-duplication for speed-up.

Usage

prepareKernel(
  x,
  y = NULL,
  xout = NULL,
  weights = NULL,
  bw = NULL,
  kernel = c("gaussian", "uniform", "triangular", "epanechnikov", "quartic"),
  order = 2,
  convolution = FALSE,
  sparse = FALSE,
  deduplicate.x = TRUE,
  deduplicate.xout = TRUE,
  no.dedup = FALSE,
  PIT = FALSE
)

Arguments

x

A numeric vector, matrix, or data frame containing observations. For density estimation, these are the points used to compute the density; for kernel regression, the values of the explanatory variables.

y

Optional: a vector of dependent variable values.

xout

A vector or a matrix of data points with ncol(xout) = ncol(x) at which the user desires to compute the weights, density, or predictions. In other words, this is the requested evaluation grid. If NULL, then x itself is used as the grid.

weights

A numeric vector of observation weights (typically counts) to perform weighted operations. If NULL, rep(1, NROW(x)) is used. In all calculations, the total number of observations is assumed to be the sum of weights.
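
In the simplest case, the weights are the multiplicities of repeated observations. A minimal, package-independent sketch of this interpretation:

z <- c(1.2, 1.2, 3.4, 3.4, 3.4, 5.6)          # 6 observations, 3 unique values
z.uniq <- unique(z)
w.cnt <- as.numeric(table(match(z, z.uniq)))  # Counts: 2, 3, 1
sum(w.cnt) == length(z)                       # TRUE: total number of observations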

bw

Bandwidth for the kernel: a scalar or a vector of the same length as ncol(x). Since the bandwidth is a crucial parameter in many applications, a warning is thrown if it is not supplied; in that case, Silverman's rule of thumb (via bw.rot()) is applied to every dimension of x.
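
For reference, the textbook version of Silverman's rule of thumb applied column by column can be sketched as follows (an illustration only, not necessarily the exact formula used internally):

silverman <- function(z) 0.9 * min(sd(z), IQR(z) / 1.34) * length(z)^(-1/5)
x.demo <- matrix(rnorm(200), ncol = 2)
apply(x.demo, 2, silverman)   # One bandwidth per dimension of x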

kernel

Character string describing the desired kernel type. NB: owing to limited machine precision, even the Gaussian kernel effectively has finite support.

order

An integer: 2, 4, or 6. Order-2 kernels are the standard kernels that are positive everywhere. Orders 4 and 6 take negative values in places, which reduces the bias but may hamper density estimation because densities must be non-negative.
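
To see why higher-order kernels take negative values, consider one common construction of a fourth-order Gaussian kernel (an illustration; the kernels used internally may differ):

k4 <- function(u) (3 - u^2) / 2 * dnorm(u)   # A standard 4th-order Gaussian kernel
k4(c(0, 1, 2, 3))                 # Negative in the tails (|u| > sqrt(3))
integrate(k4, -Inf, Inf)$value    # Still integrates to 1 like a regular kernel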

convolution

Logical: if FALSE, returns the usual kernel. If TRUE, returns the convolution kernel that is used in density cross-validation.

sparse

Logical: currently ignored (not yet implemented).

deduplicate.x

Logical: if TRUE, full duplicates in the input x and y are counted and transformed into weights; subsetting indices to reconstruct the duplicated data set from the unique one are also returned.

deduplicate.xout

Logical: if TRUE, full duplicates in the input xout are counted; subsetting indices to reconstruct the duplicated data set from the unique one are returned.
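
The idea behind the de-duplication (a generic sketch, not the package internals) is to keep only the unique rows, convert their multiplicities into weights, and store an index that maps every original row back to its unique representative:

xy <- cbind(x = c(1, 1, 2, 2, 2, 3), y = c(0, 0, 1, 1, 1, 0))
keys <- apply(xy, 1, paste, collapse = "\r")
idx  <- match(keys, unique(keys))        # Original row -> unique row
w.dup <- as.numeric(table(idx))          # Multiplicities become weights
xy.uniq <- xy[!duplicated(keys), , drop = FALSE]
all(xy.uniq[idx, ] == xy)                # TRUE: the original data are recoverable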

no.dedup

Logical: if TRUE, sets deduplicate.x and deduplicate.xout to FALSE (shorthand).

PIT

If TRUE, the Probability Integral Transform (PIT) is applied to all columns of x via ecdf in order to map all values into the [0, 1] range. May be an integer vector of indices of columns to which the PIT should be applied.
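
The transform itself is simply the empirical CDF applied to each column; a minimal sketch of the documented behaviour:

x.demo <- matrix(rexp(300), ncol = 3)
x.pit  <- apply(x.demo, 2, function(z) ecdf(z)(z))
range(x.pit)   # All values now lie in (0, 1]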

Value

A list of arguments that are accepted by kernelDensity() and kernelSmooth().

Examples

# De-duplication facilities
set.seed(1)  # Creating a data set with many duplicates
n.uniq <- 10000
n <- 60000
inds <- ceiling(runif(n, 0, n.uniq))
x.uniq <- matrix(rnorm(n.uniq*10), ncol = 10)
x <- x.uniq[inds, ]
y <- runif(n.uniq)[inds]
xout <- x.uniq[ceiling(runif(n.uniq*3, 0, n.uniq)), ]
w <- runif(n)
print(system.time(a1 <- prepareKernel(x, y, xout, w, bw = 0.5)))
print(system.time(a2 <- prepareKernel(x, y, xout, w, bw = 0.5,
                  deduplicate.x = FALSE, deduplicate.xout = FALSE)))
print(c(object.size(a1), object.size(a2)) / 1024) # Kilobytes used
# Speed-memory trade-off: the de-duplicated object is roughly 4 times
# smaller and takes about 0.2 s to prepare, but it reduces the number of
# pairwise kernel operations by the following fraction:
1 - prod(1 - a1$duplicate.stats[1:2])    # Approx. 95% fewer operations
sum(a1$weights) - sum(a2$weights)  # Should be 0 or near machine epsilon
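
# Further sketches based on the argument descriptions above
a3 <- prepareKernel(x, y, xout, w, bw = 0.5, PIT = TRUE)  # PIT-transformed copy
# Omitting bw should produce a warning and a rule-of-thumb bandwidth
# for every dimension of x
a4 <- prepareKernel(x, y, xout, w)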
