kernelSmooth | R Documentation |
Local kernel smoother
kernelSmooth(
x,
y,
xout = NULL,
weights = NULL,
bw = NULL,
kernel = c("gaussian", "uniform", "triangular", "epanechnikov", "quartic"),
order = 2,
convolution = FALSE,
chunks = 0,
PIT = FALSE,
LOO = FALSE,
degree = 0,
trim = function(x) 0.01/length(x),
robust.iterations = 0,
robust = c("bisquare", "huber"),
deduplicate.x = TRUE,
deduplicate.xout = TRUE,
no.dedup = FALSE,
return.grid = FALSE
)
x |
A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables. |
y |
A numeric vector of dependent variable values. |
xout |
A vector or a matrix of data points with |
weights |
A numeric vector of observation weights (typically counts) to
perform weighted operations. If null, |
bw |
Bandwidth for the kernel: a scalar or a vector of the same length as |
kernel |
Character describing the desired kernel type. NB: due to limited machine precision, even Gaussian has finite support. |
order |
An integer: 2, 4, or 6. Order-2 kernels are the standard kernels that are positive everywhere. Orders 4 and 6 produce some negative values, which reduces bias but may hamper density estimation. |
convolution |
Logical: if FALSE, returns the usual kernel. If TRUE, returns the convolution kernel that is used in density cross-validation. |
chunks |
Integer: the number of chunks to split the task into (limits
RAM usage but increases overhead). |
PIT |
If TRUE, the Probability Integral Transform (PIT) is applied to all columns
of |
LOO |
Logical: If |
degree |
Integer: 0 for locally constant estimator (Nadaraya–Watson), 1 for locally linear (Cleveland's LOESS), 2 for locally quadratic (use with care, less stable, requires larger bandwidths) |
trim |
Trimming function for small weights to speed up locally weighted regression (if |
robust.iterations |
The number of robustifying iterations (due to Cleveland, 1979). If greater than 0, |
robust |
Character: "huber" for Huber's local regression weights, "bisquare" for more robust bi-square ones |
deduplicate.x |
Logical: if TRUE, full duplicates in the input |
deduplicate.xout |
Logical: if TRUE, full duplicates in the input |
no.dedup |
Logical: if TRUE, sets |
return.grid |
If Standardisation is recommended for the purposes of numerical stability (sometimes
The robust iterations are carried out, if requested, according to @cleveland1979robust. Huber weights are never zero; bisquare weights create a more robust re-descending estimator. Note: if Note: if |
A vector of predicted values or, if return.grid
is TRUE
,
a matrix with the predicted values in the last column.
set.seed(1)
n <- 300
x <- sort(rt(n, df = 6)) # Observed values
g <- seq(-4, 5, 0.1) # Grid for evaluation
f <- function(x) 1 + x + sin(x) # True E(Y | X) = f(X)
y <- f(x) + rt(n, df = 4)
# 3 estimators: locally constant + 2nd-order kernel,
# locally constant + 4th-order kernel, locally linear robust
b2lc <- suppressWarnings(bw.CV(x, y = y, kernel = "quartic")
+ 0.8)
b4lc <- suppressWarnings(bw.CV(x, y = y, kernel = "quartic", order = 4,
try.grid = FALSE, start.bw = 3) + 1)
b2ll <- bw.CV(x, y = y, kernel = "quartic", degree = 1, robust.iterations = 1,
try.grid = FALSE, start.bw = 3, verbose = TRUE)
m2lc <- kernelSmooth(x, y, g, bw = b2lc, kernel = "quartic", no.dedup = TRUE)
m4lc <- kernelSmooth(x, y, g, bw = b4lc, kernel = "quartic", order = 4, no.dedup = TRUE)
m2ll <- kernelSmooth(x, y, g, bw = b2ll, kernel = "quartic",
degree = 1, robust.iterations = 1, no.dedup = TRUE)
plot(x, y, xlim = c(-6, 7), col = "#00000088", bty = "n")
lines(g, f(g), col = "white", lwd = 5); lines(g, f(g))
lines(g, m2lc, col = 2); lines(g, m4lc, col = 3); lines(g, m2ll, col = 4)
# De-duplication facilities for faster operations
set.seed(1) # Creating a data set with many duplicates
n.uniq <- 1000
n <- 4000
inds <- sort(ceiling(runif(n, 0, n.uniq)))
x.uniq <- sort(rnorm(n.uniq))
y.uniq <- 1 + x.uniq + sin(x.uniq*2) + rnorm(n.uniq)
x <- x.uniq[inds]
y <- y.uniq[inds]
xout <- x.uniq[sort(ceiling(runif(n.uniq*3, 0, n.uniq)))]
w <- runif(n)
data.table::setDTthreads(1) # For measuring the pure gains and overhead
RcppParallel::setThreadOptions(numThreads = 1)
kr1 <- kernelSmooth(x, y, xout, w, bw = 0.2)
kr2 <- kernelSmooth(x, y, xout, w, bw = 0.5, no.dedup = TRUE)
stat1 <- attr(kr1, "duplicate.stats")
stat2 <- attr(kr2, "duplicate.stats")
print(stat1[3:5]) # De-duplication time -- worth it
print(stat2[3:5]) # Without de-duplication, slower
unname(prod((1 - stat1[1:2])) / (stat1[5] / stat2[5])) # > 1 = better time
# savings than expected, < 1 = worse time savings than expected
all.equal(as.numeric(kr1), as.numeric(kr2))
max(abs(kr1 - kr2)) # Should be around machine epsilon or less
# Example in 2 dimensions
# TODO
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.