kernelDensity | R Documentation |
Kernel density estimation
kernelDensity(
x,
xout = NULL,
weights = NULL,
bw = NULL,
kernel = c("gaussian", "uniform", "triangular", "epanechnikov", "quartic"),
order = 2,
convolution = FALSE,
chunks = 0,
PIT = FALSE,
deduplicate.x = TRUE,
deduplicate.xout = TRUE,
no.dedup = FALSE,
return.grid = FALSE
)
x |
A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables. |
xout |
A vector or a matrix of data points with |
weights |
A numeric vector of observation weights (typically counts) to
perform weighted operations. If null, |
bw |
Bandwidth for the kernel: a scalar or a vector of the same length as |
kernel |
Character describing the desired kernel type. NB: due to limited machine precision, even Gaussian has finite support. |
order |
An integer: 2, 4, or 6. Order-2 kernels are the standard kernels that are positive everywhere. Orders 4 and 6 produce some negative values, which reduces bias but may hamper density estimation. |
convolution |
Logical: if FALSE, returns the usual kernel. If TRUE, returns the convolution kernel that is used in density cross-validation. |
chunks |
Integer: the number of chunks to split the task into (limits
RAM usage but increases overhead). |
PIT |
If TRUE, the Probability Integral Transform (PIT) is applied to all columns
of |
deduplicate.x |
Logical: if TRUE, full duplicates in the input |
deduplicate.xout |
Logical: if TRUE, full duplicates in the input |
no.dedup |
Logical: if TRUE, sets |
return.grid |
Logical: if The number of chunks for kernel density and regression estimation is chosen
in such a manner that the number of elements in the internal weight matrix
should not exceed |
A vector of density estimates evaluated at the grid points or, if return.grid
, a matrix with the density in the last column.
set.seed(1)
x <- sort(rt(10000, df = 5)) # Observed values
g <- seq(-6, 6, 0.05) # Grid for evaluation
d2 <- kernelDensity(x, g, bw = 0.3, kernel = "epanechnikov", no.dedup = TRUE)
d4 <- kernelDensity(x, g, bw = 0.4, kernel = "quartic", order = 4, no.dedup = TRUE)
plot(g, d2, ylim = range(0, d2, d4), type = "l"); lines(g, d4, col = 2)
# De-duplication facilities for faster operations
set.seed(1) # Creating a data set with many duplicates
n.uniq <- 1000
n <- 4000
inds <- ceiling(runif(n, 0, n.uniq))
x.uniq <- matrix(rnorm(n.uniq*10), ncol = 10)
x <- x.uniq[inds, ]
xout <- x.uniq[ceiling(runif(n.uniq*3, 0, n.uniq)), ]
w <- runif(n)
data.table::setDTthreads(1) # For measuring the pure gains and overhead
RcppParallel::setThreadOptions(numThreads = 1)
kd1 <- kernelDensity(x, xout, w, bw = 0.5)
kd2 <- kernelDensity(x, xout, w, bw = 0.5, no.dedup = TRUE)
stat1 <- attr(kd1, "duplicate.stats")
stat2 <- attr(kd2, "duplicate.stats")
print(stat1[3:5]) # De-duplication time -- worth it
print(stat2[3:5]) # Without de-duplication, slower
unname(prod((1 - stat1[1:2])) / (stat1[5] / stat2[5])) # > 1 = better time
# savings than expected, < 1 = worse time savings than expected
all.equal(as.numeric(kd1), as.numeric(kd2))
max(abs(kd1 - kd2)) # Should be around machine epsilon or less
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.