kernelMixedSmooth | R Documentation |
Smoothing with conditioning on discrete and continuous variables
kernelMixedSmooth(
x,
y,
by,
xout = NULL,
byout = NULL,
weights = NULL,
parallel = FALSE,
cores = 1,
preschedule = TRUE,
...
)
x |
A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables. |
y |
A numeric vector of dependent variable values. |
by |
A variable containing unique identifiers of discrete categories. |
xout |
A vector or a matrix of data points with |
byout |
A variable containing unique identifiers of discrete categories
for the output grid (same points as |
weights |
A numeric vector of observation weights (typically counts) used to
perform weighted operations. If NULL, |
parallel |
Logical: if |
cores |
Integer: the number of CPU cores to use. Using more cores increases RAM usage.
If the number of unique values of 'by' is less than the number of cores requested,
then, only |
preschedule |
Logical: passed as |
... |
Passed to |
A numeric vector of kernel estimates of the same length as nrow(xout).
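To make the idea of conditioning on a discrete variable concrete, the following base-R sketch smooths separately within each category of `by`. It assumes a Gaussian kernel and a local-constant (Nadaraya-Watson) estimator. The helper `nwByGroup` is a hypothetical name introduced for illustration only and is not the internal code of kernelMixedSmooth.

```r
# Hypothetical helper (not part of any package): Nadaraya-Watson smoothing
# with a Gaussian kernel, carried out separately within each category of `by`.
nwByGroup <- function(x, y, by, xout = x, byout = by, bw = 1) {
  yhat <- rep(NA_real_, length(xout))
  for (g in unique(byout)) {
    inS  <- by == g       # sample observations in category g
    outS <- byout == g    # grid points in category g
    if (!any(inS)) next   # no data in this category: leave NA
    # Kernel weights: rows are grid points, columns are observations
    K <- dnorm(outer(xout[outS], x[inS], "-") / bw)
    yhat[outS] <- drop(K %*% y[inS]) / rowSums(K)
  }
  yhat
}

set.seed(1)
n    <- 500
z    <- rbinom(n, 1, 0.5)
x    <- rnorm(n)
y    <- 1 + x^2 + z + rnorm(n)
by   <- z + 1L                                  # categories coded 1 and 2
grid <- expand.grid(x = seq(-2, 2, 0.5), by = 1:2)
fit  <- nwByGroup(x, y, by, xout = grid$x, byout = grid$by, bw = 0.5)
length(fit) == nrow(grid)                       # one estimate per grid row
```

Only observations whose category matches that of the grid point enter each estimate, so the result has one value per row of the output grid.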
set.seed(1)
n <- 1000
z1 <- rbinom(n, 1, 0.5)
z2 <- rbinom(n, 1, 0.5)
x <- rnorm(n)
u <- rnorm(n)
y <- 1 + x^2 + z1 + 2*z2 + z1*z2 + u
by <- as.integer(interaction(list(z1, z2)))
out <- expand.grid(x = seq(-4, 4, 0.25), by = 1:4)
yhat <- kernelMixedSmooth(x = x, y = y, by = by, bw = 1, degree = 1,
xout = out$x, byout = out$by)
plot(x, y)
for (i in 1:4) lines(out$x[out$by == i], yhat[out$by == i], col = i+1, lwd = 2)
legend("top", c("00", "10", "01", "11"), col = 2:5, lwd = 2)
# The function works faster if there are duplicated values of the
# conditioning variables in the prediction grid and there are many
# observations; this is illustrated by the following example
# without a custom grid
# In this example, ignore the fact that the conditioning variable is rounded
# and therefore contains measurement error (ruining consistency)
x <- rnorm(10000)
xout <- rnorm(5000)
xr <- round(x)
xrout <- round(xout)
w <- runif(10000, 1, 3)
y <- 1 + x^2 + rnorm(10000)
by <- rep(1:4, each = 2500)
byout <- rep(1:4, each = 1250)
system.time(kernelMixedSmooth(x = x, y = y, by = by, weights = w,
xout = xout, byout = byout, bw = 1))
# user system elapsed
# 0.144 0.000 0.144
system.time(km1 <- kernelMixedSmooth(x = xr, y = y, by = by, weights = w,
xout = xrout, byout = byout, bw = 1))
# user system elapsed
# 0.021 0.000 0.022
system.time(km2 <- kernelMixedSmooth(x = xr, y = y, by = by, weights = w,
xout = xrout, byout = byout, bw = 1, no.dedup = TRUE))
# user system elapsed
# 0.138 0.001 0.137
all.equal(km1, km2)
# Parallel capabilities shine in large data sets
if (.Platform$OS.type != "windows") {
# A function to carry out the same estimation in multiple cores
pFun <- function(n) kernelMixedSmooth(x = rep(x, 2), y = rep(y, 2),
weights = rep(w, 2), by = rep(by, 2),
bw = 1, degree = 0, parallel = TRUE, cores = n)
system.time(pFun(1)) # 0.6--0.7 s
system.time(pFun(2)) # 0.4--0.5 s
}
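The speed-up from duplicated conditioning values in the example above can be illustrated with a small base-R sketch: smooth once per distinct grid point, then expand the result back to the full grid with match(). This is an assumption about how such a de-duplication step might work, not the package's actual implementation. The helper `nw` is hypothetical, and the sample sizes are reduced to keep the kernel matrix small.

```r
set.seed(1)
x    <- round(rnorm(2000))    # rounded regressor: few distinct values
y    <- 1 + x^2 + rnorm(2000)
xout <- round(rnorm(1000))    # rounded grid: many repeated points

# Hypothetical helper: Nadaraya-Watson estimate at each point of `g`
nw <- function(g, x, y, bw = 1) {
  K <- dnorm(outer(g, x, "-") / bw)   # rows: grid points, columns: observations
  drop(K %*% y) / rowSums(K)
}

ug      <- unique(xout)               # distinct grid points only
fitUniq <- nw(ug, x, y)               # smooth once per distinct point
fitFull <- fitUniq[match(xout, ug)]   # expand back to the full grid

# The shortcut should agree with the direct computation on the full grid
all.equal(fitFull, nw(xout, x, y))
```

The kernel matrix in the de-duplicated call has length(ug) rows instead of length(xout), which is where the saving comes from when the grid contains many repeated values.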