knitr::opts_chunk$set(global.par=TRUE, collapse=TRUE, comment="#>", fig.width=5, fig.height=5, fig.align="center", dpi=96) options(tibble.print_min=3L, tibble.print_max=3L)
Kernel density estimation is a popular tool for visualising the distribution of data. See Chacon \& Duong (2018), for example, for an overview. When multivariate kernel density estimation is considered it is usually in the constrained context with diagonal bandwidth matrices, e.g. in the R packages sm and KernSmooth. In contrast, the ks package implements both diagonal and unconstrained data-driven bandwidth matrices for kernel density estimation.
For our target density, we use the `dumbbell' density, given by the normal mixture $$ \frac{4}{11} N \bigg( \begin{bmatrix}-2 \ 2\end{bmatrix}, \begin{bmatrix}1 & 0 \ 0 & 1 \end{bmatrix} \bigg)+ \frac{3}{11} N \bigg( \begin{bmatrix}0 \ 0\end{bmatrix}, \begin{bmatrix}0.8 & -0.72 \ -0.72 & 0.8\end{bmatrix} \bigg)+ \frac{4}{11} N \bigg( \begin{bmatrix}2 \ -2\end{bmatrix}, \begin{bmatrix}1 & 0 \ 0 & 1 \end{bmatrix} \bigg). $$ This density is unimodal and we draw a sample of 2000 data points.
library(ks) mus <- rbind(c(-2,2), c(0,0), c(2,-2)) Sigmas <- rbind(diag(2), matrix(c(0.8, -0.72, -0.72, 0.8), nrow=2), diag(2)) cwt <- 3/11 props <- c((1-cwt)/2, cwt, (1-cwt)/2) plotmixt(mus=mus, Sigmas=Sigmas, props=props, display="filled.contour", xlim=c(-4,4), ylim=c(-4,4)) set.seed(88192) samp <- 2000 x <- rmvnorm.mixt(n=samp, mus=mus, Sigmas=Sigmas, props=props) colnames(x) <- c("x","y") plot(x, pch=16, cex=0.5, xlim=c(-4,4), ylim=c(-4,4))
We use Hpi for unconstrained plug-in selectors and Hpi.diag for diagonal plug-in selectors.
Hpi1 <- Hpi(x=x) Hpi2 <- Hpi.diag(x=x)
To compute a kernel density estimate, the command is kde, which creates a kde class object
fhat.pi1 <- kde(x=x) ## same as Hpi(x=x, H=Hpi1) fhat.pi2 <- kde(x=x, H=Hpi2)
We use the plot method for kde objects to display these kernel density estimates. The default is a contour plot with the upper 25%, 50% and 75% contours of the
(sample) highest density regions. The diagonal bandwidth matrix constrains the smoothing to be performed in directions parallel to the co-ordinate axes, so it is not able to apply accurate levels of smoothing to the obliquely oriented central portion. The result is a multimodal density estimate. The unconstrained bandwidth matrix correctly produces a unimodal density estimate.
plot(fhat.pi1, main="Plug-in", display="filled.contour", xlim=c(-4,4), ylim=c(-4,4)) plot(fhat.pi2, main="Plug-in diagonal", display="filled.contour", xlim=c(-4,4), ylim=c(-4,4))
The unconstrained SCV (Smoothed Cross Validation) selector is Hscv and
its diagonal version is Hscv.diag. The most reasonable density estimate below is from the unconstrained SCV selector.
Hscv1 <- Hscv(x=x) Hscv2 <- Hscv.diag(x=x) fhat.cv1 <- kde(x=x, H=Hscv1) fhat.cv2 <- kde(x=x, H=Hscv2) plot(fhat.cv1, main="SCV", display="filled.contour", xlim=c(-4,4), ylim=c(-4,4)) plot(fhat.cv2, main="SCV diagonal", display="filled.contour", xlim=c(-4,4), ylim=c(-4,4))
The unconstrained bandwidth selectors will be better than their diagonal counterparts when the data have large mass oriented obliquely to the co-ordinate axes, like for the dumbbell data. The unconstrained plug-in and the SCV selectors can be viewed as generally recommended selectors.
Chacon, J. E. and Duong, T. (2018). Multivariate Kernel Smoothing and Its Applications Chapman & Hall/CRC Press, Boca Raton.
Duong, T. 2007). ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R Journal of Statistical Software 21 (7).
-- Generated r format(Sys.time(), '%d %B %Y').
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.