In gbasulto/kerdec: Kernel deconvolution methods

rm(list=ls()) ### To clear namespace
library(knitr)

# opts_chunk$set(fig.width=12, fig.height=8, fig.path='Figs/',
#                echo=TRUE, warning=FALSE, message=FALSE)
opts_chunk$set(fig.width= 7, fig.height= 5,
               echo=TRUE, warning=FALSE, message=FALSE,
               eval=FALSE)

Kernel Choice

In kernel deconvolution density estimators (KDDEs), the kernel function is not used directly but, rather, its Fourier transform. Therefore, it is desirable to have kernels whose Fourier transform have good properties.

Selecting kernels whose Fourier transform has a bounded support leads to numerical advantages (as it is easier to integrate numerically) as well as theoretical advantages (as it is easier to deal with convergence rates).

In this vignette we find:

Description of the kernels available in kerdec package.
Product kernels used in this package for higher dimensions integration.
Kernel selection in KDDE

Kernel functions and their Fourier transforms

The explicit expressions of kernel functions are not required in kernel deconvolution methods, but rather their Fourier transform. This package provides formulas for those Fourier transforms. We show below how they look like and then we show how do kernel functions look like, but we do use the package fourierin to obtain these values from their Fourier transforms.

The five kernel functions considered have boundedly supported Fourier transforms, specifically on $[-1, 1]$. We abuse on the nomenclature of the kernels, since "triangular", "tricube" and "triweight" do not correspond to the kernel functions with these names, but to the fact that their Fourier transform is proportional to these kernels. In the same manner, "flat-top" refers to a kernel whose Fourier transform is flat around zero.

## Load packages
library(kerdec)
## Packages used to create the dataframe and plot it:
library(dplyr)                          
library(tidyr)
library(ggplot2)

## Set grid where kernels will be evaluated and list all the possible
## kernel choices.
t_grid <- seq(-1.3, 1.3, by = 0.01)
kernels <- c("sinc", "triangular", "triweight", "tricube", "flat-top")

## Data frame with kernel, t and value of FT
FT_kernel_values <- 
    crossing(t = t_grid,
             kernel = kernels) %>%
    group_by(kernel) %>%
    mutate(value = ft_kernel(t, unique(kernel))) 

## Plot values of FT of kernels
FT_kernel_values %>%
    ggplot(aes(t, value, color = kernel)) +
    geom_line(aes(linetype = kernel)) +
    theme_bw() +
    ggtitle("Fourier transform of kernels in 'kerdec'") +
    theme(legend.position = "bottom")

Now we obtain the values of the kernel functions from their Fourier transforms.

## Load packages

library(kerdec)
## Package to recover the kernels based on their Fourier tranform.
library(fourierin)
## Packages used to create dataframe and then generate plot:
library(dplyr)
library(purrr)
library(ggplot2)

## List kernel and define function to recover kernels from their
## Fourier transform and puts them in a dataframe.
kernels <- c("sinc", "triangular", "triweight", "tricube", "flat-top")
recover_kernel <- function (ker) {
    fourierin(function (t) ft_kernel(t, ker), 
              lower_int = -1, upper_int = 1, 
              lower_eval = -9, upper_eval = 9, 
              const_adj = -1, freq_adj = -1, 
              resolution = 256)    %>%
        with(data_frame(x =  w,
                        value = Re(c(values)),
                        kernel = ker))
}

## Data frame with kernel, t and kernel values.
kernel_values <- 
    kernels %>%
    structure(., names = .)  %>%
    map_df(recover_kernel)


## Plot possible kernels
kernel_values %>%
    ggplot(aes(x, value, color = kernel)) +
    geom_line(aes(linetype = kernel)) +
    theme_bw() +
    ggtitle("Actual kernel values") + 
    theme(legend.position = "bottom")

Univariate example

Although in the previous section we showed how to use the function that provides the Fourier transform of a particular set of kernel functions, we show a bare minimum example to illustrate how to use these function.

## Load package 
library(fourierin)
library(kerdec)
library(dplyr)
library(ggplot2)

## Plot triangular kernel
data_frame(t = seq(-1.3, 1.3, 0.01),
           values = ft_kernel(t, ker = "triangular")) %>%
  ggplot(aes(t, values)) +
  geom_line() +
  ggtitle("Triangular kernel") +
  theme_bw()

Product kernels

Finally we show a minimal example of how to use the function ft_kernel to find the Fourier transform of a product kernel. Observe that the grid has to be introduced as a matrix with $d$ columns and not as a $d$-dimensional array.

## Load package 
library(fourierin)
library(dplyr)
library(lattice)



## Define grid
t1 <- seq(-1.2, 1.2, length.out = 64)
t2 <- seq(-1.2, 1.2, length.out = 64)

## Compute the values of the kernel
FT_kernel_values <- 
    crossing(t1, t2) %>%
    mutate(value = ft_kernel(cbind(t1, t2), ker = "flat-top"))

## 
wireframe(value ~ t1*t2, data = FT_kernel_values,
  xlab = expression(t[1]), ylab = expression(t[2]),
  main = "Product kernel",
  drape = TRUE,
  colorkey = TRUE,
  pretty =  TRUE,
  col.regions = colorRampPalette(c("yellow", "red"))(100),
  screen = list(z = -60, x = -60)
)

FT_kernel_values %>%
    ggplot(aes(t1, t2, z = value)) +
    geom_contour(aes(color = ..level..)) +
    theme_bw() +
    xlab(expression(t[1])) +
    ylab(expression(t[2]))

Usage in KDDE

library(kerdec)                         # Load package
library(dplyr)
library(ggplot2)

set.seed(666)                           # Set seed


## Signal parameters 
shape <- 5                              # X distr. shape par.
rate <- 5                               # X distr. rate par.

## Settings and samples for all the cases
n <- 250                                # Sample size
sd_error <- 0.2                         # std. error of error distr.
X <- rgamma(n, shape, rate)             # Uncontaminated sample
Y <- X + rlaplace(n, sd = sd_error)     # Contaminated sample
extra_errors <- rlaplace(2*n, sd = sd_error) # Extra errors


## Estimate density
out <- kerdec_dens(smp = Y, error_smp = extra_errors, h = 0.2, kernel = "sinc")

str(out)

data_frame(x = out$x_eval, value = out$f_vals) %>%
    qplot(x, value, data = ., geom = "line")