# deconvolve: Deconvolution Kernel Density Estimator In TimothyHyndman/deconvolve: Deconvolution Tools for Measurement Error Problems

## Description

Computes the deconvolution kernel estimator (KDE) of the density of X from data W_{i1} = X_i + U_{i1}, i=1,...,n when the distribution of U_i is known or unknown and estimated from replicates, W_{i1} = X_i + U_{i1} and W_{i2} = X_i + U_{i2}, or without replicates if replicates are not available. All error densities need to be symmetric. In the homoscedastic error case, the codes are suitable only if the characteristic function of the errors is nonzero everywhere. In the heteroscedastic error case, the pooled characteristic function of the errors used by Delaigle and Meister (2006) must be nonzero everywhere

## Usage

 1 2 3 4 deconvolve(W1, W2 = NULL, xx = seq(min(W1), max(W1), length.out = 100), errortype = NULL, sd_U = NULL, phiU = NULL, bw = NULL, rescale = FALSE, kernel_type = c("default", "normal", "sinc"), het_replicates = FALSE, m = 20, show_diagnostics = FALSE) 

## Arguments

 W1 A vector of size n containing the univariate contaminated data. W2 (optional) A vector of size n containing replicate measurements for the same n individuals (in the same order) as W1. If supplied, then the error distribution will be estimated using the replicates only if phiU, and both of errortype and sd_U are not provided. xx A vector of x values on which to compute the estimator of the density of X. errortype A single string giving the distribution of U, either "laplace" or "normal". If you define the error distribution this way then you must also provide sd_U but should not provide phiU. Argument is case-insensitive and partially matched. sd_U The standard deviation(s) of U: a single value for homoscedastic errors and a vector of length n for heteroscedastic errors. This does not need to be provided if you define your error distribution usingphiU and provide bw. phiU Function(s) giving the characteristic function of the errors. A single real valued function for homoscedastic errors and a vector of n real valued functions for heteroscedastic errors. If you define the errors this way then you should not provide errortype. bw The bandwidth to use when computing the kernel estimator of the density of X. If NULL, a bandwidth will be calculated using a plug-in estimator. rescale If TRUE, the estimator of the density of X is rescaled so that it integrates to 1. Rescaling requires xx to be a fine grid of equispaced x values that cover the whole range of x-values where the estimated density is significantly non zero. kernel_type A string giving the kernel K to use when computing the estimator of the density of X. Either "default", "normal", or "sinc". The default kernel has characteristic function (1-t^2)^3 for t \in [-1,1]. The normal kernel is the standard normal density. The sinc kernel has characteristic function equal to 1 for t \in [-1,1] het_replicates If TRUE, then the errors are not assumed to be homoscedastic and the code for heteroscedastic errors is used. Only applicable if W2 is supplied. m The number of point masses to use to estimate the distribution of X in the case of an unknown homoscedastic error distribution estimated without replicates. show_diagnostics If TRUE, then diagnostic messages are printed displaying the results of the various optimizations performed when the error distribution is not supplied and estimated by the method in Delaigle and Hall (2016). Intended to be used for developement only.

## Details

The function deconvolve chooses from one of five different methods depending on how the error distribution is defined/computed:

Known homoscedastic error distribution: If the error distribution is defined by either a single function phiU, or a single value sd_U along with its errortype then the estimator of the density of X is the one in Stefanski and Carroll (1990).

Known heteroscedastic error distributions: If the error distributions are defined by a either a vector of functions phiU, or a vector sd_U along with their errortype (one single error type for all individuals) then the method used is the one from Delaigle and Meister (2008).

Unknown homoscedastic error distribution when replicates are available: If both W1 and W2 are supplied and het_replicates is FALSE, then the error distribution is estimated using the replicates as in Delaigle, Hall and Meister (2008) and the estimator of the density of X is computed as in Delaigle, Hall and Meister (2008) except that the interval A at page 678 of that paper is replaced by the one used by Camirand, Carroll and Delaigle (2018), which is a refined version of the one used by Delaigle and Hall (2016).

Unknown heteroscedastic error distribution when replicates are available: If both W1 and W2 are supplied and het_replicates is TRUE, then these distributions are estimated using replicates, as in Delaigle and Meister (2008) and the estimator of the density of X is computed as in Delaigle and Meister (2008) except that we do the tail correction of the estimated pooled characteristic function of the errors as in Delaigle and Hall (2016) and Camirand, Carroll and Delaigle (2018).

Unknown homoscedastic error distribution estimated without replicates: If none of phiU, errortype and sd_U, or W2 are supplied then the density of X is assumed asymmetric and to satisfy the identifiability conditions of Delaigle and Hall (2016). The estimator of the density of X is computed as in Delaigle and Hall (2016) except that we use far fewer points of support for the estimating discrete distribution and we allow the location of these point masses to vary.

The order in which we choose the methods is as follows:

1. If provided, use phiU to define the errors, otherwise

2. If provided use errortype and sd_u to define the errors, otherwise

3. If provided, use the vector of replicates W2 to estimate the error distribution, otherwise

4. We use the method for unknown homoscedastic error distribution estimated without replicates.

Note that in both 1 and 2, if a vector of replicates W2 is provided we augment the data in W1 with that in W2.

## Value

An object of class "deconvolve".

The function plot produces a plot of the deconvolution KDE of the density of X on the grid xx.

An object of class "deconvolve" is a list containing at least the following elements:

 W1 The original vector of contaminated data x The values on which the deconvolution KDE is evaluated pdf A vector containing the deconvolution KDE of the density of X, evaluated at each point in x bw The bandwidth used to calculate the KDE

It may also contain

 W2 The original vector of replicates

## Warnings

• If your errors have a Laplace distribution, sd_U is not the usual parameter of a Laplace distribution but its standard deviation.

• The method for deconvolution when the error distribution is unknown and assumed symmetric, and estimated without replicates, as in Delaigle and Hall (2016), requires solving multiple non-linear optimizations with both linear and non-linear constraints. The current implementation can be slow and unreliable. An alternative MATLAB implementation can be found at <github.com/TimothyHyndman/deconvolve-supp> which may work better in some circumstances.

• If you supply your own bandwidth, then you should ensure that the kernel used here matches the one you used to calculate your bandwidth.

• The DKDE can also be computed using the Fast Fourier Transform, which is a bit more complex. See Delaigle and Gijbels (2007). However if the grid of t-values is fine enough, the estimator can simply be computed like here without having problems with oscillations.

## References

Camirand, F., Carroll, R.J. and Delaigle, A. (2018). Estimating the distribution of episodically consumed food measured with errors. Delaigle, A. and Gijbels, I. (2007). Frequent problems in calculating integrals and optimizing objective functions: a case study in density deconvolution. Statistics and Computing, 17, 349-355.

Delaigle, A. and Hall, P. (2016). Methodology for non-parametric deconvolution when the error distribution is unknown. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78, 1, 231-252.

Delaigle, A., Hall, P. and Meister, A. (2008). On Deconvolution with repeated measurements. Annals of Statistics, 36, 665-685

Delaigle, A. and Meister, A. (2008). Density estimation with heteroscedastic error. Bernoulli, 14, 2, 562-579.

Stefanski, L.A. and Carroll, R.J. (1990). Deconvolving kernel density estimators. Statistics, 21, 2, 169-184. Manuscript.

## Author(s)

Aurore Delaigle, Timothy Hyndman, Tianying Wang

## Examples

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 ## Not run: # Error estimated from replicates ---------------------------------------------- W1 <- (framingham$SBP21 + framingham$SBP22)/2 W2 <- (framingham$SBP31 + framingham$SBP32)/2 yy <- deconvolve(W1, W2) # Symmetric Errors ------------------------------------------------------------- output <- deconvolve((framingham$SBP21 + framingham$SBP22)/2) # Generate homoscedastic data -------------------------------------------------- n <- 50 X <- stats::rchisq(n, 3) sd_U = 0.2 U <- stats::rnorm(n, sd = sd_U) W <- X + U # Homoscedastic Errors --------------------------------------------------------- yy <- deconvolve(W, errortype = "norm", sd_U = sd_U) # Generate heteroscedastic data ------------------------------------------------ n <- 50 X <- stats::rchisq(n, 3) sd_U_vec <- 0.6 * sqrt(1 + (1:n) / n) * sqrt(0.5) U <- c() for (sigUk in sd_U_vec){ U <- c(U, stats::rnorm(1, 0, sigUk)) } W <- X + U # Heteroscedastic Errors ------------------------------------------------------- yy <- deconvolve(W, errortype = "norm", sd_U = sd_U_vec) # Heteroscedastic Errors provided using a vector of phiUs ---------------------- phiU <- c() for (sigUk in sd_U_vec){ phiUk <- function(tt) { exp(-sigUk^2 * tt^2 / 2) } phiU <- c(phiU, phiUk) } yy <- deconvolve(W, sd_U = sd_U_vec, phiU = phiU) ## End(Not run) 

TimothyHyndman/deconvolve documentation built on May 13, 2019, 11:51 p.m.