deconvolve: Deconvolution Kernel Density Estimator

Description Usage Arguments Details Value Warnings References Author(s) Examples

Description

Computes the deconvolution kernel estimator (KDE) of the density of X from data W_{i1} = X_i + U_{i1}, i=1,...,n when the distribution of U_i is known or unknown and estimated from replicates, W_{i1} = X_i + U_{i1} and W_{i2} = X_i + U_{i2}, or without replicates if replicates are not available. All error densities need to be symmetric. In the homoscedastic error case, the codes are suitable only if the characteristic function of the errors is nonzero everywhere. In the heteroscedastic error case, the pooled characteristic function of the errors used by Delaigle and Meister (2006) must be nonzero everywhere

Usage

1
2
3
4
deconvolve(W1, W2 = NULL, xx = seq(min(W1), max(W1), length.out = 100),
  errortype = NULL, sd_U = NULL, phiU = NULL, bw = NULL,
  rescale = FALSE, kernel_type = c("default", "normal", "sinc"),
  het_replicates = FALSE, m = 20, show_diagnostics = FALSE)

Arguments

W1

A vector of size n containing the univariate contaminated data.

W2

(optional) A vector of size n containing replicate measurements for the same n individuals (in the same order) as W1. If supplied, then the error distribution will be estimated using the replicates only if phiU, and both of errortype and sd_U are not provided.

xx

A vector of x values on which to compute the estimator of the density of X.

errortype

A single string giving the distribution of U, either "laplace" or "normal". If you define the error distribution this way then you must also provide sd_U but should not provide phiU. Argument is case-insensitive and partially matched.

sd_U

The standard deviation(s) of U: a single value for homoscedastic errors and a vector of length n for heteroscedastic errors. This does not need to be provided if you define your error distribution usingphiU and provide bw.

phiU

Function(s) giving the characteristic function of the errors. A single real valued function for homoscedastic errors and a vector of n real valued functions for heteroscedastic errors. If you define the errors this way then you should not provide errortype.

bw

The bandwidth to use when computing the kernel estimator of the density of X. If NULL, a bandwidth will be calculated using a plug-in estimator.

rescale

If TRUE, the estimator of the density of X is rescaled so that it integrates to 1. Rescaling requires xx to be a fine grid of equispaced x values that cover the whole range of x-values where the estimated density is significantly non zero.

kernel_type

A string giving the kernel K to use when computing the estimator of the density of X. Either "default", "normal", or "sinc". The default kernel has characteristic function (1-t^2)^3 for t \in [-1,1]. The normal kernel is the standard normal density. The sinc kernel has characteristic function equal to 1 for t \in [-1,1]

het_replicates

If TRUE, then the errors are not assumed to be homoscedastic and the code for heteroscedastic errors is used. Only applicable if W2 is supplied.

m

The number of point masses to use to estimate the distribution of X in the case of an unknown homoscedastic error distribution estimated without replicates.

show_diagnostics

If TRUE, then diagnostic messages are printed displaying the results of the various optimizations performed when the error distribution is not supplied and estimated by the method in Delaigle and Hall (2016). Intended to be used for developement only.

Details

The function deconvolve chooses from one of five different methods depending on how the error distribution is defined/computed:

Known homoscedastic error distribution: If the error distribution is defined by either a single function phiU, or a single value sd_U along with its errortype then the estimator of the density of X is the one in Stefanski and Carroll (1990).

Known heteroscedastic error distributions: If the error distributions are defined by a either a vector of functions phiU, or a vector sd_U along with their errortype (one single error type for all individuals) then the method used is the one from Delaigle and Meister (2008).

Unknown homoscedastic error distribution when replicates are available: If both W1 and W2 are supplied and het_replicates is FALSE, then the error distribution is estimated using the replicates as in Delaigle, Hall and Meister (2008) and the estimator of the density of X is computed as in Delaigle, Hall and Meister (2008) except that the interval A at page 678 of that paper is replaced by the one used by Camirand, Carroll and Delaigle (2018), which is a refined version of the one used by Delaigle and Hall (2016).

Unknown heteroscedastic error distribution when replicates are available: If both W1 and W2 are supplied and het_replicates is TRUE, then these distributions are estimated using replicates, as in Delaigle and Meister (2008) and the estimator of the density of X is computed as in Delaigle and Meister (2008) except that we do the tail correction of the estimated pooled characteristic function of the errors as in Delaigle and Hall (2016) and Camirand, Carroll and Delaigle (2018).

Unknown homoscedastic error distribution estimated without replicates: If none of phiU, errortype and sd_U, or W2 are supplied then the density of X is assumed asymmetric and to satisfy the identifiability conditions of Delaigle and Hall (2016). The estimator of the density of X is computed as in Delaigle and Hall (2016) except that we use far fewer points of support for the estimating discrete distribution and we allow the location of these point masses to vary.

The order in which we choose the methods is as follows:

  1. If provided, use phiU to define the errors, otherwise

  2. If provided use errortype and sd_u to define the errors, otherwise

  3. If provided, use the vector of replicates W2 to estimate the error distribution, otherwise

  4. We use the method for unknown homoscedastic error distribution estimated without replicates.

Note that in both 1 and 2, if a vector of replicates W2 is provided we augment the data in W1 with that in W2.

Value

An object of class "deconvolve".

The function plot produces a plot of the deconvolution KDE of the density of X on the grid xx.

An object of class "deconvolve" is a list containing at least the following elements:

W1

The original vector of contaminated data

x

The values on which the deconvolution KDE is evaluated

pdf

A vector containing the deconvolution KDE of the density of X, evaluated at each point in x

bw

The bandwidth used to calculate the KDE

It may also contain

W2

The original vector of replicates

Warnings

References

Camirand, F., Carroll, R.J. and Delaigle, A. (2018). Estimating the distribution of episodically consumed food measured with errors. Delaigle, A. and Gijbels, I. (2007). Frequent problems in calculating integrals and optimizing objective functions: a case study in density deconvolution. Statistics and Computing, 17, 349-355.

Delaigle, A. and Hall, P. (2016). Methodology for non-parametric deconvolution when the error distribution is unknown. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78, 1, 231-252.

Delaigle, A., Hall, P. and Meister, A. (2008). On Deconvolution with repeated measurements. Annals of Statistics, 36, 665-685

Delaigle, A. and Meister, A. (2008). Density estimation with heteroscedastic error. Bernoulli, 14, 2, 562-579.

Stefanski, L.A. and Carroll, R.J. (1990). Deconvolving kernel density estimators. Statistics, 21, 2, 169-184. Manuscript.

Author(s)

Aurore Delaigle, Timothy Hyndman, Tianying Wang

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
## Not run: 
# Error estimated from replicates ----------------------------------------------
W1 <- (framingham$SBP21 + framingham$SBP22)/2
W2 <- (framingham$SBP31 + framingham$SBP32)/2

yy <- deconvolve(W1, W2)

# Symmetric Errors -------------------------------------------------------------
output <- deconvolve((framingham$SBP21 + framingham$SBP22)/2)

# Generate homoscedastic data --------------------------------------------------
n <- 50
X <- stats::rchisq(n, 3)

sd_U = 0.2
U <- stats::rnorm(n, sd = sd_U)

W <- X + U

# Homoscedastic Errors ---------------------------------------------------------
yy <- deconvolve(W, errortype = "norm", sd_U = sd_U)

# Generate heteroscedastic data ------------------------------------------------
n <- 50
X <- stats::rchisq(n, 3)

sd_U_vec <- 0.6 * sqrt(1 + (1:n) / n) * sqrt(0.5)
U <- c()
for (sigUk in sd_U_vec){
	U <- c(U, stats::rnorm(1, 0, sigUk))
}

W <- X + U

# Heteroscedastic Errors -------------------------------------------------------
yy <- deconvolve(W, errortype = "norm", sd_U = sd_U_vec)

# Heteroscedastic Errors provided using a vector of phiUs ----------------------
phiU <- c()
for (sigUk in sd_U_vec){
	phiUk <- function(tt) {
		exp(-sigUk^2 * tt^2 / 2)
	}
	phiU <- c(phiU, phiUk)
}

yy <- deconvolve(W, sd_U = sd_U_vec, phiU = phiU)

## End(Not run)

TimothyHyndman/deconvolve documentation built on May 13, 2019, 11:51 p.m.