bandwidth: Bandwidth Selectors for Deconvolution Kernel Density...

Description Usage Arguments Details Value References Author(s) Examples

Description

Computes a bandwidth for the deconvolution kernel estimator of the density of X from data W_{i1} = X_i + U_{i1}, i=1,...,n when the distribution of U_i is known, unknown, or estimated from replicates, W_{i1} = X_i + U_{i1} and W_{i2} = X_i + U_{i2}. If 'SIMEX' algorithm used, computes a bandwidth for use in deconvolution regression of data (W_i, Y_i) where Y_i = g(X_i) + V_i and W_i = X_i + U_i.

Usage

1
2
3
4
bandwidth(W1, W2 = NULL, errortype = NULL, sd_U = NULL,
  phiU = NULL, Y = NULL, algorithm = c("PI", "CV", "SIMEX"),
  n_cores = NULL, kernel_type = c("default", "normal", "sinc"),
  seed = NULL, use_alt_SIMEX_rep_opt = FALSE, het_replicates = FALSE)

Arguments

W1

A vector of size n containing the univariate contaminated data.

W2

(optional) A vector of size n containing replicate measurements for the same n individuals (in the same order) as W1. If supplied, then the error distribution will be estimated using the replicates only if phiU, and both of errortype and sd_U are not provided.

errortype

A single string giving the distribution of U, either "laplace" or "normal". If you define the error distribution this way then you must also provide sd_U but should not provide phiU. Argument is case-insensitive and partially matched.

sd_U

The standard deviations of U. A single value for homoscedastic errors and a vector having the same length as W1 for heteroscedastic errors.

phiU

Function(s) giving the characteristic function of the errors. A single real valued function for homoscedastic errors and a vector of n real valued functions for heteroscedastic errors. If you define the errors this way then you should not provide errortype.

Y

A vector of the univariate dependent data (used in the regression setting). Only required for 'SIMEX' algorithm.

algorithm

One of "PI" for plug-in bandwidth, "CV" for cross-validation bandwidth or "SIMEX". If "CV" then the errors must be homoscedastic. "PI" can only be used if your kernel has finite second order moment and finite squared integral.

n_cores

Number of cores to use when using SIMEX algorithm. If NULL, the number of cores to use will be automatically detected.

kernel_type

A string giving the kernel K to use when computing the estimator of the density of X. Either "default", "normal", or "sinc". The default kernel has characteristic function (1-t^2)^3 for t \in [-1,1]. The normal kernel is the standard normal density. The sinc kernel has characteristic function equal to 1 for t \in [-1,1]

seed

Set seed for SIMEX. Allows for reproducible results using SIMEX.

use_alt_SIMEX_rep_opt

Only used with SIMEX based on replicates. If TRUE, performs SIMEX on W = (W1 + W2)/2 and samples U* from (W1 - W2). The default performs SIMEX on W = (W1, W2) and and samples U* from (W1 - W2)/√ 2.

het_replicates

If TRUE, then the errors are not assumed to be homoscedastic and the code for heteroscedastic errors is used. Only applicable if W2 is supplied.

Details

The function bandwidth chooses from one of seven different methods depending on how the error distribution is defined/computed and which algorithm is selected.

PI for known homoscedastic error distribution: If algorithm = "PI" and the error distribution is defined by either a single function phiU, or a single value sd_U along with its errortype, then the method used is as described in Delaigle and Gijbels (2002) and Delaigle and Gijbels (2004).

PI for known heteroscedastic error distributions: If algorithm = "PI" and the error distributions are defined by a either a vector of functions phiU, or a vector sd_U along with its (unique) errortype then the method used is as described in Delaigle and Meister (2008).

PI for unknown homoscedastic error distribution estimated from replicates: If algorithm = "PI" and a replicate vector W2 is supplied and the errors are assumed to be homoscedastic, then the error distribution is estimated using replicates as described in the main deconvolution code and then the PI bandwidth for known errors is used, with the true error distribution replaced by its estimator.

PI for unknown heteroscedastic error distribution estimated from replicates: If algorithm = "PI" and a replicate vector W2 is supplied and the errors are heteroscedastic, then the density of X is estimated as described in Delaigle and Meister (2008) with adjustments described in deconvolve and the PI bandwidth uses arguments similars to the ones used for the PI bandwidth for known errors, described in Delaigle and Meister (2008), is used, but adapated to the unknown error case.

PI for unknown homoscedastic error distribution estimated without replicates: If algorithm = "PI" and the errors are not supplied, then the error distribution is estimated using the method described in Delaigle and Hall (2016) and then the bandwidth is calculated using the method described in Delaigle and Gijbels (2002) and Delaigle and Gijbels (2004) except that the error distribution is replaced by its estimator.

CV: If algorithm = "CV" then the method used is the corss-validation described in Stefanski and Carroll (1990) and Delaigle and Gijbels (2004).

SIMEX: If algorithm = "SIMEX" then the method used is the SIMEX procedure described in Delaigle and Hall (2008).

SIMEX for Replicates: If algorithm = "SIMEX" and a replicate vector W2 is supplied, then phi_U is calculated using replicates and SIMEX is performed as according to use_alt_SIMEX_rep_opt.

The order in which we choose the methods is as follows:

  1. If provided, use phiU to define the errors, otherwise

  2. If provided use errortype and sd_u to define the errors, otherwise

  3. If provided, use the vector of replicates W2 to estimate the error distribution, otherwise

  4. We use the method for unknown homoscedastic error distribution estimated without replicates.

Note that in both 1 and 2, if a vector of replicates W2 is provided we augment the data in W1 with that in W2.

Value

A data-driven bandwidth. If using 'SIMEX' algorithm then returns a list containing the bandwidth 'h' and ridge parameter 'rho'.

References

Delaigle, A. and Gijbels, I. (2002). Estimation of integrated squared density derivatives from a contaminated sample. Journal of the Royal Statistical Society, B, 64, 4, 869-886.

Delaigle, A. and Gijbels, I. (2004). Practical bandwidth selection in deconvolution kernel density estimation. Computational Statistics and Data Analysis, 45, 2, 249 - 267.

Delaigle, A. and Hall, P. (2008). Using SIMEX for smoothing-parameter choice in errors-in-variables problems. Journal of the American Statistical Association, 103, 481, 280-287

Delaigle, A. and Hall, P. (2016). Methodology for non-parametric deconvolution when the error distribution is unknown. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78, 1, 231-252.

Delaigle, A., Hall, P., and Meister, A. (2008). On Deconvolution with repeated measurements. Annals of Statistics, 36, 665-685

Delaigle, A. and Meister, A. (2008). Density estimation with heteroscedastic error. Bernoulli, 14, 2, 562-579.

Stefanski, L. and Carroll, R.J. (1990). Deconvoluting kernel density estimators. Statistics, 21, 2, 169-184.

Author(s)

Aurore Delaigle, Timothy Hyndman, Tianying Wang

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
## Not run: 
# PI bandwidth with error estimated from replicates ----------------------------
W1 <- (framingham$SBP21 + framingham$SBP22)/2
W2 <- (framingham$SBP31 + framingham$SBP32)/2

bw <- bandwidth(W1, W2)


# Generate homoscedastic data --------------------------------------------------
n <- 50
X <- stats::rchisq(n, 3)

sd_U = 0.2
U <- stats::rnorm(n, sd = sd_U)

W <- X + U

# CV bandwidth -----------------------------------------------------------------
bw <- bandwidth(W, errortype = "norm", sd_U = sd_U, algorithm = "CV")

# SIMEX bandwidth --------------------------------------------------------------
Y <- 2*X

output <- bandwidth(W, errortype = "norm", sd_U = sd_U, Y = Y, 
					algorithm = "SIMEX", n_cores = 2)
bw <- output$h
rho <- output$rho

# Generate heteroscedastic data ------------------------------------------------
n <- 50
X <- stats::rchisq(n, 3)

sd_U_vec <- 0.6 * sqrt(1 + (1:n) / n) * sqrt(0.5)
U <- c()
for (sigUk in sd_U_vec){
	U <- c(U, stats::rnorm(1, 0, sigUk))
}

W <- X + U

# PI bandwidth with heteroscedastic errors -------------------------------------
bw <- bandwidth(W, errortype = "norm", sd_U = sd_U_vec)

# PI bandwidth with heteroscedastic errors supplied using phiU -----------------
phiU <- c()
for (sigUk in sd_U_vec){
	phiUk <- function(tt) {
		exp(-sigUk^2 * tt^2 / 2)
	}
	phiU <- c(phiU, phiUk)
}

bw <- bandwidth(W, sd_U = sd_U_vec, phiU = phiU)

## End(Not run)

TimothyHyndman/deconvolve documentation built on May 13, 2019, 11:51 p.m.