reg_deconvolve: Compute the measurement error version of the Nadaraya-Watson...
In TimothyHyndman/deconvolve: Deconvolution Tools for Measurement Error Problems


Estimates m(x) = E[Y | X = x] from data (W, Y) where W = X + U.

reg_deconvolve(Y, W1, W2 = NULL, xx = seq(min(W1), max(W1), length.out
  = 100), errortype = NULL, sd_U = NULL, phiU = NULL, bw = NULL,
  rho = NULL, n_cores = NULL, kernel_type = c("default", "normal",
  "sinc"), seed = NULL, use_alt_SIMEX_rep_opt = FALSE)

`Y`	A vector of the response data Y_1, ..., Y_n.
`W1`	A vector of size n containing the univariate contaminated data.
`W2`	(optional) A vector of size n containing replicate measurements for the same n individuals (in the same order) as W1. If supplied, the error distribution is estimated from the replicates only when neither `phiU` nor the pair `errortype` and `sd_U` is provided.
`xx`	A vector of x values on which to compute the regression estimator.
`errortype`	A single string giving the distribution of U, either "laplace" or "normal". If you define the error distribution this way then you must also provide `sd_U` but should not provide `phiU`. Argument is case-insensitive and partially matched.
`sd_U`	The standard deviation of U. This does not need to be provided if you define your error using phiU and provide `bw` and `rho`.
`phiU`	A function giving the characteristic function of U. You should only define the errors this way if you also provide `bw` and `rho`. If you define the errors this way then you should not provide `errortype`.
`bw`	The bandwidth to use. If you provide this then you should also provide `rho`.
`rho`	The ridge parameter to use. If you provide this then you should also provide `bw`.
`n_cores`	Number of cores to use when calculating the bandwidth. If `NULL`, the number of cores to use will be automatically detected.
`kernel_type`	The deconvolution kernel to use. The default kernel has characteristic function (1-t^2)^3 for t in [-1,1]. The normal kernel is the standard normal density. The sinc kernel has characteristic function equal to 1 for t in [-1,1].
`seed`	Set seed for SIMEX. Allows for reproducible results using SIMEX. Otherwise a default seed will be automatically set.
`use_alt_SIMEX_rep_opt`	Only used with SIMEX using replicates. If `TRUE`, performs SIMEX on W = (W1 + W2)/2 and samples U* from (W1 - W2). The default performs SIMEX on W = (W1, W2) and samples U* from (W1 - W2)/√2.
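
To make the `phiU` and `kernel_type` arguments more concrete, here is a minimal sketch in base R of the characteristic functions involved. The helper names (`phiU_normal`, `phiU_laplace`, `phiK_default`) are illustrative only and are not part of the package. The default kernel's characteristic function is taken to be zero outside [-1,1], which is an assumption based on the description above.

```r
# Characteristic functions of zero-mean errors with standard deviation sd_U.
# These helper names are hypothetical; they are not exported by the package.
phiU_normal <- function(t, sd_U) {
  exp(-0.5 * sd_U^2 * t^2)
}

phiU_laplace <- function(t, sd_U) {
  # A Laplace distribution with scale b has variance 2 * b^2,
  # so the scale matching a given standard deviation is b = sd_U / sqrt(2).
  b <- sd_U / sqrt(2)
  1 / (1 + b^2 * t^2)
}

# Characteristic function of the default kernel: (1 - t^2)^3 on [-1, 1].
# Setting it to 0 elsewhere is an assumption made for this sketch.
phiK_default <- function(t) {
  ifelse(abs(t) <= 1, (1 - t^2)^3, 0)
}

# A one-argument function of t, the form described for `phiU`.
sd_U <- 0.2
phiU <- function(t) phiU_laplace(t, sd_U)
```

A function such as `phiU` could then be passed to `reg_deconvolve()` through the `phiU` argument, together with values for `bw` and `rho`, as the argument descriptions above require.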

The function reg_deconvolve chooses between two methods depending on how the error distribution is defined.

Error from Replicates: If both W1 and W2 are supplied then the error is calculated using replicates. This method was prototyped in Delaigle, Hall, and Meister 2008 and then further refined in Delaigle and Hall 2016, and Camirand, Carroll, and Delaigle 2018.
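
The following base R sketch illustrates why replicates carry information about the error distribution. It is not the estimator used by the package (the cited papers work with the characteristic function of the errors); it only shows that X cancels in W1 - W2, so the rescaled difference (W1 - W2)/√2 has the same variance as U. The simulated data and the assumption of independent errors with equal variance are choices made for this illustration.

```r
set.seed(1)
n <- 5000
sd_U <- 0.2

# Simulated truth and two contaminated replicates of the same X
X  <- stats::rchisq(n, 3)
W1 <- X + stats::rnorm(n, sd = sd_U)
W2 <- X + stats::rnorm(n, sd = sd_U)

# X cancels in the difference and Var(W1 - W2) = 2 * Var(U),
# so dividing by sqrt(2) recovers the scale of U.
sd_U_hat <- stats::sd((W1 - W2) / sqrt(2))
```

With this many observations `sd_U_hat` should be close to the true `sd_U`.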

Homoscedastic Error: If the errors are defined by either a single function phiU, or a single value sd_U along with its errortype then the method used is as described in Fan and Truong 1993.
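
For reference, the deconvolution version of the Nadaraya-Watson estimator is commonly written as below. The notation is chosen here for illustration: h stands for the bandwidth `bw`, \phi_K for the characteristic function of the kernel, and \phi_U for the characteristic function of U (`phiU`). How the package applies the ridge parameter `rho` is not stated on this page; one common formulation, assumed here, bounds the denominator below by rho.

```latex
\[
\hat m(x) = \frac{\sum_{j=1}^{n} Y_j \, K_U\!\left(\frac{x - W_j}{h}\right)}
                 {\sum_{j=1}^{n} K_U\!\left(\frac{x - W_j}{h}\right)},
\qquad
K_U(x) = \frac{1}{2\pi} \int e^{-itx} \, \frac{\phi_K(t)}{\phi_U(t/h)} \, dt .
\]
```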

The order in which we choose the methods is as follows:

1. If provided, use phiU to define the errors; otherwise
2. If provided, use errortype and sd_U to define the errors; otherwise
3. If provided, use the vector of replicates W2 to estimate the error distribution.

Note that in both 1 and 2, if a vector of replicates W2 is provided we augment the data in W1 with that in W2.
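
The order of precedence described above can be sketched as a small base R function. `choose_error_source()` is a hypothetical helper written for this illustration; it is not part of the package and does not reproduce its internal checks.

```r
# Hypothetical helper: reports which source of error information would be
# used, following the documented order (phiU, then errortype with sd_U,
# then the replicates W2).
choose_error_source <- function(phiU = NULL, errortype = NULL, sd_U = NULL,
                                W2 = NULL) {
  if (!is.null(phiU)) {
    "phiU"
  } else if (!is.null(errortype) && !is.null(sd_U)) {
    "errortype and sd_U"
  } else if (!is.null(W2)) {
    "replicates"
  } else {
    stop("No information about the error distribution was supplied.")
  }
}

# phiU takes precedence even when replicates are also supplied
choose_error_source(phiU = function(t) exp(-0.5 * t^2), W2 = c(1.2, 0.7))
```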

An object of class deconvolve containing the regression estimator, as well as the bandwidth and ridge parameter rho. When they are not supplied, the smoothing parameters are chosen using SIMEX; see Delaigle and Hall 2008.

If provided, the bandwidth `bw` and ridge parameter `rho` need to be consistent with each other. You should provide either both or neither.
The estimator can also be computed using the Fast Fourier Transform, which is faster, but more complex. See Delaigle and Gijbels 2007.

Camirand, F., Carroll, R.J., and Delaigle, A. (2018). Estimating the distribution of episodically consumed food measured with errors. Manuscript.

Delaigle, A. and Gijbels, I. (2007). Frequent problems in calculating integrals and optimizing objective functions: a case study in density deconvolution. Statistics and Computing, 17, 349-355.

Delaigle, A. and Hall, P. (2008). Using SIMEX for smoothing-parameter choice in errors-in-variables problems. Journal of the American Statistical Association, 103, 481, 280-287.

Delaigle, A. and Hall, P. (2016). Methodology for non-parametric deconvolution when the error distribution is unknown. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78, 1, 231-252.

Delaigle, A., Hall, P., and Meister, A. (2008). On deconvolution with repeated measurements. Annals of Statistics, 36, 665-685.

Fan, J. and Truong, Y. K. (1993). Nonparametric regression with errors in variables. The Annals of Statistics, 21, 1900-1925.

Aurore Delaigle, Timothy Hyndman, Tianying Wang

## Not run: 
# Error from replicates --------------------------------------------------------
W1 <- (framingham$SBP21 + framingham$SBP22)/2
W2 <- (framingham$SBP31 + framingham$SBP32)/2
Y <- framingham$FIRSTCHD
h <- 1.120537 #Precalculated using SIMEX option from bandwidth()
rho <- 0.0103959 #Precalculated using SIMEX option from bandwidth()
output <- reg_deconvolve(Y, W1, W2, bw = h, rho = rho)

# Error known ------------------------------------------------------------------
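n <- 50
X <- stats::rchisq(n, 3)          # true, unobserved covariate
Y <- 2*X                          # response

sd_U <- 0.2
U <- stats::rnorm(n, sd = sd_U)   # normal measurement error

W <- X + U                        # contaminated observations

# The response Y is the first argument, the contaminated data W the second
output <- reg_deconvolve(Y, W, errortype = "norm", sd_U = sd_U, n_cores = 2)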

## End(Not run)