Description Usage Arguments Details Value Warnings References Author(s) Examples
Computes the deconvolution kernel estimator (KDE) of the density of X from data W_{i1} = X_i + U_{i1}, i=1,...,n when the distribution of U_i is known or unknown and estimated from replicates, W_{i1} = X_i + U_{i1} and W_{i2} = X_i + U_{i2}, or without replicates if replicates are not available. All error densities need to be symmetric. In the homoscedastic error case, the codes are suitable only if the characteristic function of the errors is nonzero everywhere. In the heteroscedastic error case, the pooled characteristic function of the errors used by Delaigle and Meister (2006) must be nonzero everywhere
1 2 3 4 |
W1 |
A vector of size n containing the univariate contaminated data. |
W2 |
(optional) A vector of size n containing replicate measurements for the same
n individuals (in the same order) as W1. If supplied, then the error distribution
will be estimated using the replicates only if |
xx |
A vector of x values on which to compute the estimator of the density of X. |
errortype |
A single string giving the distribution of U, either "laplace" or "normal".
If you define the error distribution this way then you must also provide
|
sd_U |
The standard deviation(s) of U: a single value for
homoscedastic errors and a vector of length n for heteroscedastic
errors. This does not need to be provided if you define your error distribution
using |
phiU |
Function(s) giving the characteristic function of the errors. A
single real valued function for homoscedastic errors and a vector of n real valued functions
for heteroscedastic errors. If you define the errors this way then you
should not provide |
bw |
The bandwidth to use when computing the kernel estimator of the density
of X. If |
rescale |
If |
kernel_type |
A string giving the kernel K to use when computing the estimator of the density of X. Either "default", "normal", or "sinc". The default kernel has characteristic function (1-t^2)^3 for t \in [-1,1]. The normal kernel is the standard normal density. The sinc kernel has characteristic function equal to 1 for t \in [-1,1] |
het_replicates |
If |
m |
The number of point masses to use to estimate the distribution of X in the case of an unknown homoscedastic error distribution estimated without replicates. |
show_diagnostics |
If |
The function deconvolve
chooses from one of five different methods
depending on how the error distribution is defined/computed:
Known homoscedastic error distribution: If the error distribution
is defined by either a single function phiU
, or a single value
sd_U
along with its errortype
then the estimator of the density
of X is the one in Stefanski and Carroll (1990).
Known heteroscedastic error distributions: If the error distributions are
defined by a either a vector of functions phiU
, or a vector
sd_U
along with their errortype
(one single error type for all individuals)
then the method used is the one from Delaigle and Meister (2008).
Unknown homoscedastic error distribution when replicates are available: If both
W1
and W2
are supplied and het_replicates
is FALSE
, then the
error distribution is estimated using the replicates as in Delaigle, Hall and Meister (2008)
and the estimator of the density of X is computed as in Delaigle, Hall and Meister (2008)
except that the interval A at page 678 of that paper is replaced by the one used by
Camirand, Carroll and Delaigle (2018), which is a refined version of the one used by
Delaigle and Hall (2016).
Unknown heteroscedastic error distribution when replicates are available: If both
W1
and W2
are supplied and het_replicates
is TRUE
, then these
distributions are estimated using replicates, as in Delaigle and Meister (2008)
and the estimator of the density of X is computed as in Delaigle and Meister (2008)
except that we do the tail correction of the estimated pooled characteristic function of
the errors as in Delaigle and Hall (2016) and Camirand, Carroll and Delaigle (2018).
Unknown homoscedastic error distribution estimated without replicates:
If none of phiU
, errortype
and sd_U
, or W2
are supplied then the density
of X is assumed asymmetric and to satisfy the identifiability conditions of
Delaigle and Hall (2016). The estimator of the density of X is computed as in
Delaigle and Hall (2016) except that we use far fewer points of support for
the estimating discrete distribution and we allow the location of these point masses to vary.
The order in which we choose the methods is as follows:
If provided, use phiU
to define the errors, otherwise
If provided use errortype
and sd_u
to define the errors, otherwise
If provided, use the vector of replicates W2
to estimate the error distribution, otherwise
We use the method for unknown homoscedastic error distribution estimated without replicates.
Note that in both 1 and 2, if a vector of replicates W2
is provided we
augment the data in W1
with that in W2
.
An object of class "deconvolve
".
The function plot
produces a plot of the deconvolution KDE of the density of X on the grid xx
.
An object of class "deconvolve
" is a list containing at least the
following elements:
W1 |
The original vector of contaminated data |
x |
The values on which the deconvolution KDE is evaluated |
pdf |
A vector containing the deconvolution KDE of the density of X,
evaluated at each point in |
bw |
The bandwidth used to calculate the KDE |
It may also contain
W2 |
The original vector of replicates |
If your errors have a Laplace distribution, sd_U
is not the usual parameter
of a Laplace distribution but its standard deviation.
The method for deconvolution when the error distribution is unknown and assumed symmetric, and estimated without replicates, as in Delaigle and Hall (2016), requires solving multiple non-linear optimizations with both linear and non-linear constraints. The current implementation can be slow and unreliable. An alternative MATLAB implementation can be found at <github.com/TimothyHyndman/deconvolve-supp> which may work better in some circumstances.
If you supply your own bandwidth, then you should ensure that the kernel used here matches the one you used to calculate your bandwidth.
The DKDE can also be computed using the Fast Fourier Transform, which is a bit more complex. See Delaigle and Gijbels (2007). However if the grid of t-values is fine enough, the estimator can simply be computed like here without having problems with oscillations.
Camirand, F., Carroll, R.J. and Delaigle, A. (2018). Estimating the distribution of episodically consumed food measured with errors. Delaigle, A. and Gijbels, I. (2007). Frequent problems in calculating integrals and optimizing objective functions: a case study in density deconvolution. Statistics and Computing, 17, 349-355.
Delaigle, A. and Hall, P. (2016). Methodology for non-parametric deconvolution when the error distribution is unknown. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78, 1, 231-252.
Delaigle, A., Hall, P. and Meister, A. (2008). On Deconvolution with repeated measurements. Annals of Statistics, 36, 665-685
Delaigle, A. and Meister, A. (2008). Density estimation with heteroscedastic error. Bernoulli, 14, 2, 562-579.
Stefanski, L.A. and Carroll, R.J. (1990). Deconvolving kernel density estimators. Statistics, 21, 2, 169-184. Manuscript.
Aurore Delaigle, Timothy Hyndman, Tianying Wang
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | ## Not run:
# Error estimated from replicates ----------------------------------------------
W1 <- (framingham$SBP21 + framingham$SBP22)/2
W2 <- (framingham$SBP31 + framingham$SBP32)/2
yy <- deconvolve(W1, W2)
# Symmetric Errors -------------------------------------------------------------
output <- deconvolve((framingham$SBP21 + framingham$SBP22)/2)
# Generate homoscedastic data --------------------------------------------------
n <- 50
X <- stats::rchisq(n, 3)
sd_U = 0.2
U <- stats::rnorm(n, sd = sd_U)
W <- X + U
# Homoscedastic Errors ---------------------------------------------------------
yy <- deconvolve(W, errortype = "norm", sd_U = sd_U)
# Generate heteroscedastic data ------------------------------------------------
n <- 50
X <- stats::rchisq(n, 3)
sd_U_vec <- 0.6 * sqrt(1 + (1:n) / n) * sqrt(0.5)
U <- c()
for (sigUk in sd_U_vec){
U <- c(U, stats::rnorm(1, 0, sigUk))
}
W <- X + U
# Heteroscedastic Errors -------------------------------------------------------
yy <- deconvolve(W, errortype = "norm", sd_U = sd_U_vec)
# Heteroscedastic Errors provided using a vector of phiUs ----------------------
phiU <- c()
for (sigUk in sd_U_vec){
phiUk <- function(tt) {
exp(-sigUk^2 * tt^2 / 2)
}
phiU <- c(phiU, phiUk)
}
yy <- deconvolve(W, sd_U = sd_U_vec, phiU = phiU)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.