Description Usage Arguments Details Value References Author(s) Examples
Computes a bandwidth for the deconvolution kernel estimator of the density of X from data W_{i1} = X_i + U_{i1}, i=1,...,n when the distribution of U_i is known, unknown, or estimated from replicates, W_{i1} = X_i + U_{i1} and W_{i2} = X_i + U_{i2}. If 'SIMEX' algorithm used, computes a bandwidth for use in deconvolution regression of data (W_i, Y_i) where Y_i = g(X_i) + V_i and W_i = X_i + U_i.
1 2 3 4 |
W1 |
A vector of size n containing the univariate contaminated data. |
W2 |
(optional) A vector of size n containing replicate measurements for the same
n individuals (in the same order) as W1. If supplied, then the error distribution
will be estimated using the replicates only if |
errortype |
A single string giving the distribution of U, either "laplace" or "normal".
If you define the error distribution this way then you must also provide
|
sd_U |
The standard deviations of U. A single value for
homoscedastic errors and a vector having the same length as |
phiU |
Function(s) giving the characteristic function of the errors. A
single real valued function for homoscedastic errors and a vector of n real valued functions
for heteroscedastic errors. If you define the errors this way then you
should not provide |
Y |
A vector of the univariate dependent data (used in the regression setting). Only required for 'SIMEX' algorithm. |
algorithm |
One of |
n_cores |
Number of cores to use when using SIMEX algorithm. If
|
kernel_type |
A string giving the kernel K to use when computing the estimator of the density of X. Either "default", "normal", or "sinc". The default kernel has characteristic function (1-t^2)^3 for t \in [-1,1]. The normal kernel is the standard normal density. The sinc kernel has characteristic function equal to 1 for t \in [-1,1] |
seed |
Set seed for SIMEX. Allows for reproducible results using SIMEX. |
use_alt_SIMEX_rep_opt |
Only used with SIMEX based on replicates. If
|
het_replicates |
If |
The function bandwidth
chooses from one of seven different methods
depending on how the error distribution is defined/computed and which
algorithm is selected.
PI for known homoscedastic error distribution: If algorithm = "PI"
and the error
distribution is defined by either a single function phiU
, or a single value
sd_U
along with its errortype
, then the method used is as
described in Delaigle and Gijbels (2002) and Delaigle and Gijbels (2004).
PI for known heteroscedastic error distributions: If algorithm = "PI"
and the
error distributions are defined by a either a vector of functions phiU
, or a vector
sd_U
along with its (unique) errortype
then the method used is as
described in Delaigle and Meister (2008).
PI for unknown homoscedastic error distribution estimated from replicates:
If algorithm = "PI"
and a replicate vector W2
is supplied and the errors are
assumed to be homoscedastic, then the error distribution is estimated using replicates as
described in the main deconvolution code and then the PI bandwidth for known errors is used,
with the true error distribution replaced by its estimator.
PI for unknown heteroscedastic error distribution estimated from replicates:
If algorithm = "PI"
and a replicate vector W2
is supplied and the errors are
heteroscedastic, then the density of X is estimated as described in Delaigle and Meister (2008)
with adjustments described in deconvolve
and the PI bandwidth uses arguments similars to the
ones used for the PI bandwidth for known errors, described in Delaigle and Meister (2008),
is used, but adapated to the unknown error case.
PI for unknown homoscedastic error distribution estimated without replicates:
If algorithm = "PI"
and the errors are not supplied, then the error distribution
is estimated using the method described in Delaigle and Hall (2016) and then the bandwidth
is calculated using the method described in Delaigle and Gijbels (2002) and Delaigle and
Gijbels (2004) except that the error distribution is replaced by its estimator.
CV: If algorithm = "CV"
then the method used is the corss-validation described
in Stefanski and Carroll (1990) and Delaigle and Gijbels (2004).
SIMEX: If algorithm = "SIMEX"
then the method used is the SIMEX procedure
described in Delaigle and Hall (2008).
SIMEX for Replicates: If algorithm = "SIMEX"
and a
replicate vector W2
is supplied, then phi_U is calculated using
replicates and SIMEX is performed as according to
use_alt_SIMEX_rep_opt
.
The order in which we choose the methods is as follows:
If provided, use phiU
to define the errors, otherwise
If provided use errortype
and sd_u
to define the errors, otherwise
If provided, use the vector of replicates W2
to estimate the error distribution, otherwise
We use the method for unknown homoscedastic error distribution estimated without replicates.
Note that in both 1 and 2, if a vector of replicates W2
is provided we
augment the data in W1
with that in W2
.
A data-driven bandwidth. If using 'SIMEX' algorithm then returns a list containing the bandwidth 'h' and ridge parameter 'rho'.
Delaigle, A. and Gijbels, I. (2002). Estimation of integrated squared density derivatives from a contaminated sample. Journal of the Royal Statistical Society, B, 64, 4, 869-886.
Delaigle, A. and Gijbels, I. (2004). Practical bandwidth selection in deconvolution kernel density estimation. Computational Statistics and Data Analysis, 45, 2, 249 - 267.
Delaigle, A. and Hall, P. (2008). Using SIMEX for smoothing-parameter choice in errors-in-variables problems. Journal of the American Statistical Association, 103, 481, 280-287
Delaigle, A. and Hall, P. (2016). Methodology for non-parametric deconvolution when the error distribution is unknown. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78, 1, 231-252.
Delaigle, A., Hall, P., and Meister, A. (2008). On Deconvolution with repeated measurements. Annals of Statistics, 36, 665-685
Delaigle, A. and Meister, A. (2008). Density estimation with heteroscedastic error. Bernoulli, 14, 2, 562-579.
Stefanski, L. and Carroll, R.J. (1990). Deconvoluting kernel density estimators. Statistics, 21, 2, 169-184.
Aurore Delaigle, Timothy Hyndman, Tianying Wang
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 | ## Not run:
# PI bandwidth with error estimated from replicates ----------------------------
W1 <- (framingham$SBP21 + framingham$SBP22)/2
W2 <- (framingham$SBP31 + framingham$SBP32)/2
bw <- bandwidth(W1, W2)
# Generate homoscedastic data --------------------------------------------------
n <- 50
X <- stats::rchisq(n, 3)
sd_U = 0.2
U <- stats::rnorm(n, sd = sd_U)
W <- X + U
# CV bandwidth -----------------------------------------------------------------
bw <- bandwidth(W, errortype = "norm", sd_U = sd_U, algorithm = "CV")
# SIMEX bandwidth --------------------------------------------------------------
Y <- 2*X
output <- bandwidth(W, errortype = "norm", sd_U = sd_U, Y = Y,
algorithm = "SIMEX", n_cores = 2)
bw <- output$h
rho <- output$rho
# Generate heteroscedastic data ------------------------------------------------
n <- 50
X <- stats::rchisq(n, 3)
sd_U_vec <- 0.6 * sqrt(1 + (1:n) / n) * sqrt(0.5)
U <- c()
for (sigUk in sd_U_vec){
U <- c(U, stats::rnorm(1, 0, sigUk))
}
W <- X + U
# PI bandwidth with heteroscedastic errors -------------------------------------
bw <- bandwidth(W, errortype = "norm", sd_U = sd_U_vec)
# PI bandwidth with heteroscedastic errors supplied using phiU -----------------
phiU <- c()
for (sigUk in sd_U_vec){
phiUk <- function(tt) {
exp(-sigUk^2 * tt^2 / 2)
}
phiU <- c(phiU, phiUk)
}
bw <- bandwidth(W, sd_U = sd_U_vec, phiU = phiU)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.