Estimating probability density function from data with measurement error

Share:

Description

To compute the probability density function from data contaminated with measurement error. The measurement errors can be either homoscedastic or heteroscedastic.

Usage

1
2
DeconPdf(y,sig,x,error='normal',bw='dboot1',adjust=1,fft=FALSE,
	n=512,from,to,cut=3,na.rm=FALSE,grid=100,ub=2, ...)

Arguments

y

The observed data. It is a vector of length at least 3.

sig

The standard deviations σ. If homoscedastic errors, sig is a single value. If heteroscedastic errors, sig is a vector of standard deviations having the same length as y.

x

x is user-defined grids where the PDF will be evaluated. FFT method is not applicable if x is given.

error

Error distribution types: (1) 'normal' for normal errors; (2) 'laplacian' for Laplacian errors; (3) 'snormal' for a special case of small normal errors.

bw

Specifies the bandwidth. It can be a single numeric value which has been pre-determined; or computed with the specific bandwidth selector: 'dnrd' to compute the rule-of-thumb plugin bandwidth as suggested by Fan (1991); 'dmise' to compute the plugin bandwidth by minimizing MISE; 'dboot1' to compute the bootstrap bandwidth selector without resampling (Delaigle and Gijbels, 2004a), which minimizing the MISE bootstrap bandwidth selectors; 'boot2' to compute the smoothed bootstrap bandwidth selector with resampling.

adjust

adjust the range there the PDF is to be evaluated. By default, adjust=1.

fft

To specify the method to compute the PDF. 'fft=FALSE' to compute directly; 'fft=TRUE' to compute the PDF by using the Fast Fourier Transformation.

n

number of points where the PDF is to be evaluated.

from

the starting point where the PDF is to be evaluated.

to

the starting point where the PDF is to be evaluated.

cut

used to adjust the starting end ending points where the PDF is to be evaluated.

na.rm

is set to FALSE by default: no NA value is allowed.

grid

the grid number to search the optimal bandwidth when a bandwidth selector was specified in bw. Default value "grid=100".

ub

the upper boundary to search the optimal bandwidth, default value is "ub=2".

...

control

Details

If the number of points to be evaluated is too small (less than 32), a direct computing method is preferred. The current version can support up to 2^21 points where the PDF to be computed.

Value

An object of class “Decon”.

Author(s)

X.F. Wang wangx6@ccf.org

B. Wang bwang@jaguar1.usouthal.edu

References

Delaigle, A. and Meister, A. (2008). Density estimation with heteroscedastic error. Bernoulli, 14, 562-579.

Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution problems. The Annals of Statistics, 19, 1257-1272.

Fan, J. (1992). Deconvolution with supersmooth distributions. The Canadian Journal of Statistics, 20, 155-169.

Wang, X.F. and Wang, B. (2011). Deconvolution estimation in measurement error models: The R package decon. Journal of Statistical Software, 39(10), 1-24.

See Also

DeconCdf, DeconNpr, DeconCPdf.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
## Deconvolution: the case of homoscedastic errors
## Case 1.1: homoscedastic Laplacian errors
n1 <- 500
x1 <- rnorm(n1, sd=1)
sig1 <- .5
u1 <- ifelse(runif(n1) > 0.5, 1, -1) * rexp(n1,rate=1/sig1)
w1 <- x1+u1
## The rule-of-thumb method may not be accurate, 
## you may try the bootstrap method
bw1 <- bw.dnrd(w1,sig=sig1, error="laplacian")
(f1 <-  DeconPdf(w1,sig1,error='laplacian',bw=bw1, fft=TRUE))

## Case 1.2: homoscedastic normal errors
n2 <- 1000
x2 <- c(rnorm(n2/2,-3,1),rnorm(n2/2,3,1))
sig2 <- .8
u2 <- rnorm(n2, sd=sig2)
w2 <- x2+u2
# estimate the bandwidth with the bootstrap method with resampling
bw2 <- bw.dboot2(w2,sig=sig2, error="normal")
# estimate the unknown density with measurement error
(f2 <-  DeconPdf(w2,sig2,error='normal',bw=bw2, fft=TRUE))

# plot the results
par(mfrow=c(1,2))
plot(f1,  col="red", lwd=3, lty=2, xlab="x", ylab="f(x)", main="")
lines(density(x1, from=min(w1), to=max(w1)), lwd=3, lty=1)
lines(density(w1), col="blue", lwd=3, lty=3)
plot(f2,  col="red", lwd=3, lty=2, xlab="x", ylab="f(x)", main="")
lines(density(x2, from=min(w2), to=max(w2)), lwd=3, lty=1)
lines(density(w2), col="blue", lwd=3, lty=3)


## Deconvolution: the case of heteroscedastic errors
## Case 2: heteroscedastic normal errors
n3 <- 2000
x3 <- rchisq(n3, df=1.5, ncp=0)
sig3 <- 0.7+ x3/max(x3)
u3 <- sapply(sig3, function(x) rnorm(1, sd=x))
w3 <- x3+u3
# estimate the bandwidth using the bootstrap method withou resampling
bw3 <- bw.dboot1(w3,sig=sig3, error="normal")
# estimate the unknown density with measurement error
(f3 <-  DeconPdf(w3,sig3,error="normal", bw=bw3, fft=TRUE))

# plot the results
par(mfrow=c(1,1))
plot(f3,  col="red", lwd=3, lty=2, ylim=c(0,0.4), xlab="x", ylab="f(x)", main="")
lines(density(x3, adjust=2), lwd=3, lty=1)
lines(density(w3, adjust=2), col="blue", lwd=3, lty=3)