Estimate a Confidence Interval for a Single Quantile of a Parent Distribution by a Simple Algorithm

Description

This function estimates the lower and upper limits of a specified confidence interval for an aribitrary quantile value of a specified parent distribution [quantile function Q(F,θ) with parameters θ] using Monte Carlo simulation. The quantile value, actually the nonexceedance probability (F for 0 ≤ F ≤ 1) of the value, is specified by the user. The user also provides the parameters of the parent distribution (see lmom2par). This function does consider an estimate of the variance-covariance structure of the sample data (for that see qua2ci.cov). The qua2ci.simple is the original implementation and dates close to the initial releases of lmomco and was originally named qua2ci. That name is now deprecated but retained as an alias, which will be removed at some later release.

For nsim simulation runs (ideally a large number), samples of size n are drawn from Q(F,θ). The L-moments of each simulated sample are computed using lmoms and a distribution of the same type is fit. The F-quantile of the just-fitted distribution is computed and placed into a vector. The process of simulating the sample, computing the L-moments, computing the parameters, and solving for the F-quantile is repeated for the specified number of simulation runs.

To estimate the confidence interval, the L-moments of the vector simulated quantiles are computed. Subsequently, the parameters of a user-specified distribution “error” distribution (edist) are computed. The two quantiles of this error distribution for the specified confidence interval are computed. These two quantiles represent the estimated lower and upper limits for the confidence interval of the parent distribution for samples of size n. The error distribution defaults to the Generalized Normal (see pargno) because this distribution has the Normal as a special case but extends the fit to the 3rd L-moment (τ_3) for exotic situations in which some asymmetry in the quantile distribution might exist.

Finally, it is often useful to have vectors of lower and upper limits for confidence intervals for a vector of F values. The function genci.simple does just that and uses qua2ci.simple as the computational underpinning.

Usage

1
2
qua2ci.simple(f,para,n, level=0.90, edist="gno", nsim=1000, showpar=FALSE,
                        empdist=TRUE, verbose=FALSE, maxlogdiff=6, ...)

Arguments

f

Nonexceedance probability (0 ≤ F ≤ 1) of the quantile for which the confidence interval is needed. This function is not vectorized and therefore only the first value will be used. This is in contrast to the vectorization of F in the conceptually similar function qua2ci.cov.

para

The parameters from lmom2par or vec2par—these parameters represent the “true” parent.

n

The sample size for each Monte Carlo simulation will use.

level

The confidence interval (0 ≤ level < 1). The interval is specified as the size of the interval. The default is 0.90 or the 90th percentile. The function will return the 5th [(1-0.90)/2] and 95th [(1-(1-0.90)/2)] percentile cumulative probability of the simulated quantile distribution as specified by the nonexceedance probability argument. The arguments level and f therefore are separate features.

edist

The model for the error distribution. Although the Normal (the default) commonly is assumed in error analyses, it need not be, as support for other distributions supported by lmomco is available. The default is the Generalized Normal so the not only is the Normal possible but asymmetry is also accomodated (lmomgno). For example, if the L-skew (τ_4) or L-kurtosis (τ_4) values depart considerably from those of the Normal (τ_3 = 0 and τ_4 = 0.122602), then the Generalized Normal or some alternative distribution would likely provide more reliable confidence interval estimation.

nsim

The number of simulations (replications) for the sample size n to perform. Large numbers produce more refined confidence limit estimates at the cost of CPU time. The default is anticipated to be large enough for evaluative-useage without too much computational delay. Larger simulation numbers are recommended.

showpar

The parameters of the edist for each simulation are printed.

empdist

If TRUE, then an R environment is appended onto the element empdist in the returned list, otherwise empdist is NA.

verbose

The verbosity of the operation of the function.

maxlogdiff

The maximum permitted difference in log10 space between a simulated quantile and the true value. It is possible that a well fit simulated sample to the parent distribution type provides crazy quantile estimates in the far reaches of either tail. The default value of 6 was chosen based on experience with the Kappa distribution fit to a typical heavy-right tail flood magnitude data set. The concern motivating this feature is that as the number of parameters increases, it seems progressively there is more chance for a distribution tail to swing wildy into regions for which an analyst would not be comfortable with given discipline-specific knowledge. The choice of 6-log cycles is ad hoc at best, and users are encouraged to do their own exploration. If verbose=TRUE then a message will be printed when the maxlogdiff condition is tripped.

...

Additional arguments to pass such as to lmom2par.

Value

An R list is returned. The lwr and upr match the nomenclature of qua2ci.cov but because qua2ci.simple is provided the parent, the true value is returned, whereas qua2ci.cov returns the fit.

lwr

The lower value of the confidence interval having nonexceedance probability equal to (1-level)/2.

true

The value returned by par2qua(f,para).

upr

The upper value of the confidence interval having nonexceedance probability equal to 1-(1-level)/2.

elmoms

The L-moments from lmoms of the distribution of simulated of quantiles.

epara

The parameters of the error distribution fit using the elmoms.

empdist

An R environment (see below).

ifail

A diagnostic value. A value of zero means that successful exit was made.

ifailtext

A descriptive message related to the ifail value.

nsim

An echoing of the nsim argument for the function.

sim.attempts

The number of executions of the while loop (see Note below).

The empdist element in the returned list is an R environment that contains:

simquas

A nsim-long vector of the simulated quantiles for f.

empir.dist.lwr

The lower limit derived from the R quantile function for type=6, which uses i/(n+1).

empir.dist.upr

The upper limit derived from the R quantile function for type=6, which uses i/(n+1).

bern.smooth.lwr

The lower limit estimated by the Bernstein smoother in dat2bernqua for
poly.type = "Bernstein" and bound.type = "none".

bern.smooth.upr

The upper limit estimated by the Bernstein smoother in dat2bernqua for
poly.type = "Bernstein" and bound.type = "none".

epmoms

The product moments of the simulated quantiles from pmoms.

Note

This function relies on a while loop that runs until nsim have successfully completed. Some reasons for an early next in the loop include invalid L-moments by are.lmom.valid of the simluated data or invalid fitted parameters by are.par.valid to simulated L-moments. See the source code for more details.

Author(s)

W.H. Asquith

See Also

lmoms, pmoms, par2qua, genci.simple, qua2ci.cov

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
## Not run: 
# It is well known that standard deviation (sigma) of the
# sample mean is equal to sigma/sample_size. Let is look at the
# quantile distribution of the median (f=0.5)
mean   <- 0; sigma <- 100
parent <- vec2par(c(mean,sigma), type='nor')
CI     <- qua2ci.simple(0.5, parent, n=10, nsim=20)
# Theoretrical sample mean sigma = 100/10 = 10
# L-moment theory: L-scale * sqrt(pi) = sigma
# Thus, it follows that the quantity
CI$elmoms$lambdas[2]/sqrt(pi)
# approaches 10 as nsim --> Inf.
## End(Not run)

# Another example.
D   <- c(123, 34, 4, 654, 37, 78, 93, 95, 120) # fake sample
lmr <- lmoms(D)    # compute the L-moments of the data
WEI <- parwei(lmr) # estimate Weibull distribution parameters
CI  <- qua2ci.simple(0.75,WEI,20, nsim=20, level=0.95)
# CI contains the estimate 95-percent confidence interval for the
# 75th-percentile of the parent Weibull distribution for 20 sample size 20.
## Not run: 
pdf("Substantial_qua2ci_example.pdf")
level <- 0.90; cilo <- (1-level)/2; cihi <- 1 - cilo
para <- lmom2par(vec2lmom(c(180,50,0.75)), type="gev")
A <- qua2ci.simple(0.98, para, 30, edist="gno", level=level, nsim=3000)
Apara <- A$epara; Aenv <- A$empdist
Bpara <- lmom2par(A$elmoms, type="aep4")

lo <- log10(A$lwr); hi <- log10(A$upr)
xs <- 10^(seq(lo-0.2, hi+0.2, by=0.005))
lo <- A$lwr; hi <- A$upr; xm <- A$true; sbar <- mean(Aenv$simquas)
dd <- density(Aenv$simquas, adjust=0.5)
pk <- max(dd$y, dlmomco(xs, Apara), dlmomco(xs, Bpara))
dx <- dd$x[dd$x >= Aenv$empir.dist.lower & dd$x <= Aenv$empir.dist.upper]
dy <- dd$y[dd$x >= Aenv$empir.dist.lower & dd$x <= Aenv$empir.dist.upper]
dx <- c(dx[1], dx, dx[length(dx)]); dy <- c(0, dy, 0)

plot(c(0), c(0), type="n", xlim=range(xs), ylim=c(0,pk),
                 xlab="X VALUE", ylab="PROBABILITY DENSITY")
polygon(dx, dy, col=8)
lines(xs, dlmomco(xs, Apara)); lines(xs, dlmomco(xs, Bpara), col=2, lwd=2)
lines(dd, lty=2, lwd=2, col=8)
lines(xs, dlmomco(xs, para), col=6); lines(c(xm,xm), c(0,pk), lty=4, lwd=3)
lines(c(lo,lo,NA,hi,hi), c(0,pk,NA,0,pk), lty=2)

xlo <- qlmomco(cilo, Apara); xhi <- qlmomco(cihi, Apara)
points(c(xlo, xhi), c(dlmomco(xlo, Apara), dlmomco(xhi, Apara)), pch=16)
xlo <- qlmomco(cilo, Bpara); xhi <- qlmomco(cihi, Bpara)
points(c(xlo, xhi), c(dlmomco(xlo, Bpara), dlmomco(xhi, Bpara)), pch=16, col=2)
lines(rep(Aenv$empir.dist.lwr, 2), c(0,pk), lty=3, lwd=2, col=3)
lines(rep(Aenv$empir.dist.upr, 2), c(0,pk), lty=3, lwd=2, col=3)
lines(rep(Aenv$bern.smooth.lwr,2), c(0,pk), lty=3, lwd=2, col=4)
lines(rep(Aenv$bern.smooth.upr,2), c(0,pk), lty=3, lwd=2, col=4)
cat(c(  "F(true) = ",             round(plmomco(xm,   Apara), digits=2),
      "; F(mean(sim), edist) = ", round(plmomco(sbar, Apara), digits=2), "\n"), sep="")
dev.off()
## End(Not run)
## Not run: 
ty <- "nor" # try running with "glo" (to get the L-skew "fit", see below)
para <- lmom2par(vec2lmom(c(-180,70,-.5)), type=ty)
f <- 0.99; n <- 41; ns <- 1000; Qtrue <- qlmomco(f, para)
Qsim1 <- replicate(ns, qlmomco(f, lmom2par(lmoms(rlmomco(n, para)), type=ty)))
Qsim2 <- qua2ci.simple(f, para, n, nsim=ns, edist="gno")
Qbar1 <- mean(Qsim1); Qbar2 <- mean(Qsim2$empdist$simquas)
epara <- Qsim2$epara; FT <- plmomco(Qtrue, epara)
F1 <- plmomco(Qbar1, epara); F2 <- plmomco(Qbar2, epara)
cat(c(  "F(true) = ",      round(FT, digits=2),
      "; F(via sim.) = ",  round(F1, digits=2),
      "; F(via edist) = ", round(F2, digits=2), "\n"), sep="")
# The given L-moments are highly skewed, but a Normal distribution is fit so
# L-skew is ignored. The game is deep tail (f=0.99) estimation. The true value of the
# quantile has a percentile on the error distribution 0.48 that is almost exactly 0.5
# (median = mean = symmetrical error distribution).  A test run shows nice behavior:
# F(true) =  0.48; F(via sim.) =  0.49; F(via edist) =  0.5
# But another run with ty <- "glo" (see how 0.36 << [0.52, 0.54]) has
# F(true) =  0.36; F(via sim.) =  0.54; F(via edist) =  0.52
# So as the asymmetry becomes extreme, the error distribution becomes asymmetrical too.
## End(Not run)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.