Estimate the mean and coefficient of variation of a lognormal distribution given a sample of data that has been subjected to Type I censoring, and optionally construct a confidence interval for the mean.
1 2 3  elnormAltCensored(x, censored, method = "mle", censoring.side = "left",
ci = FALSE, ci.method = "profile.likelihood", ci.type = "twosided",
conf.level = 0.95, n.bootstraps = 1000, pivot.statistic = "z", ...)

x 
numeric vector of observations. Missing ( 
censored 
numeric or logical vector indicating which values of 
method 
character string specifying the method of estimation. For singly censored data, the possible values are: For multiply censored data, the possible values are: See the DETAILS section for more information. 
censoring.side 
character string indicating on which side the censoring occurs. The possible
values are 
ci 
logical scalar indicating whether to compute a confidence interval for the
mean or variance. The default value is 
ci.method 
character string indicating what method to use to construct the confidence interval
for the mean. The possible values are
The confidence interval method See the DETAILS section for more information.
This argument is ignored if 
ci.type 
character string indicating what kind of confidence interval to compute. The
possible values are 
conf.level 
a scalar between 0 and 1 indicating the confidence level of the confidence interval.
The default value is 
n.bootstraps 
numeric scalar indicating how many bootstraps to use to construct the
confidence interval for the mean when 
pivot.statistic 
character string indicating which pivot statistic to use in the construction
of the confidence interval for the mean when 
... 
additional arguments to pass to other functions.

If x
or censored
contain any missing (NA
), undefined (NaN
) or
infinite (Inf
, Inf
) values, they will be removed prior to
performing the estimation.
Let \underline{x} be a vector of n observations from a
lognormal distribution with
parameters mean=
θ and cv=
τ. Let η denote the
standard deviation of this distribution, so that η = θ τ. Set
\underline{y} = log(\underline{x}). Then \underline{y} is a vector of observations
from a normal distribution with parameters mean=
μ and sd=
σ.
See the help file for LognormalAlt for the relationship between
θ, τ, η, μ, and σ.
Let \underline{x} denote a vector of N observations from a
lognormal distribution with parameters
mean=
θ and cv=
τ. Let η denote the
standard deviation of this distribution, so that η = θ τ. Set
\underline{y} = log(\underline{x}). Then \underline{y} is a
vector of observations from a normal distribution with parameters
mean=
μ and sd=
σ. See the help file for
LognormalAlt for the relationship between
θ, τ, η, μ, and σ.
Assume n (0 < n < N) of the N observations are known and c (c=Nn) of the observations are all censored below (leftcensored) or all censored above (rightcensored) at k fixed censoring levels
T_1, T_2, …, T_k; \; k ≥ 1 \;\;\;\;\;\; (1)
For the case when k ≥ 2, the data are said to be Type I multiply censored. For the case when k=1, set T = T_1. If the data are leftcensored and all n known observations are greater than or equal to T, or if the data are rightcensored and all n known observations are less than or equal to T, then the data are said to be Type I singly censored (Nelson, 1982, p.7), otherwise they are considered to be Type I multiply censored.
Let c_j denote the number of observations censored below or above censoring level T_j for j = 1, 2, …, k, so that
∑_{i=1}^k c_j = c \;\;\;\;\;\; (2)
Let x_{(1)}, x_{(2)}, …, x_{(N)} denote the “ordered” observations, where now “observation” means either the actual observation (for uncensored observations) or the censoring level (for censored observations). For rightcensored data, if a censored observation has the same value as an uncensored one, the uncensored observation should be placed first. For leftcensored data, if a censored observation has the same value as an uncensored one, the censored observation should be placed first.
Note that in this case the quantity x_{(i)} does not necessarily represent the i'th “largest” observation from the (unknown) complete sample.
Finally, let Ω (omega) denote the set of n subscripts in the
“ordered” sample that correspond to uncensored observations.
ESTIMATION
This section explains how each of the estimators of mean=
θ and
cv=
τ are computed. The approach is to first compute estimates of
θ and η^2 (the mean and variance of the lognormal distribution),
say \hat{θ} and \hat{η}^2, then compute the estimate of the cv
τ by \hat{τ} = \hat{η}/\hat{θ}.
Maximum Likelihood Estimation (method="mle"
)
The maximum likelihood estimators of θ, τ, and η are
computed as:
\hat{θ}_{mle} = exp(\hat{μ}_{mle} + \frac{\hat{σ}^2_{mle}}{2}) \;\;\;\;\;\; (3)
\hat{τ}_{mle} = [exp(\hat{σ}^2_{mle})  1]^{1/2} \;\;\;\;\;\; (4)
\hat{η}_{mle} = \hat{θ}_{mle} \; \hat{τ}_{mle} \;\;\;\;\;\; (5)
where \hat{μ}_{mle} and \hat{σ}_{mle} denote the maximum
likelihood estimators of μ and σ. See the help for for
enormCensored
for information on how \hat{μ}_{mle} and
\hat{σ}_{mle} are computed.
Quasi Minimum Variance Unbiased Estimation Based on the MLE's (method="qmvue"
)
The maximum likelihood estimators of θ and η^2 are biased.
Even for complete (uncensored) samples these estimators are biased
(see equation (12) in the help file for elnormAlt
).
The bias tends to 0 as the sample size increases, but it can be considerable for
small sample sizes.
(Cohn et al., 1989, demonstrate the bias for complete data sets.)
For the case of complete samples, the minimum variance unbiased estimators (mvue's)
of θ and η^2 were derived by Finney (1941) and are discussed in
Gilbert (1987, pp.164167) and Cohn et al. (1989). These estimators are computed as:
\hat{θ}_{mvue} = e^{\bar{y}} g_{n1}(\frac{s^2}{2}) \;\;\;\;\;\; (6)
\hat{η}^2_{mvue} = e^{2 \bar{y}} \{g_{n1}(2s^2)  g_{n1}[\frac{(n2)s^2}{n1}]\} \;\;\;\;\;\; (7)
where
\bar{y} = \frac{1}{n} ∑_{i=1}^n y_i \;\;\;\;\;\; (8)
s^2 = \frac{1}{n1} ∑_{i=1}^n (y_i  \bar{y})^2 \;\;\;\;\;\; (9)
g_m(z) = ∑_{i=0}^∞ \frac{m^i (m+2i)}{m(m+2) \cdots (m+2i)} (\frac{m}{m+1})^i (\frac{z^i}{i!}) \;\;\;\;\;\; (10)
(see the help file for elnormAlt
).
For Type I censored samples, the quasi minimum variance unbiased estimators
(qmvue's) of θ and η^2 are computed using equations (6) and (7)
and estimating μ and σ with their mle's (see
elnormCensored
).
For singly censored data, this is apparently the LM method of Gilliom and Helsel
(1986, p.137) (it is not clear from their description on page 137 whether their
LM method is the straight method="mle"
described above or
method="qmvue"
described here). This method was also used by
Newman et al. (1989, p.915, equations 1011).
For multiply censored data, this is apparently the MM method of Helsel and Cohn
(1988, p.1998). (It is not clear from their description on page 1998 and the
description in Gilliom and Helsel, 1986, page 137 whether Helsel and Cohn's (1988)
MM method is the straight method="mle"
described above or method="qmvue"
described here.)
BiasCorrected Maximum Likelihood Estimation (method="bcmle"
)
This method was derived by ElShaarawi (1989) and can be applied to complete or
censored data sets. For complete data, the exact relative bias of the mle of
the mean θ is given as:
B_{mle} = \frac{E[\hat{θ}_{mle}]}{θ} = exp[\frac{(n1)σ^2}{2n}] (1  \frac{σ^2}{n})^{(n1)/2} \;\;\;\;\;\; (11)
(see equation (12) in the help file for elnormAlt
).
For the case of complete or censored data, ElShaarawi (1989) proposed the following “biascorrected” maximum likelihood estimator:
\hat{θ}_{bcmle} = \frac{\hat{θ}_{mle}}{\hat{B}_{mle}} \;\;\;\;\;\; (12)
where
\hat{B}_{mle} = exp[\frac{1}{2}(\hat{V}_{11} + 2\hat{σ}_{mle} \hat{V}_{12} + \hat{σ}^2_{mle} \hat{V}_{22})] \;\;\;\;\;\; (13)
and V denotes the asymptotic variancecovariance of the mle's of μ
and σ, which is based on the observed information matrix, formulas for
which are given in Cohen (1991). ElShaarawi (1989) does not propose a
biascorrected estimator of the variance η^2, so the mle of η
is computed when method="bcmle"
.
Robust Regression on Order Statistics (method="rROS"
) or
Imputation Using QuantileQuantile Regression (method ="impute.w.qq.reg"
)
This is the robust Regression on Order Statistics (rROS) method discussed in USEPA (2009)
and Helsel (2012). This method involves using quantilequantile regression on the
logtransformed observations to fit a regression line (and thus initially estimate the mean
μ and standard deviation σ in logspace), imputing the
logtransformed values of the c censored observations by predicting them
from the regression equation, transforming the logscale imputed values back to
the original scale, and then computing the method of moments estimates of the
mean and standard deviation based on the observed and imputed values.
The steps are:
Estimate μ and σ by computing the leastsquares estimates in the following model:
y_{(i)} = μ + σ Φ^{1}(p_i) + ε_i, \; i \in Ω \;\;\;\;\;\; (14)
where p_i denotes the plotting position associated with the i'th
largest value, a is a constant such that 0 ≤ a ≤ 1
(the default value is 0.375), Φ denotes the cumulative
distribution function (cdf) of the standard normal distribution and
Ω denotes the set of n subscripts associated with the
uncensored observations in the ordered sample. The plotting positions are
computed by calling the function ppointsCensored
.
Compute the logscale imputed values as:
\hat{y}_{(i)} = \hat{μ}_{qqreg} + \hat{σ}_{qqreg} Φ^{1}(p_i), \; i \not \in Ω \;\;\;\;\;\; (15)
Retransform the logscale imputed values:
\hat{x}_{(i)} = exp[\hat{y}_{(i)}], \; i \not \in Ω \;\;\;\;\;\; (16)
Compute the usual method of moments estimates of the mean and variance.
\hat{θ} = \frac{1}{N} [∑_{i \not \in Ω} \hat{x}_{(i)} + ∑_{i \in Ω} x_{(i)}] \;\;\;\;\;\; (17)
\hat{η}^2 = \frac{1}{N1} [∑_{i \not \in Ω} (\hat{x}_{(i)}  \hat{θ}^2) + ∑_{i \in Ω} (x_{(i)}  \hat{θ}^2)] \;\;\;\;\;\; (18)
Note that the estimate of variance is actually the usual unbiased one (not the method of moments one) in the case of complete data.
For sinlgy censored data, this method is discussed by Hashimoto and Trussell (1983), Gilliom and Helsel (1986), and ElShaarawi (1989), and is referred to as the LR (LogRegression) or LogProbability Method.
For multiply censored data, this is the MR method of Helsel and Cohn (1988, p.1998).
They used it with the probability method of Hirsch and Stedinger (1987) and
Weibull plotting positions (i.e., prob.method="hirschstedinger"
and
plot.pos.con=0
).
The argument plot.pos.con
(see the entry for ... in the ARGUMENTS
section above) determines the value of the plotting positions computed in
equations (14) and (15) when method
equals "hirschstedinger"
or
"michaelschucany"
. The default value is plot.pos.con=0.375
.
See the help file for ppointsCensored
for more information.
The arguments lb.impute
and ub.impute
(see the entry for ... in
the ARGUMENTS section above) determine the lower and upper bounds for the
imputed values. Imputed values smaller than lb.impute
are set to this
value. Imputed values larger than ub.impute
are set to this value.
The default values are lb.impute=0
and ub.impute=Inf
.
Imputation Using QuantileQuantile Regression Including the Censoring Level
(method ="impute.w.qq.reg.w.cen.level"
)
This method is only available for sinlgy censored data. This method was
proposed by ElShaarawi (1989), which he denoted as the Modified LR Method.
It is exactly the same method as imputation
using quantilequantile regression (method="impute.w.qq.reg"
), except that
the quantilequantile regression includes the censoring level. For left singly
censored data, the modification involves adding the point
[Φ^{1}(p_c), T] to the plot before fitting the leastsquares line.
For right singly censored data, the point [Φ^{1}(p_{n+1}), T]
is added to the plot before fitting the leastsquares line.
Imputation Using Maximum Likelihood (method ="impute.w.mle"
)
This method is only available for sinlgy censored data.
This is exactly the same method as robust Regression on Order Statistics (i.e.,
the same as using method="rROS"
or
method="impute.w.qq.reg"
),
except that the maximum likelihood method (method="mle"
) is used to compute
the initial estimates of the mean and standard deviation.
In the context of lognormal data, this method is discussed
by ElShaarawi (1989), which he denotes as the Modified Maximum Likelihood Method.
Setting Censored Observations to Half the Censoring Level (method="half.cen.level"
)
This method is applicable only to left censored data that is bounded below by 0.
This method involves simply replacing all the censored observations with half their
detection limit, and then computing the usual moment estimators of the mean and
variance. That is, all censored observations are imputed to be half the detection
limit, and then Equations (17) and (18) are used to estimate the mean and varaince.
This method is included only to allow comparison of this method to other methods.
Setting leftcensored observations to half the censoring level is not
recommended. In particular, ElShaarawi and Esterby (1992) show that these
estimators are biased and inconsistent (i.e., the bias remains even as the sample
size increases).
CONFIDENCE INTERVALS
This section explains how confidence intervals for the mean θ are
computed.
f
Likelihood Profile (ci.method="profile.likelihood"
)
This method was proposed by Cox (1970, p.88), and Venzon and Moolgavkar (1988)
introduced an efficient method of computation. This method is also discussed by
Stryhn and Christensen (2003) and Royston (2007).
The idea behind this method is to invert the likelihoodratio test to obtain a
confidence interval for the mean θ while treating the coefficient of
variation τ as a nuisance parameter.
For Type I left censored data, the likelihood function is given by:
L(θ, τ  \underline{x}) = {N \choose c_1 c_2 … c_k n} ∏_{j=1}^k [F(T_j)]^{c_j} ∏_{i \in Ω} f[x_{(i)}] \;\;\;\;\;\; (19)
where f and F denote the probability density function (pdf) and cumulative distribution function (cdf) of the population. That is,
f(t) = φ(\frac{tμ}{σ}) \;\;\;\;\;\; (20)
F(t) = Φ(\frac{tμ}{σ}) \;\;\;\;\;\; (21)
where
μ = log(\frac{θ}{√{τ^2 + 1}}) \;\;\;\;\;\; (22)
σ = [log(τ^2 + 1)]^{1/2} \;\;\;\;\;\; (23)
and φ and Φ denote the pdf and cdf of the standard normal distribution, respectively (Cohen, 1963; 1991, pp.6, 50). For left singly censored data, equation (3) simplifies to:
L(μ, σ  \underline{x}) = {N \choose c} [F(T)]^{c} ∏_{i = c+1}^n f[x_{(i)}] \;\;\;\;\;\; (24)
Similarly, for Type I right censored data, the likelihood function is given by:
L(μ, σ  \underline{x}) = {N \choose c_1 c_2 … c_k n} ∏_{j=1}^k [1  F(T_j)]^{c_j} ∏_{i \in Ω} f[x_{(i)}] \;\;\;\;\;\; (25)
and for right singly censored data this simplifies to:
L(μ, σ  \underline{x}) = {N \choose c} [1  F(T)]^{c} ∏_{i = 1}^n f[x_{(i)}] \;\;\;\;\;\; (26)
Following Stryhn and Christensen (2003), denote the maximum likelihood estimates of the mean and coefficient of variation by (θ^*, τ^*). The likelihood ratio test statistic (G^2) of the hypothesis H_0: θ = θ_0 (where θ_0 is a fixed value) equals the drop in 2 log(L) between the “full” model and the reduced model with θ fixed at θ_0, i.e.,
G^2 = 2 \{log[L(θ^*, τ^*)]  log[L(θ_0, τ_0^*)]\} \;\;\;\;\;\; (27)
where τ_0^* is the maximum likelihood estimate of τ for the reduced model (i.e., when θ = θ_0). Under the null hypothesis, the test statistic G^2 follows a chisquared distribution with 1 degree of freedom.
Alternatively, we may express the test statistic in terms of the profile likelihood function L_1 for the mean θ, which is obtained from the usual likelihood function by maximizing over the parameter τ, i.e.,
L_1(θ) = max_{τ} L(θ, τ) \;\;\;\;\;\; (28)
Then we have
G^2 = 2 \{log[L_1(θ^*)]  log[L_1(θ_0)]\} \;\;\;\;\;\; (29)
A twosided (1α)100\% confidence interval for the mean θ consists of all values of θ_0 for which the test is not significant at level alpha:
θ_0: G^2 ≤ χ^2_{1, {1α}} \;\;\;\;\;\; (30)
where χ^2_{ν, p} denotes the p'th quantile of the
chisquared distribution with ν degrees of freedom.
Onesided lower and onesided upper confidence intervals are computed in a similar
fashion, except that the quantity 1α in Equation (30) is replaced with
12α.
Direct Normal Approximations (ci.method="delta"
or ci.method="normal.approx"
)
An approximate (1α)100\% confidence interval for θ can be
constructed assuming the distribution of the estimator of θ is
approximately normally distributed. That is, a twosided (1α)100\%
confidence interval for θ is constructed as:
[\hat{θ}  t_{1α/2, m1}\hat{σ}_{\hat{θ}}, \; \hat{θ} + t_{1α/2, m1}\hat{σ}_{\hat{θ}}] \;\;\;\;\;\; (31)
where \hat{θ} denotes the estimate of θ, \hat{σ}_{\hat{θ}} denotes the estimated asymptotic standard deviation of the estimator of θ, m denotes the assumed sample size for the confidence interval, and t_{p,ν} denotes the p'th quantile of Student's tdistribuiton with ν degrees of freedom. Onesided confidence intervals are computed in a similar fashion.
The argument ci.sample.size
determines the value of m (see
see the entry for ... in the ARGUMENTS section above).
When method
equals "mle"
, "qmvue"
, or "bcmle"
and the data are singly censored, the default value is the
expected number of uncensored observations, otherwise it is n,
the observed number of uncensored observations. This is simply an adhoc
method of constructing confidence intervals and is not based on any
published theoretical results.
When pivot.statistic="z"
, the p'th quantile from the
standard normal distribution is used in place of the
p'th quantile from Student's tdistribution.
Direct Normal Approximation Based on the Delta Method (ci.method="delta"
)
This method is usually applied with the maximum likelihood estimators
(method="mle"
). It should also work approximately for the quasi minimum
variance unbiased estimators (method="qmvue"
) and the biascorrected maximum
likelihood estimators (method="bcmle"
).
When method="mle"
, the variance of the mle of θ can be estimated
based on the variancecovariance matrix of the mle's of μ and σ
(denoted V), and the delta method:
\hat{σ}^2_{\hat{θ}} = (\frac{\partial θ}{\partial \underline{λ}})^{'}_{\hat{\underline{λ}}} \hat{V} (\frac{\partial θ}{\partial \underline{λ}})_{\hat{\underline{λ}}} \;\;\;\;\;\; (32)
where
\underline{λ}' = (μ, σ) \;\;\;\;\;\; (33)
\frac{\partial θ}{\partial μ} = exp(μ + \frac{σ^2}{2}) \;\;\;\;\;\; (34)
\frac{\partial θ}{\partial σ} = σ exp(μ + \frac{σ^2}{2}) \;\;\;\;\;\; (35)
(Shumway et al., 1989). The variancecovariance matrix V of the mle's of
μ and σ is estimated based on the inverse of the observed Fisher
Information matrix, formulas for which are given in Cohen (1991).
Direct Normal Approximation Based on the Moment Estimators (ci.method="normal.approx"
)
This method is valid only for the moment estimators based on imputed values
(i.e., method="impute.w.qq.reg"
or method="half.cen.level"
). For
these cases, the standard deviation of the estimated mean is assumed to be
approximated by
\hat{σ}_{\hat{θ}} = \frac{\hat{η}}{√{m}} \;\;\;\;\;\; (36)
where, as already noted, m denotes the assumed sample size.
This is simply an adhoc method of constructing confidence intervals and is not
based on any published theoretical results.
Cox's Method (ci.method="cox"
)
This method may be applied with the maximum likelihood estimators
(method="mle"
), the quasi minimum variance unbiased estimators
(method="qmvue"
), and the biascorrected maximum likelihood estimators
(method="bcmle"
).
This method was proposed by ElShaarawi (1989) and is an extension of the
method derived by Cox and presented in Land (1972) for the case of
complete data (see the explanation of ci.method="cox"
in the help file
for elnormAlt
). The idea is to construct an approximate
(1α)100\% confidence interval for the quantity
β = exp(μ + \frac{σ^2}{2}) \;\;\;\;\;\; (37)
assuming the estimate of β
\hat{β} = exp(\hat{μ} + \frac{\hat{σ}^2}{2}) \;\;\;\;\;\; (38)
is approximately normally distributed, and then exponentiate the confidence limits. That is, a twosided (1α)100\% confidence interval for θ is constructed as:
[ exp(\hat{β}  h), \; exp(\hat{β} + h) ]\;\;\;\;\;\; (39)
where
h = t_{1α/2, m1}\hat{σ}_{\hat{β}} \;\;\;\;\;\; (40)
and \hat{σ}_{\hat{β}} denotes the estimated asymptotic standard deviation of the estimator of β, m denotes the assumed sample size for the confidence interval, and t_{p,ν} denotes the p'th quantile of Student's tdistribuiton with ν degrees of freedom.
ElShaarawi (1989) shows that the standard deviation of the mle of β can be estimated by:
\hat{σ}_{\hat{β}} = √{ \hat{V}_{11} + 2 \hat{σ} \hat{V}_{12} + \hat{σ}^2 \hat{V}_{22} } \;\;\;\;\;\; (41)
where V denotes the variancecovariance matrix of the mle's of μ and σ and is estimated based on the inverse of the Fisher Information matrix.
Onesided confidence intervals are computed in a similar fashion.
Bootstrap and BiasCorrected Bootstrap Approximation (ci.method="bootstrap"
)
The bootstrap is a nonparametric method of estimating the distribution
(and associated distribution parameters and quantiles) of a sample statistic,
regardless of the distribution of the population from which the sample was drawn.
The bootstrap was introduced by Efron (1979) and a general reference is
Efron and Tibshirani (1993).
In the context of deriving an approximate (1α)100\% confidence interval for the population mean θ, the bootstrap can be broken down into the following steps:
Create a bootstrap sample by taking a random sample of size N from the observations in \underline{x}, where sampling is done with replacement. Note that because sampling is done with replacement, the same element of \underline{x} can appear more than once in the bootstrap sample. Thus, the bootstrap sample will usually not look exactly like the original sample (e.g., the number of censored observations in the bootstrap sample will often differ from the number of censored observations in the original sample).
Estimate θ based on the bootstrap sample created in Step 1, using the same method that was used to estimate θ using the original observations in \underline{x}. Because the bootstrap sample usually does not match the original sample, the estimate of θ based on the bootstrap sample will usually differ from the original estimate based on \underline{x}.
Repeat Steps 1 and 2 B times, where B is some large number.
The number of bootstraps B is determined by the argument
n.bootstraps
(see the section ARGUMENTS above).
The default value of n.bootstraps
is 1000
.
Use the B estimated values of θ to compute the empirical
cumulative distribution function of this estimator of θ (see
ecdfPlot
), and then create a confidence interval for θ
based on this estimated cdf.
The twosided percentile interval (Efron and Tibshirani, 1993, p.170) is computed as:
[\hat{G}^{1}(\frac{α}{2}), \; \hat{G}^{1}(1\frac{α}{2})] \;\;\;\;\;\; (42)
where \hat{G}(t) denotes the empirical cdf evaluated at t and thus \hat{G}^{1}(p) denotes the p'th empirical quantile, that is, the p'th quantile associated with the empirical cdf. Similarly, a onesided lower confidence interval is computed as:
[\hat{G}^{1}(α), \; ∞] \;\;\;\;\;\; (43)
and a onesided upper confidence interval is computed as:
[0, \; \hat{G}^{1}(1α)] \;\;\;\;\;\; (44)
The function elnormAltCensored
calls the R function quantile
to compute the empirical quantiles used in Equations (42)(44).
The percentile method bootstrap confidence interval is only firstorder accurate (Efron and Tibshirani, 1993, pp.187188), meaning that the probability that the confidence interval will contain the true value of θ can be off by k/√{N}, where kis some constant. Efron and Tibshirani (1993, pp.184188) proposed a biascorrected and accelerated interval that is secondorder accurate, meaning that the probability that the confidence interval will contain the true value of θ may be off by k/N instead of k/√{N}. The twosided biascorrected and accelerated confidence interval is computed as:
[\hat{G}^{1}(α_1), \; \hat{G}^{1}(α_2)] \;\;\;\;\;\; (45)
where
α_1 = Φ[\hat{z}_0 + \frac{\hat{z}_0 + z_{α/2}}{1  \hat{a}(z_0 + z_{α/2})}] \;\;\;\;\;\; (46)
α_2 = Φ[\hat{z}_0 + \frac{\hat{z}_0 + z_{1α/2}}{1  \hat{a}(z_0 + z_{1α/2})}] \;\;\;\;\;\; (47)
\hat{z}_0 = Φ^{1}[\hat{G}(\hat{θ})] \;\;\;\;\;\; (48)
\hat{a} = \frac{∑_{i=1}^N (\hat{θ}_{(\cdot)}  \hat{θ}_{(i)})^3}{6[∑_{i=1}^N (\hat{θ}_{(\cdot)}  \hat{θ}_{(i)})^2]^{3/2}} \;\;\;\;\;\; (49)
where the quantity \hat{θ}_{(i)} denotes the estimate of θ using all the values in \underline{x} except the i'th one, and
\hat{θ}{(\cdot)} = \frac{1}{N} ∑_{i=1}^N \hat{θ_{(i)}} \;\;\;\;\;\; (50)
A onesided lower confidence interval is given by:
[\hat{G}^{1}(α_1), \; ∞] \;\;\;\;\;\; (51)
and a onesided upper confidence interval is given by:
[0, \; \hat{G}^{1}(α_2)] \;\;\;\;\;\; (52)
where α_1 and α_2 are computed as for a twosided confidence interval, except α/2 is replaced with α in Equations (51) and (52).
The constant \hat{z}_0 incorporates the bias correction, and the constant \hat{a} is the acceleration constant. The term “acceleration” refers to the rate of change of the standard error of the estimate of θ with respect to the true value of θ (Efron and Tibshirani, 1993, p.186). For a normal (Gaussian) distribution, the standard error of the estimate of θ does not depend on the value of θ, hence the acceleration constant is not really necessary.
When ci.method="bootstrap"
, the function elnormAltCensored
computes both
the percentile method and biascorrected and accelerated method bootstrap confidence
intervals.
This method of constructing confidence intervals for censored data was studied by Shumway et al. (1989).
a list of class "estimateCensored"
containing the estimated parameters
and other information. See estimateCensored.object
for details.
A sample of data contains censored observations if some of the observations are reported only as being below or above some censoring level. In environmental data analysis, Type I leftcensored data sets are common, with values being reported as “less than the detection limit” (e.g., Helsel, 2012). Data sets with only one censoring level are called singly censored; data sets with multiple censoring levels are called multiply or progressively censored.
Statistical methods for dealing with censored data sets have a long history in the field of survival analysis and life testing. More recently, researchers in the environmental field have proposed alternative methods of computing estimates and confidence intervals in addition to the classical ones such as maximum likelihood estimation.
Helsel (2012, Chapter 6) gives an excellent review of past studies of the properties of various estimators based on censored environmental data.
In practice, it is better to use a confidence interval for the mean or a joint confidence region for the mean and standard deviation, rather than rely on a single pointestimate of the mean. Since confidence intervals and regions depend on the properties of the estimators for both the mean and standard deviation, the results of studies that simply evaluated the performance of the mean and standard deviation separately cannot be readily extrapolated to predict the performance of various methods of constructing confidence intervals and regions. Furthermore, for several of the methods that have been proposed to estimate the mean based on type I leftcensored data, standard errors of the estimates are not available, hence it is not possible to construct confidence intervals (ElShaarawi and Dolan, 1989).
Few studies have been done to evaluate the performance of methods for constructing confidence intervals for the mean or joint confidence regions for the mean and standard deviation on the original scale, not the logscale, when data are subjected to single or multiple censoring. See, for example, Singh et al. (2006).
Steven P. Millard (EnvStats@ProbStatInfo.com)
Bain, L.J., and M. Engelhardt. (1991). Statistical Analysis of Reliability and LifeTesting Models. Marcel Dekker, New York, 496pp.
Cohen, A.C. (1959). Simplified Estimators for the Normal Distribution When Samples are Singly Censored or Truncated. Technometrics 1(3), 217–237.
Cohen, A.C. (1963). Progressively Censored Samples in Life Testing. Technometrics 5, 327–339
Cohen, A.C. (1991). Truncated and Censored Samples. Marcel Dekker, New York, New York, 312pp.
Cox, D.R. (1970). Analysis of Binary Data. Chapman & Hall, London. 142pp.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1–26.
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, 436pp.
ElShaarawi, A.H. (1989). Inferences About the Mean from Censored Water Quality Data. Water Resources Research 25(4) 685–690.
ElShaarawi, A.H., and D.M. Dolan. (1989). Maximum Likelihood Estimation of Water Quality Concentrations from Censored Data. Canadian Journal of Fisheries and Aquatic Sciences 46, 1033–1039.
ElShaarawi, A.H., and S.R. Esterby. (1992). Replacement of Censored Observations by a Constant: An Evaluation. Water Research 26(6), 835–844.
ElShaarawi, A.H., and A. Naderi. (1991). Statistical Inference from Multiply Censored Environmental Data. Environmental Monitoring and Assessment 17, 339–347.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.
Gleit, A. (1985). Estimation for Small Normal Data Sets with Detection Limits. Environmental Science and Technology 19, 1201–1206.
Haas, C.N., and P.A. Scheff. (1990). Estimation of Averages in Truncated Samples. Environmental Science and Technology 24(6), 912–919.
Hashimoto, L.K., and R.R. Trussell. (1983). Evaluating Water Quality Data Near the Detection Limit. Paper presented at the Advanced Technology Conference, American Water Works Association, Las Vegas, Nevada, June 59, 1983.
Helsel, D.R. (1990). Less than Obvious: Statistical Treatment of Data Below the Detection Limit. Environmental Science and Technology 24(12), 1766–1774.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley \& Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997–2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715–727.
Korn, L.R., and D.E. Tyler. (2001). Robust Estimation for Chemical Concentration Data Subject to Detection Limits. In Fernholz, L., S. Morgenthaler, and W. Stahel, eds. Statistics in Genetics and in the Environmental Sciences. Birkhauser Verlag, Basel, pp.41–63.
Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodnessof Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461–496.
Millard, S.P., P. Dixon, and N.K. Neerchal. (2014; in preparation). Environmental Statistics with R. CRC Press, Boca Raton, Florida.
Nelson, W. (1982). Applied Life Data Analysis. John Wiley and Sons, New York, 634pp.
Newman, M.C., P.M. Dixon, B.B. Looney, and J.E. Pinder. (1989). Estimating Mean and Variance for Environmental Samples with Below Detection Limit Observations. Water Resources Bulletin 25(4), 905–916.
Pettitt, A. N. (1983). ReWeighted Least Squares Estimation with Censored and Grouped Data: An Application of the EM Algorithm. Journal of the Royal Statistical Society, Series B 47, 253–260.
Regal, R. (1982). Applying Order Statistic Censored Normal Confidence Intervals to Time Censored Data. Unpublished manuscript, University of Minnesota, Duluth, Department of Mathematical Sciences.
Royston, P. (2007). Profile Likelihood for Estimation and Confdence Intervals. The Stata Journal 7(3), pp. 376–387.
Saw, J.G. (1961b). The Bias of the Maximum Likelihood Estimators of Location and Scale Parameters Given a Type II Censored Normal Sample. Biometrika 48, 448–451.
Schmee, J., D.Gladstein, and W. Nelson. (1985). Confidence Limits for Parameters of a Normal Distribution from Singly Censored Samples, Using Maximum Likelihood. Technometrics 27(2) 119–128.
Schneider, H. (1986). Truncated and Censored Samples from Normal Populations. Marcel Dekker, New York, New York, 273pp.
Shumway, R.H., A.S. Azari, and P. Johnson. (1989). Estimating Mean Concentrations Under Transformations for Environmental Data With Detection Limits. Technometrics 31(3), 347–356.
Singh, A., R. Maichle, and S. Lee. (2006). On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. EPA/600/R06/022, March 2006. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Stryhn, H., and J. Christensen. (2003). Confidence Intervals by the Profile Likelihood Method, with Applications in Veterinary Epidemiology. Contributed paper at ISVEE X (November 2003, Chile). http://people.upei.ca/hstryhn/stryhn208.pdf.
Travis, C.C., and M.L. Land. (1990). Estimating the Mean of Data Sets with Nondetectable Values. Environmental Science and Technology 24, 961–962.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R09007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet  March 2009 Unified Guidance. EPA 530/R09007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Venzon, D.J., and S.H. Moolgavkar. (1988). A Method for Computing ProfileLikelihoodBased Confidence Intervals. Journal of the Royal Statistical Society, Series C (Applied Statistics) 37(1), pp. 87–94.
LognormalAlt
, elnormAlt
,
elnormCensored
, enormCensored
,
estimateCensored.object
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146  # Chapter 15 of USEPA (2009) gives several examples of estimating the mean
# and standard deviation of a lognormal distribution on the logscale using
# manganese concentrations (ppb) in groundwater at five background wells.
# In EnvStats these data are stored in the data frame
# EPA.09.Ex.15.1.manganese.df.
# Here we will estimate the mean and coefficient of variation
# ON THE ORIGINAL SCALE using the MLE, QMVUE,
# and robust ROS (imputation with QQ regression).
# First look at the data:
#
EPA.09.Ex.15.1.manganese.df
# Sample Well Manganese.Orig.ppb Manganese.ppb Censored
#1 1 Well.1 <5 5.0 TRUE
#2 2 Well.1 12.1 12.1 FALSE
#3 3 Well.1 16.9 16.9 FALSE
#...
#23 3 Well.5 3.3 3.3 FALSE
#24 4 Well.5 8.4 8.4 FALSE
#25 5 Well.5 <2 2.0 TRUE
longToWide(EPA.09.Ex.15.1.manganese.df,
"Manganese.Orig.ppb", "Sample", "Well",
paste.row.name = TRUE)
# Well.1 Well.2 Well.3 Well.4 Well.5
#Sample.1 <5 <5 <5 6.3 17.9
#Sample.2 12.1 7.7 5.3 11.9 22.7
#Sample.3 16.9 53.6 12.6 10 3.3
#Sample.4 21.6 9.5 106.3 <2 8.4
#Sample.5 <2 45.9 34.5 77.2 <2
# Now estimate the mean and coefficient of variation
# using the MLE:
#
with(EPA.09.Ex.15.1.manganese.df,
elnormAltCensored(Manganese.ppb, Censored))
#Results of Distribution Parameter Estimation
#Based on Type I Censored Data
#
#
#Assumed Distribution: Lognormal
#
#Censoring Side: left
#
#Censoring Level(s): 2 5
#
#Estimated Parameter(s): mean = 23.003987
# cv = 2.300772
#
#Estimation Method: MLE
#
#Data: Manganese.ppb
#
#Censoring Variable: Censored
#
#Sample Size: 25
#
#Percent Censored: 24%
# Now compare the MLE with the QMVUE and the
# estimator based on robust ROS
#
with(EPA.09.Ex.15.1.manganese.df,
elnormAltCensored(Manganese.ppb, Censored))$parameters
# mean cv
#23.003987 2.300772
with(EPA.09.Ex.15.1.manganese.df,
elnormAltCensored(Manganese.ppb, Censored,
method = "qmvue"))$parameters
# mean cv
#21.566945 1.841366
with(EPA.09.Ex.15.1.manganese.df,
elnormAltCensored(Manganese.ppb, Censored,
method = "rROS"))$parameters
# mean cv
#19.886180 1.298868
#
# The method used to estimate quantiles for a QQ plot is
# determined by the argument prob.method. For the function
# elnormCensoredAlt, for any estimation method that involves
# QQ regression, the default value of prob.method is
# "hirschstedinger" and the default value for the
# plotting position constant is plot.pos.con=0.375.
# Both Helsel (2012) and USEPA (2009) also use the HirschStedinger
# probability method but set the plotting position constant to 0.
with(EPA.09.Ex.15.1.manganese.df,
elnormAltCensored(Manganese.ppb, Censored,
method = "rROS", plot.pos.con = 0))$parameters
# mean cv
#19.827673 1.304725
#
# Using the same data as above, compute a confidence interval
# for the mean using the profilelikelihood method.
with(EPA.09.Ex.15.1.manganese.df,
elnormAltCensored(Manganese.ppb, Censored, ci = TRUE))
#Results of Distribution Parameter Estimation
#Based on Type I Censored Data
#
#
#Assumed Distribution: Lognormal
#
#Censoring Side: left
#
#Censoring Level(s): 2 5
#
#Estimated Parameter(s): mean = 23.003987
# cv = 2.300772
#
#Estimation Method: MLE
#
#Data: Manganese.ppb
#
#Censoring Variable: Censored
#
#Sample Size: 25
#
#Percent Censored: 24%
#
#Confidence Interval for: mean
#
#Confidence Interval Method: Profile Likelihood
#
#Confidence Interval Type: twosided
#
#Confidence Level: 95%
#
#Confidence Interval: LCL = 12.37629
# UCL = 69.87694

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.
Please suggest features or report bugs with the GitHub issue tracker.
All documentation is copyright its authors; we didn't write any of that.