Description
Estimate the mean and standard deviation parameters of the logarithm of a lognormal distribution given a sample of data that has been subjected to Type I censoring, and optionally construct a confidence interval for the mean.
Usage
elnormCensored(x, censored, method = "mle", censoring.side = "left",
ci = FALSE, ci.method = "profile.likelihood", ci.type = "two-sided",
conf.level = 0.95, n.bootstraps = 1000, pivot.statistic = "z",
nmc = 1000, seed = NULL, ...)

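Arguments
x
numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
censored
numeric or logical vector indicating which values of x are censored.
method
character string specifying the method of estimation. The default value is method="mle" (maximum likelihood); the examples also use method="ROS" and method="rROS". For singly censored data, the possible values are: … For multiply censored data, the possible values are: … See the DETAILS section for more information.
censoring.side
character string indicating on which side the censoring occurs. The possible values are "left" (the default) and "right".
ci
logical scalar indicating whether to compute a confidence interval for the mean or variance. The default value is ci=FALSE.
ci.method
character string indicating what method to use to construct the confidence interval for the mean. The default value is ci.method="profile.likelihood"; ci.method="normal.approx" is also available (see the Note). See the DETAILS section for more information. This argument is ignored if ci=FALSE.
ci.type
character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper".
conf.level
a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95.
n.bootstraps
numeric scalar indicating how many bootstraps to use to construct the confidence interval for the mean when ci.method="bootstrap". The default value is n.bootstraps=1000.
pivot.statistic
character string indicating which pivot statistic to use in the construction of the confidence interval for the mean when a normal-approximation method is used (e.g., ci.method="normal.approx"). The possible values are "z" (the default) and "t".
nmc
numeric scalar indicating the number of Monte Carlo simulations to run when …. The default value is nmc=1000.
seed
integer supplied to the function set.seed and used by the simulation-based confidence interval methods. The default value is seed=NULL.
...
additional arguments to pass to other functions (for example, plot.pos.con, which the Examples pass through to the Q-Q regression methods).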

Details
If x or censored contain any missing (NA), undefined (NaN), or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.
Let X denote a random variable with a lognormal distribution with parameters meanlog=μ and sdlog=σ. Then Y = log(X) has a normal (Gaussian) distribution with parameters mean=μ and sd=σ. Because the logarithm is monotone, an observation reported as below a censoring level c on the original scale is below log(c) on the log scale, so the censoring information carries over unchanged. Thus, the function elnormCensored simply calls the function enormCensored using the log-transformed values of x.
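Because elnormCensored works on the log scale, the maximum likelihood fit for left-censored lognormal data is a censored-normal fit to log(x). The base-R sketch below illustrates that idea with optim(); it is not the EnvStats code, and the helper name lnorm_cens_mle is hypothetical. It assumes left censoring with each censored value recorded at its censoring level, and it drops non-finite values first, as described above. The data are the 25 manganese values from the Examples.

```r
# Hypothetical helper (not part of EnvStats): maximum likelihood estimates of
# meanlog and sdlog for left-censored lognormal data, computed as a
# censored-normal fit to y = log(x). Censored values are recorded at their
# censoring levels.
lnorm_cens_mle <- function(x, censored) {
  censored <- as.logical(censored)
  # Drop missing, undefined, and infinite values before estimating
  keep <- is.finite(x) & !is.na(censored)
  y <- log(x[keep])
  cen <- censored[keep]
  # Negative log-likelihood: density terms for detected values, normal CDF
  # terms P(Y <= log(level)) for values reported below a censoring level.
  # sigma is optimized on the log scale to keep it positive.
  negll <- function(theta) {
    mu <- theta[1]
    sigma <- exp(theta[2])
    -(sum(dnorm(y[!cen], mean = mu, sd = sigma, log = TRUE)) +
        sum(pnorm(y[cen], mean = mu, sd = sigma, log.p = TRUE)))
  }
  fit <- optim(c(mean(y), log(sd(y))), negll, method = "BFGS",
               control = list(reltol = 1e-10))
  c(meanlog = fit$par[1], sdlog = exp(fit$par[2]))
}

# Manganese data (ppb) from the Examples; "<5" and "<2" entered as 5 and 2
mn <- c(5, 12.1, 16.9, 21.6, 2, 5, 7.7, 53.6, 9.5, 45.9,
        5, 5.3, 12.6, 106.3, 34.5, 6.3, 11.9, 10, 2, 77.2,
        17.9, 22.7, 3.3, 8.4, 2)
cen <- seq_along(mn) %in% c(1, 5, 6, 11, 19, 25)
est <- lnorm_cens_mle(mn, cen)
est
```

If the sketch maximizes the same likelihood as elnormCensored(method = "mle"), its estimates should be close to the meanlog = 2.215905 and sdlog = 1.356291 reported in the Examples; small differences can come from the optimizer settings.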
Value
a list of class "estimateCensored" containing the estimated parameters and other information. See estimateCensored.object for details.
Note
A sample of data contains censored observations if some of the observations are reported only as being below or above some censoring level. In environmental data analysis, Type I left-censored data sets are common, with values being reported as “less than the detection limit” (e.g., Helsel, 2012). Data sets with only one censoring level are called singly censored; data sets with multiple censoring levels are called multiply or progressively censored.
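The singly/multiply distinction depends only on how many distinct censoring levels appear among the censored values. The short base-R sketch below reports the levels, the resulting classification, and the percent censored; censoring_summary is a hypothetical helper, and it assumes censored values are recorded at their censoring levels, as in the Manganese.ppb column of the Examples. For the manganese data it reports levels 2 and 5 (multiply censored) and 24% censored, matching the summary printed by elnormCensored in the Examples.

```r
# Hypothetical helper: distinct censoring levels, singly vs. multiply
# censored classification, and percent censored
censoring_summary <- function(x, censored) {
  censored <- as.logical(censored)
  lev <- sort(unique(x[censored]))
  type <- if (length(lev) == 0) {
    "uncensored"
  } else if (length(lev) == 1) {
    "singly censored"
  } else {
    "multiply censored"
  }
  list(levels = lev, type = type,
       percent.censored = 100 * mean(censored))
}

# Manganese data (ppb) from the Examples; "<5" and "<2" entered as 5 and 2
mn <- c(5, 12.1, 16.9, 21.6, 2, 5, 7.7, 53.6, 9.5, 45.9,
        5, 5.3, 12.6, 106.3, 34.5, 6.3, 11.9, 10, 2, 77.2,
        17.9, 22.7, 3.3, 8.4, 2)
cen <- seq_along(mn) %in% c(1, 5, 6, 11, 19, 25)
censoring_summary(mn, cen)
```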
Statistical methods for dealing with censored data sets have a long history in the field of survival analysis and life testing. More recently, researchers in the environmental field have proposed alternative methods of computing estimates and confidence intervals in addition to the classical ones such as maximum likelihood estimation.
Helsel (2012, Chapter 6) gives an excellent review of past studies of the properties of various estimators based on censored environmental data.
In practice, it is better to use a confidence interval for the mean or a joint confidence region for the mean and standard deviation than to rely on a single point estimate of the mean. Since confidence intervals and regions depend on the properties of the estimators for both the mean and standard deviation, the results of studies that simply evaluated the performance of the mean and standard deviation separately cannot be readily extrapolated to predict the performance of various methods of constructing confidence intervals and regions. Furthermore, for several of the methods that have been proposed to estimate the mean based on Type I left-censored data, standard errors of the estimates are not available, so it is not possible to construct confidence intervals (El-Shaarawi and Dolan, 1989).
Few studies have been done to evaluate the performance of methods for constructing confidence intervals for the mean or joint confidence regions for the mean and standard deviation when data are subjected to single or multiple censoring. See, for example, Singh et al. (2006).
Schmee et al. (1985) studied Type II censoring for a normal distribution and noted that the bias and variances of the maximum likelihood estimators are of the order 1/N, and that the bias is negligible for N=100 with as much as 90% censoring. (If the proportion of censored observations is less than 90%, the bias becomes negligible for smaller sample sizes.) For small samples with moderate to high censoring, however, the bias of the MLEs causes confidence intervals based on them using a normal approximation (e.g., method="mle" and ci.method="normal.approx") to be too short. Schmee et al. (1985) provide tables, created by Monte Carlo simulation, for exact confidence intervals for sample sizes up to N=100, and state that these tables should work well for Type I censored data as well.
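The intervals Schmee et al. (1985) warn about have the form estimate ± z × standard error, with the standard error taken from the likelihood. A minimal base-R sketch of such a Wald interval for meanlog is below, with the standard error from the inverse of the numerically estimated Hessian returned by optim(). This is a generic textbook construction and an assumption on our part, not necessarily how ci.method = "normal.approx" is computed inside EnvStats; wald_ci_meanlog is a hypothetical name. By construction the interval is symmetric about the estimate, unlike a profile-likelihood interval.

```r
# Censored-normal negative log-likelihood on the log scale, with
# theta = c(mu, log(sigma)); cen marks values below their censoring level
negll <- function(theta, y, cen) {
  mu <- theta[1]
  sigma <- exp(theta[2])
  -(sum(dnorm(y[!cen], mean = mu, sd = sigma, log = TRUE)) +
      sum(pnorm(y[cen], mean = mu, sd = sigma, log.p = TRUE)))
}

# Hypothetical helper: Wald (normal-approximation) interval for meanlog
wald_ci_meanlog <- function(x, censored, conf.level = 0.95) {
  y <- log(x)
  cen <- as.logical(censored)
  fit <- optim(c(mean(y), log(sd(y))), negll, y = y, cen = cen,
               method = "BFGS", hessian = TRUE,
               control = list(reltol = 1e-10))
  # The inverse Hessian of the negative log-likelihood approximates the
  # covariance of the estimates; its [1, 1] element estimates Var(mu-hat),
  # which is unaffected by optimizing over log(sigma) instead of sigma
  se <- sqrt(solve(fit$hessian)[1, 1])
  z <- qnorm(1 - (1 - conf.level) / 2)
  mu.hat <- fit$par[1]
  c(estimate = mu.hat, LCL = mu.hat - z * se, UCL = mu.hat + z * se)
}

# Manganese data (ppb) from the Examples; "<5" and "<2" entered as 5 and 2
mn <- c(5, 12.1, 16.9, 21.6, 2, 5, 7.7, 53.6, 9.5, 45.9,
        5, 5.3, 12.6, 106.3, 34.5, 6.3, 11.9, 10, 2, 77.2,
        17.9, 22.7, 3.3, 8.4, 2)
cen <- seq_along(mn) %in% c(1, 5, 6, 11, 19, 25)
wald_ci_meanlog(mn, cen)
```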
Shumway et al. (1989) evaluated the coverage of 90% confidence intervals for the mean based on using a Box-Cox transformation to induce normality, computing the MLEs based on the normal distribution, and then computing the mean on the original scale. They considered three methods of constructing confidence intervals: the delta method, the bootstrap, and the bias-corrected bootstrap. Shumway et al. (1989) used three parent distributions in their study: Normal(3,1), the square of this distribution, and the exponentiation of this distribution (i.e., a lognormal distribution). Based on sample sizes of 10 and 50 with a censoring level at the 10th or 20th percentile, Shumway et al. (1989) found that the delta method performed quite well and was superior to the bootstrap method.
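For the lognormal case (the exponentiated normal parent), the mean on the original scale is θ = exp(μ + σ²/2). A sketch of the delta method for θ follows. It assumes you already have estimates of μ and σ and an estimated 2 × 2 covariance matrix V for them (for example, from the inverse observed information); it does not reproduce the Box-Cox machinery of the original study, and lnorm_mean_delta is a hypothetical helper. The default conf.level = 0.90 simply mirrors the 90% intervals studied by Shumway et al. (1989).

```r
# Hypothetical helper: plug-in estimate of the lognormal mean
# theta = exp(mu + sigma^2 / 2) with a delta-method standard error and
# normal-theory interval, given estimates (mu, sigma) and their covariance V
lnorm_mean_delta <- function(mu, sigma, V, conf.level = 0.90) {
  theta <- exp(mu + sigma^2 / 2)
  # Gradient: d(theta)/d(mu) = theta, d(theta)/d(sigma) = theta * sigma
  grad <- c(theta, theta * sigma)
  se <- sqrt(drop(t(grad) %*% V %*% grad))
  z <- qnorm(1 - (1 - conf.level) / 2)
  c(mean = theta, se = se, LCL = theta - z * se, UCL = theta + z * se)
}

# Arbitrary illustrative inputs (not from any data set)
lnorm_mean_delta(0, 1, V = diag(c(0.04, 0.02)))

# Plug-in mean (ppb) from the MLEs in the Examples: about 23.0.
# V is zero here only to isolate the point estimate.
lnorm_mean_delta(2.215905, 1.356291, V = matrix(0, 2, 2))[["mean"]]
```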
Millard et al. (2014; in preparation) show that the coverage of the profile likelihood method is excellent.
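A profile likelihood interval collects every value of meanlog whose likelihood-ratio statistic, with sdlog re-maximized at each candidate value, stays below the chi-square (1 df) critical value. The minimal base-R sketch below uses optimize() for the inner maximization and uniroot() to locate the two limits. It uses the usual likelihood-ratio criterion but is not the EnvStats implementation; profile_ci_meanlog is a hypothetical name, and the search brackets (five log-scale standard deviations either side of the estimate) and the sigma search range are arbitrary choices.

```r
# Censored-normal log-likelihood on the log scale
loglik <- function(mu, sigma, y, cen) {
  sum(dnorm(y[!cen], mean = mu, sd = sigma, log = TRUE)) +
    sum(pnorm(y[cen], mean = mu, sd = sigma, log.p = TRUE))
}

# Profile log-likelihood for mu: maximize over sigma with mu held fixed
profile_ll <- function(mu, y, cen) {
  optimize(function(s) loglik(mu, s, y, cen),
           interval = c(1e-3, 10 * sd(y)), maximum = TRUE)$objective
}

# Hypothetical helper: two-sided profile-likelihood interval for meanlog
profile_ci_meanlog <- function(x, censored, conf.level = 0.95) {
  y <- log(x)
  cen <- as.logical(censored)
  # Joint MLE, with sigma optimized on the log scale
  fit <- optim(c(mean(y), log(sd(y))),
               function(th) -loglik(th[1], exp(th[2]), y, cen),
               method = "BFGS", control = list(reltol = 1e-10))
  mu.hat <- fit$par[1]
  ll.max <- -fit$value
  crit <- qchisq(conf.level, df = 1)
  # Likelihood-ratio statistic minus the critical value: zero at each limit
  lr <- function(mu) 2 * (ll.max - profile_ll(mu, y, cen)) - crit
  step <- 5 * sd(y)
  c(estimate = mu.hat,
    LCL = uniroot(lr, c(mu.hat - step, mu.hat))$root,
    UCL = uniroot(lr, c(mu.hat, mu.hat + step))$root)
}

# Manganese data (ppb) from the Examples; "<5" and "<2" entered as 5 and 2
mn <- c(5, 12.1, 16.9, 21.6, 2, 5, 7.7, 53.6, 9.5, 45.9,
        5, 5.3, 12.6, 106.3, 34.5, 6.3, 11.9, 10, 2, 77.2,
        17.9, 22.7, 3.3, 8.4, 2)
cen <- seq_along(mn) %in% c(1, 5, 6, 11, 19, 25)
profile_ci_meanlog(mn, cen)
```

If this sketch uses the same criterion as elnormCensored(ci = TRUE), its limits for the manganese data should be close to the LCL = 1.595062 and UCL = 2.771197 shown in the Examples; that is an expectation, not a checked result.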
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Bain, L.J., and M. Engelhardt. (1991). Statistical Analysis of Reliability and Life-Testing Models. Marcel Dekker, New York, 496pp.
Cohen, A.C. (1959). Simplified Estimators for the Normal Distribution When Samples are Singly Censored or Truncated. Technometrics 1(3), 217–237.
Cohen, A.C. (1963). Progressively Censored Samples in Life Testing. Technometrics 5, 327–339.
Cohen, A.C. (1991). Truncated and Censored Samples. Marcel Dekker, New York, New York, 312pp.
Cox, D.R. (1970). Analysis of Binary Data. Chapman & Hall, London. 142pp.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1–26.
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, 436pp.
El-Shaarawi, A.H. (1989). Inferences About the Mean from Censored Water Quality Data. Water Resources Research 25(4), 685–690.
El-Shaarawi, A.H., and D.M. Dolan. (1989). Maximum Likelihood Estimation of Water Quality Concentrations from Censored Data. Canadian Journal of Fisheries and Aquatic Sciences 46, 1033–1039.
El-Shaarawi, A.H., and S.R. Esterby. (1992). Replacement of Censored Observations by a Constant: An Evaluation. Water Research 26(6), 835–844.
El-Shaarawi, A.H., and A. Naderi. (1991). Statistical Inference from Multiply Censored Environmental Data. Environmental Monitoring and Assessment 17, 339–347.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.
Gleit, A. (1985). Estimation for Small Normal Data Sets with Detection Limits. Environmental Science and Technology 19, 1201–1206.
Haas, C.N., and P.A. Scheff. (1990). Estimation of Averages in Truncated Samples. Environmental Science and Technology 24(6), 912–919.
Hashimoto, L.K., and R.R. Trussell. (1983). Evaluating Water Quality Data Near the Detection Limit. Paper presented at the Advanced Technology Conference, American Water Works Association, Las Vegas, Nevada, June 5–9, 1983.
Helsel, D.R. (1990). Less than Obvious: Statistical Treatment of Data Below the Detection Limit. Environmental Science and Technology 24(12), 1766–1774.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997–2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715–727.
Korn, L.R., and D.E. Tyler. (2001). Robust Estimation for Chemical Concentration Data Subject to Detection Limits. In Fernholz, L., S. Morgenthaler, and W. Stahel, eds. Statistics in Genetics and in the Environmental Sciences. Birkhauser Verlag, Basel, pp. 41–63.
Krishnamoorthy, K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461–496.
Millard, S.P., P. Dixon, and N.K. Neerchal. (2014; in preparation). Environmental Statistics with R. CRC Press, Boca Raton, Florida.
Nelson, W. (1982). Applied Life Data Analysis. John Wiley and Sons, New York, 634pp.
Newman, M.C., P.M. Dixon, B.B. Looney, and J.E. Pinder. (1989). Estimating Mean and Variance for Environmental Samples with Below Detection Limit Observations. Water Resources Bulletin 25(4), 905–916.
Pettitt, A.N. (1983). Re-Weighted Least Squares Estimation with Censored and Grouped Data: An Application of the EM Algorithm. Journal of the Royal Statistical Society, Series B 47, 253–260.
Regal, R. (1982). Applying Order Statistic Censored Normal Confidence Intervals to Time Censored Data. Unpublished manuscript, University of Minnesota, Duluth, Department of Mathematical Sciences.
Royston, P. (2007). Profile Likelihood for Estimation and Confidence Intervals. The Stata Journal 7(3), pp. 376–387.
Saw, J.G. (1961b). The Bias of the Maximum Likelihood Estimators of Location and Scale Parameters Given a Type II Censored Normal Sample. Biometrika 48, 448–451.
Schmee, J., D. Gladstein, and W. Nelson. (1985). Confidence Limits for Parameters of a Normal Distribution from Singly Censored Samples, Using Maximum Likelihood. Technometrics 27(2), 119–128.
Schneider, H. (1986). Truncated and Censored Samples from Normal Populations. Marcel Dekker, New York, New York, 273pp.
Shumway, R.H., A.S. Azari, and P. Johnson. (1989). Estimating Mean Concentrations Under Transformations for Environmental Data With Detection Limits. Technometrics 31(3), 347–356.
Singh, A., R. Maichle, and S. Lee. (2006). On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. EPA/600/R-06/022, March 2006. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
Stryhn, H., and J. Christensen. (2003). Confidence Intervals by the Profile Likelihood Method, with Applications in Veterinary Epidemiology. Contributed paper at ISVEE X (November 2003, Chile). http://people.upei.ca/hstryhn/stryhn208.pdf.
Travis, C.C., and M.L. Land. (1990). Estimating the Mean of Data Sets with Nondetectable Values. Environmental Science and Technology 24, 961–962.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery, Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
Venzon, D.J., and S.H. Moolgavkar. (1988). A Method for Computing Profile-Likelihood-Based Confidence Intervals. Journal of the Royal Statistical Society, Series C (Applied Statistics) 37(1), pp. 87–94.
See Also
enormCensored, Lognormal, elnorm, estimateCensored.object.
Examples
# Load EnvStats, which provides elnormCensored() and the example data frame
library(EnvStats)

# Chapter 15 of USEPA (2009) gives several examples of estimating the mean
# and standard deviation of a lognormal distribution on the log-scale using
# manganese concentrations (ppb) in groundwater at five background wells.
# In EnvStats these data are stored in the data frame
# EPA.09.Ex.15.1.manganese.df.
# Here we will estimate the mean and standard deviation using the MLE,
# Q-Q regression (also called parametric regression on order statistics
# or ROS; e.g., USEPA, 2009 and Helsel, 2012), and imputation with Q-Q
# regression (also called robust ROS or rROS).
# First look at the data:
#
EPA.09.Ex.15.1.manganese.df
# Sample Well Manganese.Orig.ppb Manganese.ppb Censored
#1 1 Well.1 <5 5.0 TRUE
#2 2 Well.1 12.1 12.1 FALSE
#3 3 Well.1 16.9 16.9 FALSE
#...
#23 3 Well.5 3.3 3.3 FALSE
#24 4 Well.5 8.4 8.4 FALSE
#25 5 Well.5 <2 2.0 TRUE
longToWide(EPA.09.Ex.15.1.manganese.df,
"Manganese.Orig.ppb", "Sample", "Well",
paste.row.name = TRUE)
# Well.1 Well.2 Well.3 Well.4 Well.5
#Sample.1 <5 <5 <5 6.3 17.9
#Sample.2 12.1 7.7 5.3 11.9 22.7
#Sample.3 16.9 53.6 12.6 10 3.3
#Sample.4 21.6 9.5 106.3 <2 8.4
#Sample.5 <2 45.9 34.5 77.2 <2
# Now estimate the mean and standard deviation on the log-scale
# using the MLE:
#
with(EPA.09.Ex.15.1.manganese.df,
elnormCensored(Manganese.ppb, Censored))
#Results of Distribution Parameter Estimation
#Based on Type I Censored Data
#
#
#Assumed Distribution: Lognormal
#
#Censoring Side: left
#
#Censoring Level(s): 2 5
#
#Estimated Parameter(s): meanlog = 2.215905
# sdlog = 1.356291
#
#Estimation Method: MLE
#
#Data: Manganese.ppb
#
#Censoring Variable: Censored
#
#Sample Size: 25
#
#Percent Censored: 24%
# Now compare the MLE with the estimators based on
# Q-Q regression (ROS) and imputation with Q-Q regression (rROS)
#
with(EPA.09.Ex.15.1.manganese.df,
elnormCensored(Manganese.ppb, Censored))$parameters
# meanlog sdlog
#2.215905 1.356291
with(EPA.09.Ex.15.1.manganese.df,
elnormCensored(Manganese.ppb, Censored,
method = "ROS"))$parameters
# meanlog sdlog
#2.293742 1.283635
with(EPA.09.Ex.15.1.manganese.df,
elnormCensored(Manganese.ppb, Censored,
method = "rROS"))$parameters
# meanlog sdlog
#2.298656 1.238104
#
# The method used to estimate quantiles for a Q-Q plot is
# determined by the argument prob.method. For the functions
# enormCensored and elnormCensored, for any estimation
# method that involves Q-Q regression, the default value of
# prob.method is "hirsch-stedinger" and the default value for the
# plotting position constant is plot.pos.con=0.375.
# Both Helsel (2012) and USEPA (2009) also use the Hirsch-Stedinger
# probability method but set the plotting position constant to 0.
with(EPA.09.Ex.15.1.manganese.df,
elnormCensored(Manganese.ppb, Censored,
method = "rROS", plot.pos.con = 0))$parameters
# meanlog sdlog
#2.277175 1.261431
#
# Using the same data as above, compute a confidence interval
# for the mean on the log-scale using the profile-likelihood
# method.
with(EPA.09.Ex.15.1.manganese.df,
elnormCensored(Manganese.ppb, Censored, ci = TRUE))
#Results of Distribution Parameter Estimation
#Based on Type I Censored Data
#
#
#Assumed Distribution: Lognormal
#
#Censoring Side: left
#
#Censoring Level(s): 2 5
#
#Estimated Parameter(s): meanlog = 2.215905
# sdlog = 1.356291
#
#Estimation Method: MLE
#
#Data: Manganese.ppb
#
#Censoring Variable: Censored
#
#Sample Size: 25
#
#Percent Censored: 24%
#
#Confidence Interval for: meanlog
#
#Confidence Interval Method: Profile Likelihood
#
#Confidence Interval Type: two-sided
#
#Confidence Level: 95%
#
#Confidence Interval: LCL = 1.595062
# UCL = 2.771197
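The plot.pos.con argument used in the Examples plays the role of the constant a in the common plotting-position formula p_i = (i - a)/(n - 2a + 1) for the i-th smallest of n complete observations; a = 0.375 gives Blom-type positions and a = 0 gives i/(n + 1). The sketch below shows only that complete-data formula, under the assumption that plot.pos.con enters it this way. The Hirsch-Stedinger method selected by prob.method = "hirsch-stedinger" adjusts the positions for multiply censored data, and that adjustment is not reproduced here. plot_pos is a hypothetical helper; base R's ppoints() uses the same formula.

```r
# Hypothetical helper: plotting positions for the i-th smallest of n complete
# (uncensored) observations, p_i = (i - a) / (n - 2a + 1)
plot_pos <- function(n, a = 0.375) {
  i <- seq_len(n)
  (i - a) / (n - 2 * a + 1)
}

# Standard normal quantiles that would serve as the x-axis of a normal
# Q-Q plot or Q-Q regression for a sample of size 5
qnorm(plot_pos(5, a = 0.375))  # a = 0.375 (Blom-type positions)
qnorm(plot_pos(5, a = 0))      # a = 0, i.e. i / (n + 1)
```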
