eqnpar: Estimate Quantiles of a Distribution Nonparametrically

eqnparR Documentation

Estimate Quantiles of a Distribution Nonparametrically

Description

Estimate quantiles of a distribution, and optionally create confidence intervals for them, without making any assumptions about the form of the distribution.

Usage

  eqnpar(x, p = 0.5, type = 7, ci = FALSE, lcl.rank = NULL, ucl.rank = NULL,
    lb = -Inf, ub = Inf, ci.type = "two-sided",
    ci.method = "interpolate", digits = getOption("digits"),
    approx.conf.level = 0.95, min.coverage = TRUE, tol = 0)

Arguments

x

a numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

p

numeric vector of probabilities for which quantiles will be estimated. All values of p must be between 0 and 1. When ci=TRUE, p must be a scalar. The default value is p=0.5.

type

an integer between 1 and 9 indicating which algorithm to use to estimate the quantile. The default value is type=7. See the help file for quantile for details.

ci

logical scalar indicating whether to compute a confidence interval for the quantile. The default value is ci=FALSE.

lcl.rank, ucl.rank

positive integers indicating the ranks of the order statistics that are used for the lower and upper bounds of the confidence interval for the specified quantile. Both arguments must be integers between 1 and the number of non-missing values in x, and lcl.rank must be strictly less than ucl.rank. Setting values for lcl.rank and/or ucl.rank allows the user to bypass the automatic selection of order statistics. By default the value of these arguments is NULL, in which case order statistics are chosen based on the value of ci.type and ci.method. If only lcl.rank is supplied, a lower confidence interval is constructed. If only ucl.rank is supplied, an upper confidence interval is constructed. If both lcl.rank and ucl.rank are supplied, a two-sided confidence interval is constructed. These arguments are ignored if ci=FALSE.

lb, ub

scalars indicating lower and upper bounds on the distribution. By default,
lb=-Inf and ub=Inf. If you are constructing a confidence interval for a quantile from a distribution that you know has a lower bound other than -Inf (e.g., 0), set lb to this value. Similarly, if you know the distribution has an upper bound other than Inf, set ub to this value. These arguments are ignored if ci=FALSE.

ci.type

character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper". This argument is ignored if ci=FALSE, or lcl.rank and/or ucl.rank are supplied.

ci.method

character string indicating the method to use to construct the confidence interval. The possible values are "interpolate" (the default), "exact", and "normal.approx". See the DETAILS section for more information on these methods. This argument is ignored if ci=FALSE, or lcl.rank and/or ucl.rank are supplied.

digits

an integer indicating the number of decimal places to round to when printing out the value of 100*p. The default value is digits=0.

approx.conf.level

a scalar between 0 and 1 indicating the desired confidence level of the confidence interval. The default value is 0.95. The true confidence level usually will not be exactly equal to approx.conf.level (see DETAILS). This argument is ignored if ci=FALSE, or lcl.rank and/or ucl.rank are supplied.

min.coverage

for the case when ci.method="exact", a logical scalar indicating whether the confidence interval should have a minimum coverage at least as great as the value of the argument approx.conf.level. The default value is
min.coverage=TRUE. This argument is ignored if ci=FALSE or ci.method is not equal to "exact".

tol

for the case when ci.method="exact" and min.coverage=FALSE, a scalar between 0 and 1 indicating the maximum amount of coverage greater than the value of approx.conf.level the user is willing to allow. The default value is tol=0.

Details

If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.

Estimation
The function eqnpar calls the R function quantile to estimate quantiles.

Confidence Intervals
Let x_1, x_2, \ldots, x_n denote a sample of n independent and identically distributed random variables from some arbitrary distribution. Furthermore, let x_{(i)} denote the i'th order statistic for these n random variables. That is,

x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)} \;\;\;\;\;\; (1)

Finally, let x_p denote the p'th quantile of the distribution, that is:

Pr(X < x_p) \le p \;\;\;\;\;\; (2)

Pr(X \le x_p) \ge p \;\;\;\;\;\; (3)

It can be shown (e.g., Conover, 1980, pp. 114-116) that for the i'th order statistic:

Pr[x_p < x_{(i)}] = F_{B(n,p)}[i-1]; \; i = 1, 2, \ldots, n \;\;\;\;\;\; (4)

for a continuous distribution and

Pr[x_p < x_{(i)}] \le F_{B(n,p)}[i-1]; \; i = 1, 2, \ldots, n \;\;\;\;\;\; (5)

Pr[x_p \le x_{(i)}] \ge F_{B(n,p)}[i-1]; \; i = 1, 2, \ldots, n \;\;\;\;\;\; (6)

for a discrete distribution, where F_{B(n,p)}[y] denotes the cumulative distribution function of a binomial random variable with parameters size=n and prob=p evaluated at y. These facts are used to construct confidence intervals for quantiles (see below).

Two-Sided Confidence Interval (ci.type="two-sided")
A two-sided nonparametric confidence interval for the p'th quantile is constructed as:

[x_{(r)}, x_{(s)}] \;\;\;\;\;\; (7)

where

1 \le r \le (n-1) \;\;\;\;\;\; (8)

2 \le s \le n \;\;\;\;\;\; (9)

r < s \;\;\;\;\;\; (10)

Note that the argument lcl.rank corresponds to r, and the argument ucl.rank corresponds to s.

This confidence interval has an associated confidence level that is at least as large as:

F_{B(n,p)}[s-1] - F_{B(n,p)}[r-1] \;\;\;\;\;\; (11)

for a discrete distribution and exactly equal to this value for a continuous distribution. This is because by Equations (4)-(6) above:

Pr[x_{(r)} \le x_p \le x_{(s)}]

= Pr[x_p \le x_{(s)}] - Pr[x_p < x_{(r)}]

\ge F_{B(n,p)}[s-1] - F_{B(n,p)}[r-1] \;\;\;\;\;\; (12)

with equality if the distribution is continuous.

Exact Method (ci.method="exact")
When lcl.rank (r) and ucl.rank (s) are not supplied by the user, and ci.method="exact", r and s are initially chosen such that r is the smallest integer satisfying equation (13) below, and s is the largest integer satisfying equation (14) below:

F_{B(n,p)}[r-1] \ge \frac{\alpha}{2} \;\;\;\;\;\; (13)

F_{B(n,p)}[s-1] \le 1-\frac{\alpha}{2} \;\;\;\;\;\; (14)

where \alpha = 1 -approx.conf.level. The values of r and s are then each varied by \pm 2 (with the restrictions r \ge 1, s \le n, and r < s), and confidence levels computed for each of these combinations. If min.coverage=TRUE, the combination of r and s is selected that provides the closest coverage to approx.conf.level, with coverage greater than or equal to approx.conf.level. If min.coverage=FALSE, the combination of r and s is selected that provides the closest coverage to approx.conf.level, with coverage less than or equal to approx.conf.level + tol.

For this method, the confidence level associated with the confidence interval is exact if the underlying distribution is continuous.

Approximate Method (ci.method="approx")
Here the term “Approximate” simply refers to the method of initially choosing the ranks for the lower and upper bounds. As for ci.method="exact", the confidence level associated with the confidence interval is exact if the underlying distribution is continuous.

When lcl.rank (r) and ucl.rank (s) are not supplied by the user and ci.method="normal.approx", r and s are initially chosen such that:

r = np - h \;\;\;\;\;\; (15)

s = np + h \;\;\;\;\;\; (16)

where

h = t_{n-1, 1-\alpha/2} \sqrt{np(1-p)} \;\;\;\;\;\; (17)

and t_{\nu,q} denotes the q'th quantile of Student's t-distribution with \nu degrees of freedom, and \alpha = 1 -approx.conf.level (Conover, 1980, p. 112). With the restictions that r \ge 1 and s \le n, r is rounded down to the nearest integer r = r^* = floor(r) and s is rounded up to the nearest integer s = s^* = ceiling(s). Again, with the restictions that s \le n, if the confidence level using s = s^* + 1 and r = r^* is less than or equal to approx.conf.level, then s is set to s = s^* + 1. Once this has been checked, with the restriction that r \ge 1, if the confidence level using the current value of s and r = r^* - 1 is less than or equal to approx.conf.level, then r is set to r = r^* - 1.

Interpolate Method (ci.method="interpolate")
Let \gamma denote the desired confidence level associated with the confidence interval for the p'th quantile. Based on the work of Hettmansperger and Sheather (1986), Nyblom (1992) showed that if [-\infty, x_{(w+1)}] is a one-sided upper confidence interval for the p'th quantile with associated confidence level \gamma_{w+1}, and \gamma_{w+1} \ge \gamma \ge \gamma_{w}, then the one-sided upper confidence interval

[-\infty, (1-\lambda) x_{(w)} + \lambda x_{(w+1)}] \;\;\;\;\;\; (18)

where

\lambda = \lambda(\beta, p, w, n)

= [ 1 + \frac{w(1-p)(\pi_{w+1}-\beta)}{(n-w)p(\beta-\pi_w)} ] ^ {-1} \;\;\;\;\;\; (19)

\pi_w = F_{B(n,p)}[w-1] \;\;\;\;\;\; (20)

\beta = \gamma \;\;\;\;\;\; (21)

has associated confidence level approximately equal to \gamma for a wide range of distributions.

Thus, to construct an approximate two-sided confidence interval for the p'th quantile with confidence level \gamma, if [x_{(r)}, x_{(s)}] has confidence level \ge \gamma and [x_{(r+1)}, x_{(s-1)}] has confidence level \le \gamma, then the lower bound of the two-sided confidence interval is computed as:

(1-\lambda) x_{(r)} + \lambda x_{(r+1)} \;\;\;\;\;\; (21)

where

\beta = \alpha/2; \;\;\;\; \alpha = 1 - \gamma \;\;\;\;\;\; (22)

and the upper bound of the two-sided confidence interval is computed as:

(1-\lambda) x_{(s-1)} + \lambda x_{(s)} \;\;\;\;\;\; (23)

where

\beta = 1 - \alpha/2 \;\;\;\;\;\; (24)

The values of r and s in Equations (21) and (23) are computed by using ci.method="exact" with the argument min.coverage=TRUE.

One-Sided Lower Confidence Interval (ci.type="lower")
A one-sided lower nonparametric confidence interval for the p'th quantile is constructed as:

[x_{(r)}, ub] \;\;\;\;\;\; (25)

where ub denotes the value of the ub argument (the user-supplied upper bound).

Exact Method (ci.method="exact")
When lcl.rank (r) is not supplied by the user, and ci.method="exact", r is initially chosen such that it is the smallest integer satisfying the following equation:

F_{B(n,p)}[r-1] \ge \alpha \;\;\;\;\;\; (26)

where \alpha = 1 - approx.conf.level. The value of r is varied by \pm 2 (with the restrictions r \ge 1 and r \le n), and confidence levels computed for each of these combinations. If min.coverage=TRUE, the value of r is selected that provides the closest coverage to approx.conf.level, with coverage greater than or equal to approx.conf.level. If min.coverage=FALSE, the value of r is selected that provides the closest coverage to approx.conf.level, with coverage less than or equal to approx.conf.level + tol.

For this method, the confidence level associated with the confidence interval is exact if the underlying distribution is continuous.

Approximate Method (ci.method="approx")
When lcl.rank (r) is not supplied by the user and ci.method="normal.approx", r is initially chosen such that

r = np - t_{n-1, 1-\alpha} \sqrt{np(1-p)} \;\;\;\;\;\; (27)

With the restriction that r \ge 1 and r \le n, if p is less than 0.5 then r is rounded up to the nearest integer, otherwise it is rounded down to the nearest integer. Denote this value by r^*. With the restriction that r \ge 1, if the confidence level using r^* - 1 is less than or equal to approx.conf.level, then r is set to r = r^* - 1.

Interpolate Method (ci.method="interpolate")
Let \gamma denote the desired confidence level associated with the confidence interval for the p'th quantile. To construct an approximate one-sided lower confidence interval for the p'th quantile with confidence level \gamma, if [x_{(r)}, ub] has confidence level \ge \gamma and [x_{(r+1)}, ub] has confidence level \le \gamma, then the lower bound of the confidence interval is computed as:

(1-\lambda) x_{(r)} + \lambda x_{(r+1)} \;\;\;\;\;\; (28)

where

\beta = \alpha; \;\;\;\; \alpha = 1 - \gamma \;\;\;\;\;\; (29)

The value of r in Equation (28) is computed by using ci.method="exact" with the arguments ci.type="lower" and min.coverage=TRUE.

One-Sided Upper Confidence Interval (ci.type="upper")
A one-sided upper nonparametric confidence interval for the p'th quantile is constructed as:

[lb, x_{(s)}] \;\;\;\;\;\; (30)

where lb denotes the value of the lb argument (the user-supplied lower bound).

Exact Method (ci.method="exact")
When ucl.rank (s) is not supplied by the user, and ci.method="exact", s is initially chosen such that it is the largest integer satisfying the following equation:

F_{B(n,p)}[s-1] \le 1-\alpha \;\;\;\;\;\; (31)

where \alpha = 1 - approx.conf.level. The value of s is varied by \pm 2 (with the restrictions s \ge 1 and s \le n), and confidence levels computed for each of these combinations. If min.coverage=TRUE, the value of s is selected that provides the closest coverage to approx.conf.level, with coverage greater than or equal to approx.conf.level. If min.coverage=FALSE, the value of s is selected that provides the closest coverage to approx.conf.level, with coverage less than or equal to approx.conf.level + tol.

For this method, the confidence level associated with the confidence interval is exact if the underlying distribution is continuous.

Approximate Method (ci.method="approx")
When ucl.rank (s) is not supplied by the user and ci.method="normal.approx", s is initially chosen such that

s = np + t_{n-1, 1-\alpha} \sqrt{np(1-p)} \;\;\;\;\;\; (31)

With the restriction that s \ge 1 and s \le n, if p is greater than 0.5 then s is rounded down to the nearest integer, otherwise it is rounded up to the nearest integer. Denote this value by s^*. With the restriction that s \le n, if the confidence level using s^* + 1 is less than or equal to approx.conf.level, then s is set to s = s^* + 1.

For this method, the confidence level associated with the confidence interval is exact if the underlying distribution is continuous.

Interpolate Method (ci.method="interpolate")
Let \gamma denote the desired confidence level associated with the confidence interval for the p'th quantile. To construct an approximate one-sided upper confidence interval for the p'th quantile with confidence level \gamma, if [lb, x_{(s)}] has confidence level \ge \gamma and [lb, x_{(s-1)}] has confidence level \le \gamma, then the upper bound of the confidence interval is computed as:

(1-\lambda) x_{(s-1)} + \lambda x_{(s)} \;\;\;\;\;\; (32)

where

\beta = \gamma \;\;\;\;\;\; (33)

The value of s in Equation (32) is computed by using ci.method="exact" with the arguments ci.type = "upper", and min.coverage=TRUE.

Note on Value of Confidence Level
Because of the discrete nature of order statistics, when ci.method="exact" or
ci.method="normal.approx", the value of the confidence level returned by eqnpar will usually differ from the desired confidence level indicated by the value of the argument approx.conf.level. When
ci.method="interpolate", eqnpar returns for the confidence level the value of the argument approx.conf.level. Nyblom (1992) and Hettmasperger and Sheather (1986) have shown that the Interpolate method produces confidence intervals with confidence levels quite close to the assumed confidence level for a wide range of distributions.

Value

a list of class "estimate" containing the estimated quantile(s) and other information. See
estimate.object for details.

Note

Percentiles are sometimes used in environmental standards and regulations. For example, Berthouex and Brown (2002, p.71) note that England has water quality limits based on the 90th and 95th percentiles of monitoring data not exceeding specified levels. They also note that the U.S. EPA has specifications for air quality monitoring, aquatic standards on toxic chemicals, and maximum daily limits for industrial effluents that are all based on percentiles. Given the importance of these quantities, it is essential to characterize the amount of uncertainty associated with the estimates of these quantities. This is done with confidence intervals.

It can be shown (e.g., Conover, 1980, pp.119-121) that an upper confidence interval for the p'th quantile with confidence level 100(1-\alpha)\% is equivalent to an upper \beta-content tolerance interval with coverage 100p\% and confidence level 100(1-\alpha)\%. Also, a lower confidence interval for the p'th quantile with confidence level 100(1-\alpha)\% is equivalent to a lower \beta-content tolerance interval with coverage 100(1-p)\% and confidence level 100(1-\alpha)\%. See the help file for tolIntNpar for more information on nonparametric tolerance intervals.

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

The author is grateful to Michael H?hle, Department of Mathematics, Stockholm University (http://www2.math.su.se/~hoehle) for making me aware of the work of Nyblom (1992), and for suggesting improvements to the algorithm that was used in EnvStats Version 2.1.1 to construct a confidence interval when ci.method="exact".

References

Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.

Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.

Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, pp.132-136.

Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, pp.88-90.

Hettmansperger, T.P., and Sheather, S.J. (1986). Confidence Intervals Based on Interpolated Order Statistics. Statistics & Probability Letters, 4, 75–79.

Nyblom, J. (1992). Note on Interpolated Order Statistics. Statistics & Probability Letters, 14, 129–131.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.

Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.

See Also

quantile, tolIntNpar, Estimating Distribution Quantiles, Tolerance Intervals, estimate.object.

Examples


  # The data frame ACE.13.TCE.df contains observations on
  # Trichloroethylene (TCE) concentrations (mg/L) at
  # 10 groundwater monitoring wells before and after remediation.
  #
  # Compute the median concentration for each period along with
  # a 95% confidence interval for the median.
  #
  # Before remediation:  20.3 [8.8, 35.9]
  # After  remediation:   2.5 [0.8,  5.9]

  with(ACE.13.TCE.df,
    eqnpar(TCE.mg.per.L[Period=="Before"], ci = TRUE))

  #Results of Distribution Parameter Estimation
  #--------------------------------------------

  #Assumed Distribution:            None

  #Estimated Quantile(s):           Median = 20.3

  #Quantile Estimation Method:      Nonparametric

  #Data:                            TCE.mg.per.L[Period == "Before"]

  #Sample Size:                     10

  #Confidence Interval for:         50'th %ile

  #Confidence Interval Method:      interpolate (Nyblom, 1992)

  #Confidence Interval Type:        two-sided

  #Confidence Level:                95%

  #Confidence Limit Rank(s):        2 9
  #                                 3 8

  #Confidence Interval:             LCL =  8.804775
  #                                 UCL = 35.874775

  #----------

  with(ACE.13.TCE.df, eqnpar(TCE.mg.per.L[Period=="After"], ci = TRUE))

  #Results of Distribution Parameter Estimation
  #--------------------------------------------

  #Assumed Distribution:            None

  #Estimated Quantile(s):           Median = 2.48

  #Quantile Estimation Method:      Nonparametric

  #Data:                            TCE.mg.per.L[Period == "After"]

  #Sample Size:                     10

  #Confidence Interval for:         50'th %ile

  #Confidence Interval Method:      interpolate (Nyblom, 1992)

  #Confidence Interval Type:        two-sided

  #Confidence Level:                95%

  #Confidence Limit Rank(s):        2 9
  #                                 3 8

  #Confidence Interval:             LCL = 0.7810901
  #                                 UCL = 5.8763063

  #==========

  # Generate 20 observations from a cauchy distribution with parameters
  # location=0, scale=1.  The true 75th percentile of this distribution is 1.
  # Use eqnpar to estimate the 75th percentile and construct a 90% confidence interval.
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(250)
  dat <- rcauchy(20, location = 0, scale = 1)

  #-------------------------------------------------------
  # First, use the default method, ci.method="interpolate"
  #-------------------------------------------------------

  eqnpar(dat, p = 0.75, ci = TRUE, approx.conf.level = 0.9)

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           75'th %ile = 1.524903
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         75'th %ile
  #
  #Confidence Interval Method:      interpolate (Nyblom, 1992)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                90%
  #
  #Confidence Limit Rank(s):        12 19
  #                                 13 18
  #
  #Confidence Interval:             LCL = 0.8191423
  #                                 UCL = 2.1215570

  #----------

  #-------------------------------------------------------------
  # Now use ci.method="exact".
  # Note that the returned confidence level is greater than 90%.
  #-------------------------------------------------------------

  eqnpar(dat, p = 0.75, ci = TRUE, approx.conf.level = 0.9,
    ci.method = "exact")

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           75'th %ile = 1.524903
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         75'th %ile
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                93.47622%
  #
  #Confidence Limit Rank(s):        12 19
  #
  #Confidence Interval:             LCL = 0.7494692
  #                                 UCL = 2.2156601

  #----------

  #----------------------------------------------------------
  # Now use ci.method="exact" with min.coverage=FALSE.
  # Note that the returned confidence level is less than 90%.
  #----------------------------------------------------------

  eqnpar(dat, p = 0.75, ci = TRUE, approx.conf.level = 0.9,
    ci.method = "exact", min.coverage = FALSE, )

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           75'th %ile = 1.524903
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         75'th %ile
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                89.50169%
  #
  #Confidence Limit Rank(s):        13 20
  #
  #Confidence Interval:             LCL = 1.018038
  #                                 UCL = 5.002399

  #----------

  #-----------------------------------------------------------
  # Now supply our own bounds for the confidence interval.
  # The first example above based on the Interpolate method
  # used lcl.rank=12, ucl.rank=19 and lcl.rank=13, ucl.rank=18
  # and interpolated between these two confidence intervals.
  # Here we will specify lcl.rank=13 and ucl.rank=18.  The
  # resulting confidence level is 81%.
  #-----------------------------------------------------------

  eqnpar(dat, p = 0.75, ci = TRUE, lcl.rank = 13, ucl.rank = 18)

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           75'th %ile = 1.524903
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         75'th %ile
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                80.69277%
  #
  #Confidence Limit Rank(s):        13 18
  #
  #Confidence Interval:             LCL = 1.018038
  #                                 UCL = 2.071172

  #----------

  # Clean up
  rm(dat)

  #==========

  # Modify Example 17-4 on page 17-21 of USEPA (2009).  This example uses
  # copper concentrations (ppb) from 3 background wells to set an upper
  # limit for 2 compliance wells.  Here we will attempt to compute an upper
  # 95% confidence interval for the 95'th percentile of the distribution of
  # copper concentrations in the background wells.
  #
  # The data are stored in EPA.92c.copper2.df.
  #
  # Note that even though these data are Type I left singly censored,
  # it is still possible to compute an estimate of the 95'th percentile.

  EPA.92c.copper2.df
  #   Copper.orig Copper Censored Month Well  Well.type
  #1           <5    5.0     TRUE     1    1 Background
  #2           <5    5.0     TRUE     2    1 Background
  #3          7.5    7.5    FALSE     3    1 Background
  #...
  #9          9.2    9.2    FALSE     1    2 Background
  #10          <5    5.0     TRUE     2    2 Background
  #11          <5    5.0     TRUE     3    2 Background
  #...
  #17          <5    5.0     TRUE     1    3 Background
  #18         5.4    5.4    FALSE     2    3 Background
  #19         6.7    6.7    FALSE     3    3 Background
  #...
  #29         6.2    6.2    FALSE     5    4 Compliance
  #30          <5    5.0     TRUE     6    4 Compliance
  #31         7.8    7.8    FALSE     7    4 Compliance
  #...
  #38          <5    5.0     TRUE     6    5 Compliance
  #39         5.6    5.6    FALSE     7    5 Compliance
  #40          <5    5.0     TRUE     8    5 Compliance

  # Because of the small sample size of n=24 observations, it is not possible
  # to create a nonparametric confidence interval for the 95th percentile
  # that has an associated confidence level of 95%.  If we tried to do this,
  # we would get an error message:

  # with(EPA.92c.copper2.df,
  #    eqnpar(Copper[Well.type=="Background"], p = 0.95, ci = TRUE, lb = 0,
  #      ci.type = "upper", approx.conf.level = 0.95))
  #
  #Error in ci.qnpar.interpolate(x = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,  :
  #  Minimum coverage of 0.95 is not possible with the given sample size.

  # So instead, we will use ci.method="exact" with min.coverage=FALSE
  # to construct the confidence interval.  Note that the associated
  # confidence level is only 71%.

  with(EPA.92c.copper2.df,
     eqnpar(Copper[Well.type=="Background"], p = 0.95, ci = TRUE,
        ci.method = "exact", min.coverage = FALSE,
        ci.type = "upper", lb = 0,
        approx.conf.level = 0.95))

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           95'th %ile = 7.925
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            Copper[Well.type == "Background"]
  #
  #Sample Size:                     24
  #
  #Confidence Interval for:         95'th %ile
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                70.8011%
  #
  #Confidence Limit Rank(s):        NA 24
  #
  #Confidence Interval:             LCL = 0.0
  #                                 UCL = 9.2

  #----------

  # For the above example, the true confidence level is 71% instead of 95%.
  # This is a function of the small sample size.  In fact, as Example 17-4 on
  # pages 17-21 of USEPA (2009) shows, the largest quantile for which you can
  # construct a nonparametric confidence interval that will have associated
  # confidence level of 95% is the 88'th percentile:

  with(EPA.92c.copper2.df,
    eqnpar(Copper[Well.type=="Background"], p = 0.88, ci = TRUE,
      ci.type = "upper", lb = 0, ucl.rank = 24,
      approx.conf.level = 0.95))

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           88'th %ile = 6.892
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            Copper[Well.type == "Background"]
  #
  #Sample Size:                     24
  #
  #Confidence Interval for:         88'th %ile
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95.3486%
  #
  #Confidence Limit Rank(s):        NA 24
  #
  #Confidence Interval:             LCL = 0.0
  #                                 UCL = 9.2

  #==========

  # Reproduce Example 21-6 on pages 21-21 to 21-22 of USEPA (2009).
  # Use 12 measurements of nitrate (mg/L) at a well used for drinking water
  # to determine with 95% confidence whether or not the infant-based, acute
  # risk standard of 10 mg/L has been violated.  Assume that the risk
  # standard represents an upper 95'th percentile limit on nitrate
  # concentrations.  So what we need to do is construct a one-sided
  # lower nonparametric confidence interval for the 95'th percentile
  # that has associated confidence level of no more than 95%, and we will
  # compare the lower confidence limit with the MCL of 10 mg/L.
  #
  # The data for this example are stored in EPA.09.Ex.21.6.nitrate.df.

  # Look at the data:
  #------------------

  EPA.09.Ex.21.6.nitrate.df
  #   Sampling.Date       Date Nitrate.mg.per.l.orig Nitrate.mg.per.l Censored
  #1      7/28/1999 1999-07-28                  <5.0              5.0     TRUE
  #2       9/3/1999 1999-09-03                  12.3             12.3    FALSE
  #3     11/24/1999 1999-11-24                  <5.0              5.0     TRUE
  #4       5/3/2000 2000-05-03                  <5.0              5.0     TRUE
  #5      7/14/2000 2000-07-14                   8.1              8.1    FALSE
  #6     10/31/2000 2000-10-31                  <5.0              5.0     TRUE
  #7     12/14/2000 2000-12-14                    11             11.0    FALSE
  #8      3/27/2001 2001-03-27                  35.1             35.1    FALSE
  #9      6/13/2001 2001-06-13                  <5.0              5.0     TRUE
  #10     9/16/2001 2001-09-16                  <5.0              5.0     TRUE
  #11    11/26/2001 2001-11-26                   9.3              9.3    FALSE
  #12      3/2/2002 2002-03-02                  10.3             10.3    FALSE

  # Determine what order statistic to use for the lower confidence limit
  # in order to achieve no more than 95% confidence.
  #---------------------------------------------------------------------

  conf.levels <- ciNparConfLevel(n = 12, p = 0.95, lcl.rank = 1:12,
    ci.type = "lower")
  names(conf.levels) <- 1:12

  round(conf.levels, 2)
  #   1    2    3    4    5    6    7    8    9   10   11   12
  #1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.98 0.88 0.54

  # Using the 11'th largest observation for the lower confidence limit
  # yields a confidence level of 88%.  Using the 10'th largest
  # observation yields a confidence level of 98%.  The example in
  # USEPA (2009) uses the 10'th largest observation.
  #
  # The 10'th largest observation is 11 mg/L which exceeds the
  # MCL of 10 mg/L, so there is evidence of contamination.
  #--------------------------------------------------------------------

  with(EPA.09.Ex.21.6.nitrate.df,
    eqnpar(Nitrate.mg.per.l, p = 0.95, ci = TRUE,
      ci.type = "lower", lcl.rank = 10))

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            None
  #
  #Estimated Quantile(s):           95'th %ile = 22.56
  #
  #Quantile Estimation Method:      Nonparametric
  #
  #Data:                            Nitrate.mg.per.l
  #
  #Sample Size:                     12
  #
  #Confidence Interval for:         95'th %ile
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        lower
  #
  #Confidence Level:                98.04317%
  #
  #Confidence Limit Rank(s):        10 NA
  #
  #Confidence Interval:             LCL =  11
  #                                 UCL = Inf

  #==========

  # Clean up
  #---------

  rm(conf.levels)

EnvStats documentation built on Aug. 22, 2023, 5:09 p.m.