# predIntNparSimultaneous: Nonparametric Simultaneous Prediction Interval for a Continuous Distribution

### Description

Construct a nonparametric simultaneous prediction interval for the next r sampling “occasions” based on one of three possible rules: k-of-m, California, or Modified California. The simultaneous prediction interval assumes the observations from from a continuous distribution.

### Usage

 1 2 3 4  predIntNparSimultaneous(x, n.median = 1, k = 1, m = 2, r = 1, rule = "k.of.m", lpl.rank = ifelse(pi.type == "upper", 0, 1), n.plus.one.minus.upl.rank = ifelse(pi.type == "lower", 0, 1), lb = -Inf, ub = Inf, pi.type = "upper", integrate.args.list = NULL) 

### Arguments

 x a numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. n.median positive odd integer specifying the sample size associated with the future medians. The default value is n.median=1 (i.e., individual observations). Note that all future medians must be based on the same sample size. k for the k-of-m rule (rule="k.of.m"), a positive integer specifying the minimum number of observations (or medians) out of m observations (or medians) (all obtained on one future sampling “occassion”) the prediction interval should contain. The default value is k=1. This argument is ignored when the argument rule is not equal to "k.of.m". m positive integer specifying the maximum number of future observations (or medians) on one future sampling “occasion”. The default value is m=2, except when rule="Modified.CA", in which case this argument is ignored and m is automatically set equal to 4. r positive integer specifying the number of future sampling “occasions”. The default value is r=1. rule character string specifying which rule to use. The possible values are "k.of.m" (k-of-m rule; the default), "CA" (California rule), and "Modified.CA" (modified California rule). See the DETAILS section below for more information. lpl.rank positive integer indicating the rank of the order statistic to use for the lower bound of the prediction interval. When pi.type="lower", the default value is lpl.rank=1 (implying the minimum value of x is used as the lower bound of the prediction interval). When pi.type="upper", the argument lpl.rank is set equal to 0 and the value of lb is used as the lower bound of the tolerance interval. n.plus.one.minus.upl.rank positive integer related to the rank of the order statistic to use for the upper bound of the prediction interval. A value of n.plus.one.minus.upl.rank=1 (the default) means use the first largest value, and in general a value of n.plus.one.minus.upl.rank=i means use the i'th largest value. When pi.type="lower", the argument n.plus.one.minus.upl.rank is set equal to 0 and the value of ub is used as the upper bound of the prediction interval. lb, ub scalars indicating lower and upper bounds on the distribution. By default, lb=-Inf and ub=Inf. If you are constructing a prediction interval for a distribution that you know has a lower bound other than -Inf (e.g., 0), set lb to this value. Similarly, if you know the distribution has an upper bound other than Inf, set ub to this value. The argument lb is ignored if pi.type="two-sided" or pi.type="lower". The argument ub is ignored if pi.type="two-sided" or pi.type="upper". pi.type character string indicating what kind of prediction interval to compute. The possible values are "upper" (the default) and "lower". integrate.args.list a list of arguments to supply to the integrate function. The default value is integrate.args.list=NULL which means that the default values of integrate are used.

### Details

What is a Nonparametric Simultaneous Prediction Interval?
A nonparametric prediction interval for some population is an interval on the real line constructed so that it will contain at least k of m future observations from that population with some specified probability (1-α)100\%, where 0 < α < 1 and k and m are some pre-specified positive integers and k ≤ m. The quantity (1-α)100\% is called the confidence coefficient or confidence level associated with the prediction interval. The function predIntNpar computes a standard nonparametric prediction interval.

The function predIntNparSimultaneous computes a nonparametric simultaneous prediction interval that will contain a certain number of future observations with probability (1-α)100\% for each of r future sampling “occasions”, where r is some pre-specified positive integer. The quantity r may refer to r distinct future sampling occasions in time, or it may for example refer to sampling at r distinct locations on one future sampling occasion, assuming that the population standard deviation is the same at all of the r distinct locations.

The function predIntNparSimultaneous computes a nonparametric simultaneous prediction interval based on one of three possible rules:

• For the k-of-m rule (rule="k.of.m"), at least k of the next m future observations will fall in the prediction interval with probability (1-α)100\% on each of the r future sampling occasions. If obserations are being taken sequentially, for a particular sampling occasion, up to m observations may be taken, but once k of the observations fall within the prediction interval, sampling can stop. Note: For this rule, when r=1, the results of predIntNparSimultaneous are equivalent to the results of predIntNpar.

• For the California rule (rule="CA"), with probability (1-α)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else all of the next m-1 observations will fall in the prediction interval. That is, if the first observation falls in the prediction interval then sampling can stop. Otherwise, m-1 more observations must be taken.

• For the Modified California rule (rule="Modified.CA"), with probability (1-α)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else at least 2 out of the next 3 observations will fall in the prediction interval. That is, if the first observation falls in the prediction interval then sampling can stop. Otherwise, up to 3 more observations must be taken.

Nonparametric simultaneous prediction intervals can be extended to using medians in place of single observations (USEPA, 2009, Chapter 19). That is, you can create a nonparametric simultaneous prediction interval that will contain a specified number of medians (based on which rule you choose) on each of r future sampling occassions, where each each median is based on b individual observations. For the function predIntNparSimultaneous, the argument n.median corresponds to b.

The Form of a Nonparametric Prediction Interval
Let \underline{x} = x_1, x_2, …, x_n denote a vector of n independent observations from some continuous distribution, and let x_{(i)} denote the the i'th order statistics in \underline{x}. A two-sided nonparametric prediction interval is constructed as:

[x_{(u)}, x_{(v)}] \;\;\;\;\;\; (1)

where u and v are positive integers between 1 and n, and u < v. That is, u denotes the rank of the lower prediction limit, and v denotes the rank of the upper prediction limit. To make it easier to write some equations later on, we can also write the prediction interval (1) in a slightly different way as:

[x_{(u)}, x_{(n + 1 - w)}] \;\;\;\;\;\; (2)

where

w = n + 1 - v \;\;\;\;\;\; (3)

so that w is a positive integer between 1 and n-1, and u < n+1-w. In terms of the arguments to the function predIntNparSimultaneous, the argument lpl.rank corresponds to u, and the argument n.plus.one.minus.upl.rank corresponds to w.

If we allow u=0 and w=0 and define lower and upper bounds as:

x_{(0)} = lb \;\;\;\;\;\; (4)

x_{(n+1)} = ub \;\;\;\;\;\; (5)

then Equation (2) above can also represent a one-sided lower or one-sided upper prediction interval as well. That is, a one-sided lower nonparametric prediction interval is constructed as:

[x_{(u)}, x_{(n + 1)}] = [x_{(u)}, ub] \;\;\;\;\;\; (6)

and a one-sided upper nonparametric prediction interval is constructed as:

[x_{(0)}, x_{(n + 1 - w)}] = [lb, x_{(n + 1 - w)}] \;\;\;\;\;\; (7)

Usually, lb = -∞ or lb = 0 and ub = ∞.

Note: For nonparametric simultaneous prediction intervals, only lower (pi.type="lower") and upper (pi.type="upper") prediction intervals are available.

Constructing Nonparametric Simultaneous Prediction Intervals for Future Observations
First we will show how to construct a nonparametric simultaneous prediction interval based on future observations (i.e., b=1, n.median=1), and then extend the formulas to future medians.

Simultaneous Prediction Intervals for the k-of-m Rule (rule="k.of.m")
For the k-of-m rule (rule="k.of.m") with w=1 (i.e., n.median=1), at least k of the next m future observations will fall in the prediction interval with probability (1-α)100\% on each of the r future sampling occasions. If observations are being taken sequentially, for a particular sampling occasion, up to m observations may be taken, but once k of the observations fall within the prediction interval, sampling can stop. Note: When r=1, this kind of simultaneous prediction interval becomes the same as a standard nonparametric prediction interval (see predIntNpar).

Chou and Owen (1986) developed the theory for nonparametric simultaneous prediction limits for various rules, including the 1-of-m rule. Their theory, however, does not cover the California or Modified California rules, and uses an r-fold summation involving a minimum of 2^r terms. Davis and McNichols (1994b; 1999) extended the results of Chou and Owen (1986) to include the California and Modified California rule, and developed algorithms that involve summing far fewer terms.

Davis and McNichols (1999) give formulas for the probabilities associated with the one-sided upper simultaneous prediction interval shown in Equation (7). For the k-of-m rule, the probability that at least k of the next m future observations will be contained in the interval given in Equation (7) for each of r future sampling occasions is given by:

 1 - α = E[∑_{i=0}^{m-k} {{k-1+i} \choose {k-1}} Y^k (1-Y)^i]^r = \int_0^1 [∑_{i=0}^{m-k} {{k-1+i} \choose {k-1}} y^k (1-y)^i]^r f(y) dy \;\;\;\;\;\; (8)

where Y denotes a random variable with a beta distribution with parameters v and n+1-v, and f() denotes the pdf of this distribution. Note that v denotes the rank of the order statistic used as the upper prediction limit (i.e., n.plus.one.minus.upl.rank=n+1-v), and that v is usually equal to n.

Also note that the summation term in Equation (8) corresponds to the cumulative distribution function of a Negative Binomial distribution with parameters size=k and prob=y evaluated at q=m-k.

When pi.type="lower", Y denotes a random variable with a beta distribution with parameters n+1-u and u. Note that u denotes the rank of the order statistic used as the lower prediction limit (i.e., lpl.rank=u), and that u is usually equal to 1.

Simultaneous Prediction Intervals for the California Rule (rule="CA")
For the California rule (rule="CA"), with probability (1-α)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else all of the next m-1 observations will fall in the prediction interval. That is, if the first observation falls in the prediction interval then sampling can stop. Otherwise, m-1 more observations must be taken.

In this case, the probability is given by:

 1 - α = E[∑_{i=0}^r {r \choose i} Y^{r - i + (m-1)i} (1-Y)^i] = \int_0^1 [∑_{i=0}^r {r \choose i} y^{r - i + (m-1)i} (1-y)^i] f(y) dy \;\;\;\;\;\; (9)

Simultaneous Prediction Intervals for the Modified California Rule (rule="Modified.CA")
For the Modified California rule (rule="Modified.CA"), with probability (1-α)100\%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else at least 2 out of the next 3 observations will fall in the prediction interval. That is, if the first observation falls in the prediction interval then sampling can stop. Otherwise, up to 3 more observations must be taken.

In this case, the probability is given by:

 1 - α = E[Y^r (1 + Q + Q^2 - 2Q^3)^r] = \int_0^1 [y^r (1 + q + q^2 - 2q^3)^r] f(y) dy \;\;\;\;\;\; (10)

where Q = 1 - Y and q = 1 - y.

Davis and McNichols (1999) provide algorithms for computing the probabilities based on expanding polynomials and the formula for the expected value of a beta random variable. In the discussion section of Davis and McNichols (1999), however, Vangel points out that numerical integration is adequate, and this is how these probabilities are computed in the function predIntNparSimultaneous.

Constructing Nonparametric Simultaneous Prediction Intervals for Future Medians
USEPA (2009, Chapter 19; Cameron, 2011) extends nonparametric simultaneous prediction intervals to testing future medians for the case of the 1-of-1 and 1-of-2 plans for medians of order 3. In general, each of the rules (k-of-m, California, and Modified California) can be easily extended to the case of using medians as long as the medians are based on an odd (as opposed to even) sample size.

For each of the above rules, if we are interested in using medians instead of single observations (i.e., b ≥ 1; n.median≥ 1), and we force b to be odd, then a median will be less than a prediction limit once (b+1)/2 observations are less than the prediction limit. Thus, Equations (8) - (10) are modified by replacing y with the term:

∑_{i=0}^{b - b'} {{b' - 1 + i} \choose {b' - 1}} y^{b'} (1 - y)^i \;\;\;\;\;\; (11)

where

b' = \frac{b+1}{2} \;\;\;\;\;\; (12)

### Value

a list of class "estimate" containing the simultaneous prediction interval and other information. See the help file for estimate.object for details.

### Note

Prediction and tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Nelson, 1973; Krishnamoorthy and Mathew, 2009). In the context of environmental statistics, prediction intervals are useful for analyzing data from groundwater detection monitoring programs at hazardous and solid waste facilities (e.g., Gibbons et al., 2009; Millard and Neerchal, 2001; USEPA, 2009).

### Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

### References

Cameron, Kirk. (2011). Personal communication, February 16, 2011. MacStat Consulting, Ltd., Colorado Springs, Colorado.

Chew, V. (1968). Simultaneous Prediction Intervals. Technometrics 10(2), 323–331.

Danziger, L., and S. Davis. (1964). Tables of Distribution-Free Tolerance Limits. Annals of Mathematical Statistics 35(5), 1361–1365.

Davis, C.B. (1994). Environmental Regulatory Statistics. In Patil, G.P., and C.R. Rao, eds., Handbook of Statistics, Vol. 12: Environmental Statistics. North-Holland, Amsterdam, a division of Elsevier, New York, NY, Chapter 26, 817–865.

Davis, C.B., and R.J. McNichols. (1987). One-sided Intervals for at Least p of m Observations from a Normal Population on Each of r Future Occasions. Technometrics 29, 359–370.

Davis, C.B., and R.J. McNichols. (1994a). Ground Water Monitoring Statistics Update: Part I: Progress Since 1988. Ground Water Monitoring and Remediation 14(4), 148–158.

Davis, C.B., and R.J. McNichols. (1994b). Ground Water Monitoring Statistics Update: Part II: Nonparametric Prediction Limits. Ground Water Monitoring and Remediation 14(4), 159–175.

Davis, C.B., and R.J. McNichols. (1999). Simultaneous Nonparametric Prediction Limits (with Discusson). Technometrics 41(2), 89–112.

Gibbons, R.D. (1987a). Statistical Prediction Intervals for the Evaluation of Ground-Water Quality. Ground Water 25, 455–465.

Gibbons, R.D. (1991b). Statistical Tolerance Limits for Ground-Water Monitoring. Ground Water 29, 563–570.

Gibbons, R.D., and J. Baker. (1991). The Properties of Various Statistical Prediction Intervals for Ground-Water Detection Monitoring. Journal of Environmental Science and Health A26(4), 535–553.

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.

Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York, 392pp.

Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their Applications. Journal of Quality Technology 5, 178–188.

Hall, I.J., R.R. Prairie, and C.K. Motlagh. (1975). Non-Parametric Prediction Intervals. Journal of Quality Technology 7(3), 109–114.

Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.

USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.

predIntNparSimultaneousConfLevel, predIntNparSimultaneousN,
plotPredIntNparSimultaneousDesign, predIntNparSimultaneousTestPower, predIntNpar, tolIntNpar, estimate.object.
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345  # Generate 20 observations from a lognormal mixture distribution with # parameters mean1=1, cv1=0.5, mean2=5, cv2=1, and p.mix=0.1. Use # predIntNparSimultaneous to construct an upper one-sided prediction interval # using the maximum observed value using the 1-of-3 rule. # (Note: the call to set.seed simply allows you to reproduce this example.) set.seed(250) dat <- rlnormMixAlt(n = 20, mean1 = 1, cv1 = 0.5, mean2 = 5, cv2 = 1, p.mix = 0.1) predIntNparSimultaneous(dat, k = 1, m = 3, lb = 0) #Results of Distribution Parameter Estimation #-------------------------------------------- # #Assumed Distribution: None # #Data: dat # #Sample Size: 20 # #Prediction Interval Method: exact # #Prediction Interval Type: upper # #Confidence Level: 99.94353% # #Prediction Limit Rank(s): 20 # #Minimum Number of #Future Observations #Interval Should Contain: 1 # #Total Number of #Future Observations: 3 # #Prediction Interval: LPL = 0.000000 # UPL = 1.817311 #---------- # Compare the confidence levels for the 1-of-3 rule, California Rule, and # Modified California Rule. predIntNparSimultaneous(dat, k = 1, m = 3, lb = 0)$interval$conf.level #[1] 0.9994353 predIntNparSimultaneous(dat, m = 3, rule = "CA", lb = 0)$interval$conf.level #[1] 0.9919066 predIntNparSimultaneous(dat, rule = "Modified.CA", lb = 0)$interval$conf.level #[1] 0.9984943 #========= # Repeat the above example, but create the baseline data using just # n=8 observations and set r to 4 future sampling occasions set.seed(598) dat <- rlnormMixAlt(n = 8, mean1 = 1, cv1 = 0.5, mean2 = 5, cv2 = 1, p.mix = 0.1) predIntNparSimultaneous(dat, k = 1, m = 3, r = 4, lb = 0) #Results of Distribution Parameter Estimation #-------------------------------------------- # #Assumed Distribution: None # #Data: dat # #Sample Size: 8 # #Prediction Interval Method: exact # #Prediction Interval Type: upper # #Confidence Level: 97.7599% # #Prediction Limit Rank(s): 8 # #Minimum Number of #Future Observations #Interval Should Contain #(per Sampling Occasion): 1 # #Total Number of #Future Observations #(per Sampling Occasion): 3 # #Number of Future #Sampling Occasions: 4 # #Prediction Interval: LPL = 0.000000 # UPL = 5.683453 #---------- # Compare the confidence levels for the 1-of-3 rule, California Rule, and # Modified California Rule. predIntNparSimultaneous(dat, k = 1, m = 3, r = 4, lb = 0)$interval$conf.level #[1] 0.977599 predIntNparSimultaneous(dat, m = 3, r = 4, rule = "CA", lb = 0)$interval$conf.level #[1] 0.8737798 predIntNparSimultaneous(dat, r = 4, rule = "Modified.CA", lb = 0)$interval$conf.level #[1] 0.9510178 #========== # Example 19-5 of USEPA (2009, p. 19-33) shows how to compute nonparametric upper # simultaneous prediction limits for various rules based on trace mercury data (ppb) # collected in the past year from a site with four background wells and 10 compliance # wells (data for two of the compliance wells are shown in the guidance document). # The facility must monitor the 10 compliance wells for five constituents # (including mercury) annually. # Here we will compute the confidence level associated with two different sampling plans: # 1) the 1-of-2 retesting plan for a median of order 3 using the background maximum and # 2) the 1-of-4 plan on individual observations using the 3rd highest background value. # The data for this example are stored in EPA.09.Ex.19.5.mercury.df. # We will pool data from 4 background wells that were sampled on # a number of different occasions, giving us a sample size of # n = 20 to use to construct the prediction limit. # There are 10 compliance wells and we will monitor 5 different # constituents at each well annually. For this example, USEPA (2009) # recommends setting r to the product of the number of compliance wells and # the number of evaluations per year. # To determine the minimum confidence level we require for # the simultaneous prediction interval, USEPA (2009) recommends # setting the maximum allowed individual Type I Error level per constituent to: # 1 - (1 - SWFPR)^(1 / Number of Constituents) # which translates to setting the confidence limit to # (1 - SWFPR)^(1 / Number of Constituents) # where SWFPR = site-wide false positive rate. For this example, we # will set SWFPR = 0.1. Thus, the required individual Type I Error level # and confidence level per constituent are given as follows: # n = 20 based on 4 Background Wells # nw = 10 Compliance Wells # nc = 5 Constituents # ne = 1 Evaluation per year n <- 20 nw <- 10 nc <- 5 ne <- 1 # Set number of future sampling occasions r to # Number Compliance Wells x Number Evaluations per Year r <- nw * ne conf.level <- (1 - 0.1)^(1 / nc) conf.level #[1] 0.9791484 alpha <- 1 - conf.level alpha #[1] 0.02085164 #---------- # Look at the data: head(EPA.09.Ex.19.5.mercury.df) # Event Well Well.type Mercury.ppb.orig Mercury.ppb Censored #1 1 BG-1 Background 0.21 0.21 FALSE #2 2 BG-1 Background <.2 0.20 TRUE #3 3 BG-1 Background <.2 0.20 TRUE #4 4 BG-1 Background <.2 0.20 TRUE #5 5 BG-1 Background <.2 0.20 TRUE #6 6 BG-1 Background NA FALSE longToWide(EPA.09.Ex.19.5.mercury.df, "Mercury.ppb.orig", "Event", "Well", paste.row.name = TRUE) # BG-1 BG-2 BG-3 BG-4 CW-1 CW-2 #Event.1 0.21 <.2 <.2 <.2 0.22 0.36 #Event.2 <.2 <.2 0.23 0.25 0.2 0.41 #Event.3 <.2 <.2 <.2 0.28 <.2 0.28 #Event.4 <.2 0.21 0.23 <.2 0.25 0.45 #Event.5 <.2 <.2 0.24 <.2 0.24 0.43 #Event.6 <.2 0.54 # Construct the upper simultaneous prediction limit using the 1-of-2 # retesting plan for a median of order 3 based on the background maximum Hg.Back <- with(EPA.09.Ex.19.5.mercury.df, Mercury.ppb[Well.type == "Background"]) pred.int.1.of.2.med.3 <- predIntNparSimultaneous(Hg.Back, n.median = 3, k = 1, m = 2, r = r, lb = 0) pred.int.1.of.2.med.3 #Results of Distribution Parameter Estimation #-------------------------------------------- # #Assumed Distribution: None # #Data: Hg.Back # #Sample Size: 20 # #Number NA/NaN/Inf's: 4 # #Prediction Interval Method: exact # #Prediction Interval Type: upper # #Confidence Level: 99.40354% # #Prediction Limit Rank(s): 20 # #Minimum Number of #Future Medians #Interval Should Contain #(per Sampling Occasion): 1 # #Total Number of #Future Medians #(per Sampling Occasion): 2 # #Number of Future #Sampling Occasions: 10 # #Sample Size for Medians: 3 # #Prediction Interval: LPL = 0.00 # UPL = 0.28 # Note that the achieved confidence level of 99.4% is greater than the # required confidence level of 97.9%. # Now determine whether either compliance well indicates evidence of # Mercury contamination. # Compliance Well 1 #------------------ Hg.CW.1 <- with(EPA.09.Ex.19.5.mercury.df, Mercury.ppb.orig[Well == "CW-1"]) Hg.CW.1 #[1] "0.22" "0.2" "<.2" "0.25" "0.24" "<.2" # The median of the first 3 observations is 0.2, which is less than # the UPL of 0.28, so there is no evidence of contamination. # Compliance Well 2 #------------------ Hg.CW.2 <- with(EPA.09.Ex.19.5.mercury.df, Mercury.ppb.orig[Well == "CW-2"]) Hg.CW.2 #[1] "0.36" "0.41" "0.28" "0.45" "0.43" "0.54" # The median of the first 3 observations is 0.36, so 3 more observations have to # be looked at. The median of the second 3 observations is 0.45, which is # larger than the UPL of 0.28, so there is evidence of contamination. #---------- # Now create the upper simultaneous prediction limit using the 1-of-4 plan # on individual observations using the 3rd highest background value. pred.int.1.of.4.3rd <- predIntNparSimultaneous(Hg.Back, k = 1, m = 4, r = r, lb = 0, n.plus.one.minus.upl.rank = 3) pred.int.1.of.4.3rd #Results of Distribution Parameter Estimation #-------------------------------------------- # #Assumed Distribution: None # #Data: Hg.Back # #Sample Size: 20 # #Number NA/NaN/Inf's: 4 # #Prediction Interval Method: exact # #Prediction Interval Type: upper # #Confidence Level: 98.64909% # #Prediction Limit Rank(s): 18 # #Minimum Number of #Future Observations #Interval Should Contain #(per Sampling Occasion): 1 # #Total Number of #Future Observations #(per Sampling Occasion): 4 # #Number of Future #Sampling Occasions: 10 # #Prediction Interval: LPL = 0.00 # UPL = 0.24 # Note that the achieved confidence level of 98.6% is greater than the # required confidence level of 97.9%. # Now determine whether either compliance well indicates evidence of # Mercury contamination. # Compliance Well 1 #------------------ Hg.CW.1 <- with(EPA.09.Ex.19.5.mercury.df, Mercury.ppb.orig[Well == "CW-1"]) Hg.CW.1 #[1] "0.22" "0.2" "<.2" "0.25" "0.24" "<.2" # The first observation is less than the UPL of 0.24, which is less than # the UPL of 0.28, so there is no evidence of contamination. # Compliance Well 2 #------------------ Hg.CW.2 <- with(EPA.09.Ex.19.5.mercury.df, Mercury.ppb.orig[Well == "CW-2"]) Hg.CW.2 #[1] "0.36" "0.41" "0.28" "0.45" "0.43" "0.54" # All of the first 4 observations are greater than the UPL of 0.24, so there # is evidence of contamination. #========== # Cleanup #-------- rm(dat, n, nw, nc, ne, r, conf.level, alpha, Hg.Back, pred.int.1.of.2.med.3, pred.int.1.of.4.3rd, Hg.CW.1, Hg.CW.2)