Compute the probability that at least one set of future observations violates the given rule based on a nonparametric simultaneous prediction interval for the next r future sampling occasions. The three possible rules are: kofm, California, or Modified California. The probability is based on assuming the true distribution of the observations is normal.
1 2 3 4 5 6  predIntNparSimultaneousTestPower(n, n.median = 1, k = 1, m = 2, r = 1,
rule = "k.of.m", lpl.rank = ifelse(pi.type == "upper", 0, 1),
n.plus.one.minus.upl.rank = ifelse(pi.type == "lower", 0, 1),
delta.over.sigma = 0, pi.type = "upper", r.shifted = r,
method = "approx", NMC = 100, ci = FALSE, ci.conf.level = 0.95,
integrate.args.list = NULL)

n 
vector of positive integers specifying the sample sizes.
Missing ( 
n.median 
vector of positive odd integers specifying the sample size associated with the
future medians. The default value is 
k 
for the kofm rule ( 
m 
vector of positive integers specifying the maximum number of future observations (or
medians) on one future sampling “occasion”.
The default value is 
r 
vector of positive integers specifying the number of future sampling
“occasions”. The default value is 
rule 
character string specifying which rule to use. The possible values are

lpl.rank 
vector of nonnegative integers indicating the rank of the order statistic to use for
the lower bound of the prediction interval. When 
n.plus.one.minus.upl.rank 
vector of nonnegative integers related to the rank of the order statistic to use for
the upper bound of the prediction interval. A value of 
delta.over.sigma 
numeric vector indicating the ratio Δ/σ. The quantity
Δ (delta) denotes the difference between the mean of the population
that was sampled to construct the prediction interval, and the mean of the
population that will be sampled to produce the future observations. The quantity
σ (sigma) denotes the population standard deviation for both populations.
The default value is 
pi.type 
character string indicating what kind of prediction interval to compute.
The possible values are 
r.shifted 
vector of positive integers specifying the number of future sampling occasions for
which the scaled mean is shifted by Δ/σ. All values must be
integeters between 
method 
character string indicating what method to use to compute the power. The possible
values are 
NMC 
positive integer indicating the number of Monte Carlo trials to run when 
ci 
logical scalar indicating whether to compute a confidence interval for the power
when 
ci.conf.level 
numeric scalar between 0 and 1 indicating the confidence level associated with the
confidence interval for the power. The argument is ignored if 
integrate.args.list 
list of arguments to supply to the 
What is a Nonparametric Simultaneous Prediction Interval?
A nonparametric prediction interval for some population is an interval on the real line
constructed so that it will contain at least k of m future observations from
that population with some specified probability (1α)100\%, where
0 < α < 1 and k and m are some prespecified positive integers
and k ≤ m. The quantity (1α)100\% is called
the confidence coefficient or confidence level associated with the prediction
interval. The function predIntNpar
computes a standard
nonparametric prediction interval.
The function predIntNparSimultaneous
computes a nonparametric simultaneous
prediction interval that will contain a certain number of future observations
with probability (1α)100\% for each of r future sampling
“occasions”,
where r is some prespecified positive integer. The quantity r may
refer to r distinct future sampling occasions in time, or it may for example
refer to sampling at r distinct locations on one future sampling occasion,
assuming that the population standard deviation is the same at all of the r
distinct locations.
The function predIntNparSimultaneous
computes a nonparametric simultaneous
prediction interval based on one of three possible rules:
For the kofm rule (rule="k.of.m"
), at least k of
the next m future observations will fall in the prediction
interval with probability (1α)100\% on each of the r future
sampling occasions. If obserations are being taken sequentially, for a particular
sampling occasion, up to m observations may be taken, but once
k of the observations fall within the prediction interval, sampling can stop.
Note: For this rule, when r=1, the results of predIntNparSimultaneous
are equivalent to the results of predIntNpar
.
For the California rule (rule="CA"
), with probability
(1α)100\%, for each of the r future sampling occasions, either
the first observation will fall in the prediction interval, or else all of the next
m1 observations will fall in the prediction interval. That is, if the first
observation falls in the prediction interval then sampling can stop. Otherwise,
m1 more observations must be taken.
For the Modified California rule (rule="Modified.CA"
), with probability
(1α)100\%, for each of the r future sampling occasions, either the
first observation will fall in the prediction interval, or else at least 2 out of
the next 3 observations will fall in the prediction interval. That is, if the first
observation falls in the prediction interval then sampling can stop. Otherwise, up
to 3 more observations must be taken.
Nonparametric simultaneous prediction intervals can be extended to using medians
in place of single observations (USEPA, 2009, Chapter 19). That is, you can
create a nonparametric simultaneous prediction interval that will contain a
specified number of medians (based on which rule you choose) on each of r
future sampling occassions, where each each median is based on b individual
observations. For the function predIntNparSimultaneous
, the argument
n.median
corresponds to b.
The Form of a Nonparametric Prediction Interval
Let \underline{x} = x_1, x_2, …, x_n denote a vector of n
independent observations from some continuous distribution, and let
x_{(i)} denote the the i'th order statistics in \underline{x}.
A twosided nonparametric prediction interval is constructed as:
[x_{(u)}, x_{(v)}] \;\;\;\;\;\; (1)
where u and v are positive integers between 1 and n, and u < v. That is, u denotes the rank of the lower prediction limit, and v denotes the rank of the upper prediction limit. To make it easier to write some equations later on, we can also write the prediction interval (1) in a slightly different way as:
[x_{(u)}, x_{(n + 1  w)}] \;\;\;\;\;\; (2)
where
w = n + 1  v \;\;\;\;\;\; (3)
so that w is a positive integer between 1 and n1, and
u < n+1w. In terms of the arguments to the function predIntNparSimultaneous
,
the argument lpl.rank
corresponds to u, and the argument
n.plus.one.minus.upl.rank
corresponds to w.
If we allow u=0 and w=0 and define lower and upper bounds as:
x_{(0)} = lb \;\;\;\;\;\; (4)
x_{(n+1)} = ub \;\;\;\;\;\; (5)
then Equation (2) above can also represent a onesided lower or onesided upper prediction interval as well. That is, a onesided lower nonparametric prediction interval is constructed as:
[x_{(u)}, x_{(n + 1)}] = [x_{(u)}, ub] \;\;\;\;\;\; (6)
and a onesided upper nonparametric prediction interval is constructed as:
[x_{(0)}, x_{(n + 1  w)}] = [lb, x_{(n + 1  w)}] \;\;\;\;\;\; (7)
Usually, lb = ∞ or lb = 0 and ub = ∞.
Note: For nonparametric simultaneous prediction intervals, only lower
(pi.type="lower"
) and upper (pi.type="upper"
) prediction
intervals are available.
Computing Power
The "power" of the prediction interval is defined as the probability that
at least one set of future observations violates the given rule based on a
simultaneous prediction interval for the next r future sampling occasions,
where the population for the future observations is allowed to differ from
the population for the observations used to construct the prediction interval.
For the function predIntNparSimultaneousTestPower
, power is computed assuming
both the background and future the observations come from normal distributions with
the same standard deviation, but the means of the distributions are allowed to differ.
The quantity Δ (upper case delta) denotes the difference between the
mean of the population that was sampled to construct the prediction interval, and
the mean of the population that will be sampled to produce the future observations.
The quantity σ (sigma) denotes the population standard deviation of both
of these populations. The argument delta.over.sigma
corresponds to the
quantity Δ/σ.
Approximate Power (method="approx"
)
Based on Gansecki (2009), the power of a nonparametric simultaneous prediction
interval when the underlying observations come from a nomral distribution
can be approximated by the power of a normal simultaneous prediction
interval (see predIntNormSimultaneousTestPower
) where the multiplier
K is replaced with the expected value of the normal order statistic that
corresponds to the rank of the order statistic used for the upper or lower bound
of the prediction interval. Gansecki (2009) uses the approximation:
K = Φ^{1}(\frac{i  0.5}{n}) \;\;\;\;\;\; (8)
where Φ denotes the cumulative distribution function of the standard
normal distribution and i denotes the rank of the order statistic used
as the prediction limit. The function
predIntNparSimultaneousTestPower
uses the exact value of the expected value of the normal order statistic by
calling the function evNormOrdStatsScalar
.
Power Based on Monte Carlo Simulation (method="simulate"
)
When method="simulate"
, the power of the nonparametric simultaneous
prediction interval is estimated based on a Monte Carlo simulation. The argument
NMC
determines the number of Monte Carlo trials. If ci=TRUE
, a
confidence interval for the power is created based on the NMC
Monte Carlo
estimates of power.
vector of values between 0 and 1 equal to the probability that the rule will be violated.
See the help file for predIntNparSimultaneous
.
In the course of designing a sampling program, an environmental scientist may wish
to determine the relationship between sample size, significance level, power, and
scaled difference if one of the objectives of the sampling program is to determine
whether two distributions differ from each other. The functions
predIntNparSimultaneousTestPower
and
plotPredIntNparSimultaneousTestPowerCurve
can be
used to investigate these relationships for the case of normallydistributed
observations.
Steven P. Millard (EnvStats@ProbStatInfo.com)
See the help file for predIntNparSimultaneous
.
Gansecki, M. (2009). Using the Optimal Rank Values Calculator. US Environmental Protection Agency, Region 8, March 10, 2009.
plotPredIntNparSimultaneousTestPowerCurve
,
predIntNparSimultaneous
,
predIntNparSimultaneousN
,
predIntNparSimultaneousConfLevel
,
plotPredIntNparSimultaneousDesign
,
predIntNpar
, tolIntNpar
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123  # Example 195 of USEPA (2009, p. 1933) shows how to compute nonparametric upper
# simultaneous prediction limits for various rules based on trace mercury data (ppb)
# collected in the past year from a site with four background wells and 10 compliance
# wells (data for two of the compliance wells are shown in the guidance document).
# The facility must monitor the 10 compliance wells for five constituents
# (including mercury) annually.
# Here we will compute the confidence levels and powers associated with
# two different sampling plans:
# 1) the 1of2 retesting plan for a median of order 3 using the
# background maximum and
# 2) the 1of4 plan on individual observations using the 3rd highest
# background value.
# Power will be computed assuming a normal distribution and setting
# delta.over.sigma equal to 2, 3, and 4.
# The data for this example are stored in EPA.09.Ex.19.5.mercury.df.
# We will pool data from 4 background wells that were sampled on
# a number of different occasions, giving us a sample size of
# n = 20 to use to construct the prediction limit.
# There are 10 compliance wells and we will monitor 5 different
# constituents at each well annually. For this example, USEPA (2009)
# recommends setting r to the product of the number of compliance wells and
# the number of evaluations per year.
# To determine the minimum confidence level we require for
# the simultaneous prediction interval, USEPA (2009) recommends
# setting the maximum allowed individual Type I Error level per constituent to:
# 1  (1  SWFPR)^(1 / Number of Constituents)
# which translates to setting the confidence limit to
# (1  SWFPR)^(1 / Number of Constituents)
# where SWFPR = sitewide false positive rate. For this example, we
# will set SWFPR = 0.1. Thus, the required individual Type I Error level
# and confidence level per constituent are given as follows:
# n = 20 based on 4 Background Wells
# nw = 10 Compliance Wells
# nc = 5 Constituents
# ne = 1 Evaluation per year
n < 20
nw < 10
nc < 5
ne < 1
# Set number of future sampling occasions r to
# Number Compliance Wells x Number Evaluations per Year
r < nw * ne
conf.level < (1  0.1)^(1 / nc)
conf.level
#[1] 0.9791484
# So the required confidence level is 0.98, or 98%.
# Now determine the confidence level associated with each plan.
# Note that both plans achieve the required confidence level.
# 1) the 1of2 retesting plan for a median of order 3 using the
# background maximum
predIntNparSimultaneousConfLevel(n = 20, n.median = 3, k = 1, m = 2, r = r)
#[1] 0.9940354
# 2) the 1of4 plan based on individual observations using the 3rd highest
# background value.
predIntNparSimultaneousConfLevel(n = 20, k = 1, m = 4, r = r,
n.plus.one.minus.upl.rank = 3)
#[1] 0.9864909
#
# Compute approximate power of each plan to detect contamination at just 1 well
# assuming true underying distribution of Hg is Normal at all wells and
# using delta.over.sigma equal to 2, 3, and 4.
#
# Computer aproximate power for
# 1) the 1of2 retesting plan for a median of order 3 using the
# background maximum
predIntNparSimultaneousTestPower(n = 20, n.median = 3, k = 1, m = 2, r = r,
delta.over.sigma = 2:4, r.shifted = 1)
#[1] 0.3953712 0.9129671 0.9983054
# Compute approximate power for
# 2) the 1of4 plan based on individual observations using the 3rd highest
# background value.
predIntNparSimultaneousTestPower(n = 20, k = 1, m = 4, r = r,
n.plus.one.minus.upl.rank = 3, delta.over.sigma = 2:4, r.shifted = 1)
#[1] 0.4367972 0.8694664 0.9888779
#
## Not run:
# Compare estimated power using approximation method with estimated power
# using Monte Carlo simulation for the 1of4 plan based on individual
# observations using the 3rd highest background value.
predIntNparSimultaneousTestPower(n = 20, k = 1, m = 4, r = r,
n.plus.one.minus.upl.rank = 3, delta.over.sigma = 2:4, r.shifted = 1,
method = "simulate", ci = TRUE, NMC = 1000)
#[1] 0.437 0.863 0.989
#attr(,"conf.int")
# [,1] [,2] [,3]
#LCL 0.4111999 0.8451148 0.9835747
#UCL 0.4628001 0.8808852 0.9944253
## End(Not run)
#==========
# Cleanup
#
rm(n, nw, nc, ne, r, conf.level)

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.
All documentation is copyright its authors; we didn't write any of that.