Construct a nonparametric prediction interval to contain at least k out of the next m future observations with probability (1-α)100% for a continuous distribution.

predIntNpar(x, k = m, m = 1, lpl.rank = ifelse(pi.type == "upper", 0, 1),
    n.plus.one.minus.upl.rank = ifelse(pi.type == "lower", 0, 1),
    lb = -Inf, ub = Inf, pi.type = "two-sided")

x: a numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

k: positive integer specifying the minimum number of future observations out of m that should be contained in the prediction interval. The default value is k=m.

m: positive integer specifying the number of future observations. The default value is m=1.

lpl.rank: positive integer indicating the rank of the order statistic to use for the lower bound of the prediction interval. If pi.type="two-sided" or pi.type="lower", the default value is lpl.rank=1 (implying the minimum value of x is used as the lower bound of the prediction interval). If pi.type="upper", this argument is set equal to 0 and the value of lb is used as the lower bound of the prediction interval.

n.plus.one.minus.upl.rank: positive integer related to the rank of the order statistic to use for the upper bound of the prediction interval. A value of n.plus.one.minus.upl.rank=1 (the default when pi.type="two-sided" or pi.type="upper") means use the largest value, and in general a value of n.plus.one.minus.upl.rank=i means use the i'th largest value. If pi.type="lower", this argument is set equal to 0 and the value of ub is used as the upper bound of the prediction interval.

lb, ub: scalars indicating lower and upper bounds on the distribution. By default, lb=-Inf and ub=Inf. If you are constructing a prediction interval for a distribution that you know has a lower bound other than -Inf (e.g., 0), set lb to this value. Similarly, if you know the distribution has an upper bound other than Inf, set ub to this value. The argument lb is ignored if pi.type="two-sided" or pi.type="lower". The argument ub is ignored if pi.type="two-sided" or pi.type="upper".

pi.type: character string indicating what kind of prediction interval to compute. The possible values are "two-sided" (the default), "lower", and "upper".

What is a Nonparametric Prediction Interval?
A nonparametric prediction interval for some population is an interval on the real line constructed so that it will contain at least k of m future observations from that population with some specified probability (1-α)100%, where 0 < α < 1 and k and m are pre-specified positive integers with k ≤ m. The quantity (1-α)100% is called the confidence coefficient or confidence level associated with the prediction interval.

The Form of a Nonparametric Prediction Interval
Let \underline{x} = x_1, x_2, …, x_n denote a vector of n independent observations from some continuous distribution, and let x_{(i)} denote the i'th order statistic in \underline{x}. A two-sided nonparametric prediction interval is constructed as:

[x_{(u)}, x_{(v)}] \;\;\;\;\;\; (1)

where u and v are positive integers between 1 and n, and u < v. That is, u denotes the rank of the lower prediction limit, and v denotes the rank of the upper prediction limit. To make it easier to write some equations later on, we can also write the prediction interval (1) in a slightly different way as:

[x_{(u)}, x_{(n + 1 - w)}] \;\;\;\;\;\; (2)

where

w = n + 1 - v \;\;\;\;\;\; (3)

so that w is a positive integer between 1 and n-1, and u < n+1-w. In terms of the arguments to the function predIntNpar, the argument lpl.rank corresponds to u, and the argument n.plus.one.minus.upl.rank corresponds to w.

If we allow u=0 and w=0 and define lower and upper bounds as:

x_{(0)} = lb \;\;\;\;\;\; (4)

x_{(n+1)} = ub \;\;\;\;\;\; (5)

then Equation (2) above can also represent a one-sided lower or one-sided upper prediction interval. That is, a one-sided lower nonparametric prediction interval is constructed as:

[x_{(u)}, x_{(n + 1)}] = [x_{(u)}, ub] \;\;\;\;\;\; (6)

and a one-sided upper nonparametric prediction interval is constructed as:

[x_{(0)}, x_{(n + 1 - w)}] = [lb, x_{(n + 1 - w)}] \;\;\;\;\;\; (7)

Usually, lb = -∞ or lb = 0 and ub = ∞.
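The indexing in Equations (2) through (7) can be sketched in base R as follows. The helper npar_pi_limits is hypothetical (it is not part of EnvStats); it assumes u and w are given directly, as with lpl.rank and n.plus.one.minus.upl.rank, and pads the sorted data with lb and ub so that u = 0 or w = 0 picks up the bound, as in Equations (4) and (5).

```r
# Hypothetical helper (not an EnvStats function): prediction limits
# [x_(u), x_(n+1-w)] from Equation (2), with x_(0) = lb and x_(n+1) = ub
# as in Equations (4) and (5).
npar_pi_limits <- function(x, u = 1, w = 1, lb = -Inf, ub = Inf) {
  x <- sort(x[is.finite(x)])   # drop NA, NaN, Inf, -Inf, then order
  n <- length(x)
  stopifnot(u >= 0, w >= 0, u < n + 1 - w)
  xs <- c(lb, x, ub)           # xs[i + 1] holds x_(i) for i = 0, ..., n + 1
  c(LPL = xs[u + 1], UPL = xs[n + 1 - w + 1])
}

x <- c(4.2, 1.7, 3.3, 5.9, 2.8, NA)
npar_pi_limits(x)                  # two-sided, u = w = 1: LPL = 1.7, UPL = 5.9
npar_pi_limits(x, u = 0, lb = 0)   # one-sided upper, Equation (7): LPL = 0, UPL = 5.9
npar_pi_limits(x, u = 1, w = 0)    # one-sided lower, Equation (6): LPL = 1.7, UPL = Inf
```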

Constructing Nonparametric Prediction Intervals for Future Observations
Danziger and Davis (1964) show that the probability that at least k out of the next m observations will fall in the interval defined in Equation (2) is given by:

(1 - α) = [∑_{i=k}^m {{m-i+u+w-1} \choose {m-i}} {{i+n-u-w} \choose i}] / {{n+m} \choose m} \;\;\;\;\;\; (8)
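Equation (8) translates directly into base R with choose(). The function name npar_pi_conf is hypothetical and this is a sketch, not the EnvStats implementation. A quick Monte Carlo check with normal data (any continuous distribution would do, since the result is distribution-free) compares the empirical coverage with the exact value.

```r
# Hypothetical helper: Equation (8), the probability that at least k of the
# next m observations fall in [x_(u), x_(n+1-w)] for a sample of size n.
npar_pi_conf <- function(n, m = 1, k = m, u = 1, w = 1) {
  i <- k:m
  sum(choose(m - i + u + w - 1, m - i) * choose(i + n - u - w, i)) /
    choose(n + m, m)
}

# Monte Carlo check: n = 10 past values, at least k = 2 of m = 3 future
# values must fall between the minimum and the maximum (u = w = 1).
set.seed(1)
n <- 10; m <- 3; k <- 2; u <- 1; w <- 1
covered <- replicate(20000, {
  x <- sort(rnorm(n))
  y <- rnorm(m)
  sum(y >= x[u] & y <= x[n + 1 - w]) >= k
})
mean(covered)                  # simulated coverage
npar_pi_conf(n, m, k, u, w)    # exact: 255/286, about 0.892
```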

(Note that computing a nonparametric prediction interval for the case k = m = 1 is equivalent to computing a nonparametric β-expectation tolerance interval with coverage (1-α)100%; see tolIntNpar.)

The Special Case of Using the Minimum and the Maximum
Setting u = w = 1 implies using the smallest and largest observed values as the prediction limits. In this case, it can be shown that the probability that at least k out of the next m observations will fall in the interval

[x_{(1)}, x_{(n)}] \;\;\;\;\;\; (9)

is given by:

(1 - α) = [∑_{i=k}^m (m-i+1){{n+i-2} \choose i}] / {{n+m} \choose m} \;\;\;\;\;\; (10)
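Setting u = w = 1 in Equation (8) gives the binomial coefficient C(m-i+1, m-i), which simplifies to m-i+1. The sketch below (hypothetical helper names, base R only) evaluates the min/max special case and checks it against the general formula over a grid of n, m, and k.

```r
# Hypothetical helper: special case u = w = 1 (limits are the min and max).
conf_min_max <- function(n, m = 1, k = m) {
  i <- k:m
  sum((m - i + 1) * choose(n + i - 2, i)) / choose(n + m, m)
}

# General formula (Equation (8)), repeated so this block runs on its own.
npar_pi_conf <- function(n, m = 1, k = m, u = 1, w = 1) {
  i <- k:m
  sum(choose(m - i + u + w - 1, m - i) * choose(i + n - u - w, i)) /
    choose(n + m, m)
}

# The two agree for every n >= 2, m, and k <= m on this grid.
for (n in 2:15) for (m in 1:6) for (k in 1:m) {
  stopifnot(isTRUE(all.equal(conf_min_max(n, m, k), npar_pi_conf(n, m, k))))
}
```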

Setting k=m in Equation (10), the probability that all of the next m observations will fall in the interval defined in Equation (9) is given by:

(1 - α) = \frac{n(n-1)}{(n+m)(n+m-1)} \;\;\;\;\;\; (11)
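With k = m only the i = m term of the sum survives, which is why the closed form above follows. A small sketch (hypothetical helper names) confirms the closed form numerically against the sum for several sample sizes.

```r
# Hypothetical helper: closed form for all m future values inside [x_(1), x_(n)].
conf_all_in_range <- function(n, m) n * (n - 1) / ((n + m) * (n + m - 1))

# Sum form of the min/max special case, repeated so this block is self-contained.
conf_min_max <- function(n, m = 1, k = m) {
  i <- k:m
  sum((m - i + 1) * choose(n + i - 2, i)) / choose(n + m, m)
}

conf_all_in_range(20, 1)   # 19/21, about 0.905
conf_min_max(20, m = 1)    # same value from the sum with k = m

for (n in 2:30) for (m in 1:8) {
  stopifnot(isTRUE(all.equal(conf_all_in_range(n, m), conf_min_max(n, m))))
}
```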

For one-sided prediction limits, the probability that all m future observations will fall below x_{(n)} (upper prediction limit; pi.type="upper") and the probability that all m future observations will fall above x_{(1)} (lower prediction limit; pi.type="lower") are both given by:

(1 - α) = \frac{n}{n+m} \;\;\;\;\;\; (12)
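The one-sided case corresponds to Equation (8) with u = 0, w = 1 (upper limit x_{(n)}) or u = 1, w = 0 (lower limit x_{(1)}), following the convention that a rank of 0 means the bound lb or ub is used. This sketch, again with a hypothetical helper name, checks both settings against n/(n+m).

```r
# General formula (Equation (8)); hypothetical helper, not an EnvStats function.
npar_pi_conf <- function(n, m = 1, k = m, u = 1, w = 1) {
  i <- k:m
  sum(choose(m - i + u + w - 1, m - i) * choose(i + n - u - w, i)) /
    choose(n + m, m)
}

npar_pi_conf(20, m = 3, u = 0, w = 1)   # upper limit x_(n): 20/23, about 0.870
npar_pi_conf(20, m = 3, u = 1, w = 0)   # lower limit x_(1): same value
20 / (20 + 3)                           # closed form n / (n + m)
```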

Constructing Nonparametric Prediction Intervals for Future Medians
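To construct a nonparametric prediction interval for a future median based on s future observations, where s is odd, construct a nonparametric prediction interval that must contain at least k = (s+1)/2 of the next m = s future observations. For a one-sided limit the two events are equivalent. For a two-sided interval, having at least (s+1)/2 of the s future observations inside the interval guarantees that their median is inside, so the confidence level from Equation (8) is a lower bound on the probability of covering the future median.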
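A minimal sketch of the median case, assuming the same hypothetical npar_pi_conf helper for Equation (8). It uses a one-sided upper limit x_{(n)} (u = 0, w = 1), where "median of the s future values ≤ x_{(n)}" is exactly the event "at least (s+1)/2 of the s values ≤ x_{(n)}", and checks the result by simulation.

```r
# General formula (Equation (8)); hypothetical helper, not an EnvStats function.
npar_pi_conf <- function(n, m = 1, k = m, u = 1, w = 1) {
  i <- k:m
  sum(choose(m - i + u + w - 1, m - i) * choose(i + n - u - w, i)) /
    choose(n + m, m)
}

# Hypothetical helper: confidence for the median of s future values (s odd).
conf_future_median <- function(n, s, u = 1, w = 1) {
  stopifnot(s %% 2 == 1)
  npar_pi_conf(n, m = s, k = (s + 1) / 2, u = u, w = w)
}

# One-sided upper limit x_(n) for the median of s = 3 future values, n = 10.
set.seed(2)
n <- 10; s <- 3
exact <- conf_future_median(n, s, u = 0, w = 1)   # 25/26, about 0.962
sim <- mean(replicate(20000, median(rnorm(s)) <= max(rnorm(n))))
c(exact = exact, simulated = sim)
```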

predIntNpar returns a list of class "estimate" containing the prediction interval and other information. See the help file for estimate.object for details.

Prediction and tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Nelson, 1973; Krishnamoorthy and Mathew, 2009). In the context of environmental statistics, prediction intervals are useful for analyzing data from groundwater detection monitoring programs at hazardous and solid waste facilities (e.g., Gibbons et al., 2009; Millard and Neerchal, 2001; USEPA, 2009).

Steven P. Millard (EnvStats@ProbStatInfo.com)

