Recall that to determine confidence interval bounds, we solve the equation \begin{align} F_Y(y_{\rm obs} \vert \theta) - q = 0 \end{align} for $\theta$. Here, $Y$ is our chosen statistic (e.g., $\bar{X}$), $F_Y(\cdot)$ is the cumulative distribution function (or cdf) for $Y$'s sampling distribution, $y_{\rm obs}$ is the observed statistic value, and $q$ is a quantile that we determine given the confidence coefficient $1-\alpha$ and the type of interval we are constructing (one-sided lower bound, one-sided upper bound, or two-sided).
In this vignette, we examine the situation in which the sampling distribution
cdf is analytically derivable, but (a) is not (easily) invertible by hand,
and (b) is (to our knowledge!) not coded in an existing R
package. In this
situation, we would develop our own cdf-calculating function, and then pass
it to cdfinv()
.
Let's suppose we sample $n = 16$ data from a $\text{Uniform}(0,\theta)$ distribution and we wish to compute a 95% upper bound for $\theta$. Further, let's suppose that the $10^{\rm th}$ datum is $x_{(10),\rm obs} = 0.45$, and that we will use this datum to determine the confidence interval. (This is obviously an academic exercise: we would normally utilize $x_{(16),\rm obs}$ and solve for the upper bound by hand.)
For the $\text{Uniform}(0,\theta)$ distribution, the cdf within the domain is \begin{align} F_X(x) = \int_0^x \frac{1}{\theta - 0} dy = \frac{x}{\theta} \,. \end{align} We utilize a result found in Casella & Berger (2002; page 229): the cdf for the $j^{\rm th}$ order statistic is \begin{align} F_{(j)}(x) = \sum_{k=j}^n \binom{n}{k} [F_X(x)]^k [1-F_X(x)]^{n-k} \,. \end{align} (Our notation slightly differs from that of Casella & Berger.) Here, that cdf would be \begin{align} F_{(j)}(x) = \sum_{k=j}^n \binom{n}{k} \left(\frac{x}{\theta}\right)^k \left[1-\left(\frac{x}{\theta}\right)\right]^{n-k} \,. \end{align} This is obviously not a cdf that we can easily invert by hand when $j < n$.
Below we show how one might create one's own code that returns the cdf value
as a function of a coordinate $x$ and a parameter $\theta$. Note that the first
argument must be q
...the observed statistic value passed to cdfinv()
will be
matched to this argument name.
# give the function a name that will not conflict with others in the namespace punif.ord <- function(q,theta,j,n) { sum(choose(n,j:n)*(q/theta)^(j:n)*(1-q/theta)^(n-j:n)) }
Below we show how we can pass the function, now defined in R
's global environment,
to cdfinv()
. The first three arguments are standard: the distribution "name" is
unif.ord
(cdfinv()
will tack a p
on the front), the parameter of interest is
theta
, and the observed statistic value is 0.45. The next argument specifies that
we wish to compute a (95% by default) upper bound on $\theta$, instead of the default
two-sided interval. As we know that $\theta \geq x_{(10),\rm obs}$, we specify a
lower parameter bound (lpb
) of 0.45. The last two arguments are extra ones specifically
required for punif.ord()
.
require(cdfinv) cdfinv("unif.ord","theta",0.45,bound="upper",lpb=0.45,j=10,n=16)
The 95% upper bound on $\theta$ determined using the $10^{\rm th}$ ordered datum is $\hat{\theta}_U = 1.151$.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.