Recall that to determine confidence interval bounds, we solve the equation \begin{align} F_Y(y_{\rm obs} \vert \theta) - q = 0 \end{align} for $\theta$. Here, $Y$ is our chosen statistic (e.g., $\bar{X}$), $F_Y(\cdot)$ is the cumulative distribution function (or cdf) for $Y$'s sampling distribution, $y_{\rm obs}$ is the observed statistic value, and $q$ is a quantile that we determine given the confidence coefficient $1-\alpha$ and the type of interval we are constructing (one-sided lower bound, one-sided upper bound, or two-sided).
In this vignette, we examine the situation in which the sampling distribution
cdf is not analytically derivable, but where we can simulate individual
data using code in an existing R
package (e.g., rnorm()
).
In this situation, we would create a function that, given a simulated dataset,
returns the statistic value, and we would pass that function into
cdfinv.sim()
.
Let's suppose we sample $n = 6$ data from a $\text{Beta}(a,2.8)$ distribution and we wish to compute a 95% lower bound for $a$, given an observed statistic value.
Given that the beta distribution is a member of the exponential family, it makes sense to determine and apply a sufficient statistic. For a $\text{Beta}(a,2.8)$ distribution, a sufficient statistic, found via likelihood factorization, is \begin{align} Y = \prod_{i=1}^n X_i \,. \end{align} Using products is generally a sub-optimal choice, given the possiblity of floating-point underflows. Since one-to-one functions of sufficient statistics are also sufficient, we adopt $Y = -\sum_{i=1}^n \log X_i$ instead. The minus sign is included simply to make the statistic values positive.
In R
, we code this statistic as follows:
# give the function a name that will not conflict with others in the namespace beta.stat <- function(x) { -sum(log(x)) }
Below we show how we can pass the function beta.stat
,
now defined in R
's global environment,
to cdfinv.sim()
.
The first three arguments are standard: the distribution "name" is
beta
(cdfinv()
will tack a p
on the front), the parameter of interest is
shape1
, and the (assumed) observed statistic value is 10.5.
The next argument specifies that
we wish to compute a (95% by default) lower bound on $a$ (aka shape1
),
instead of the default two-sided interval.
As we know that $a > 0$, we specify a lower parameter bound (lpb
) of 0.
The sample size is nsamp
(here, 6), and stat.func
is the function we
just defined above, beta.stat
.
The last argument is an extra one specifically
required for rbeta()
.
require(cdfinv) cdfinv.sim("beta","shape1",10.5,bound="lower",lpb=0,nsamp=6,stat.func=beta.stat,shape2=2.8)
The 95% lower bound on $a$ is $\hat{\theta}_L = 0.502$. Note that because we
work with empirical sampling distributions, this value is an estimate that under default
conditions takes $\sim$ 1-10 CPU seconds to compute on a typical computer. For greater
precision, one should consider increasing numsim
from its default value of $10^5$ to
$10^6$ or greater, while being mindful of the increased time needed to complete
the computation.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.