bounder_cdf | R Documentation |
Estimate the bounds of the distribution a sample came from using the CDF of
the order statistics of the sample. Use with the bounder argument to
density_bounded().
Supports automatic partial function application.
bounder_cdf(x, p = 0.01)
x | numeric vector containing a sample to estimate the bounds of. |
p | scalar in [0, 1]: the percentile of the distribution of the order statistic used to form the estimate. |
bounder_cdf() uses the distribution of the order statistics of X to estimate
where the first and last order statistics (i.e. the min and max) of this
distribution would be, assuming the sample x is the distribution. Then, it
adjusts the boundary outwards from min(x) (or max(x)) by the distance between
min(x) (or max(x)) and the nearest estimated order statistic.
Taking X = x, the distributions of the first and last order statistics are:
\begin{array}{rcl}
F_{X_{(1)}}(x) &=& 1 - \left[1 - F_X(x)\right]^n\\
F_{X_{(n)}}(x) &=& F_X(x)^n
\end{array}
Re-arranging, we can get the inverse CDFs (quantile functions) of each
order statistic in terms of the quantile function of X (which we can
estimate from the data), giving us estimates of the minimum and maximum
order statistics:
\begin{array}{rcrcl}
\hat{x_1} &=& F_{X_{(1)}}^{-1}(p) &=& F_X^{-1}\left[1 - (1 - p)^{1/n}\right]\\
\hat{x_n} &=& F_{X_{(n)}}^{-1}(p) &=& F_X^{-1}\left[p^{1/n}\right]
\end{array}
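The re-arrangement can be made explicit. Setting each order-statistic CDF
equal to p and solving for F_X(x), where n is the number of observations in
the sample, gives:
\begin{array}{rclcrcl}
1 - \left[1 - F_X(x)\right]^n &=& p &\Rightarrow& F_X(x) &=& 1 - (1 - p)^{1/n}\\
F_X(x)^n &=& p &\Rightarrow& F_X(x) &=& p^{1/n}
\end{array}
Applying F_X^{-1} to both sides of each result yields the two quantile
expressions above.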
Then the estimated bounds are:
\left[2\min(x) - \hat{x_1}, 2\max(x) - \hat{x_n} \right]
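The formulas above can be sketched in a few lines of base R. This is an
illustrative sketch only, not the package's implementation: the helper name
bounder_cdf_sketch() is hypothetical, and using quantile() with its default
interpolation as the estimate of the quantile function of X is an assumption.

```r
# Hypothetical helper: a sketch of the CDF-based bounds estimate described
# above. The use of quantile() (default type) to estimate F_X^{-1} is an
# assumption, not necessarily what the package does.
bounder_cdf_sketch <- function(x, p = 0.01) {
  n <- length(x)
  # p-th quantile of the first order statistic (the minimum)
  x_1_hat <- quantile(x, 1 - (1 - p)^(1 / n), names = FALSE)
  # p-th quantile of the last order statistic (the maximum)
  x_n_hat <- quantile(x, p^(1 / n), names = FALSE)
  # Reflect each estimate around the observed extreme
  c(2 * min(x) - x_1_hat, 2 * max(x) - x_n_hat)
}

set.seed(1234)
x <- rbeta(5000, 1, 3)  # simulated sample; the true bounds are 0 and 1
bounds <- bounder_cdf_sketch(x)
bounds
```

Because the empirical quantiles always lie within the range of the sample, the
bounds returned by this sketch can never fall inside [min(x), max(x)].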
These bounds depend on p, the percentile of the distribution of the order
statistic used to form the estimate. While p = 0.5 (the median) might be
a reasonable choice (and gives results similar to bounder_cooke()), this tends
to be a bit too aggressive in "detecting" bounded distributions, especially
with small sample sizes. Thus, we use a default of p = 0.01, which tends to
be very conservative in small samples (in that it usually gives results
roughly equivalent to an unbounded distribution), but which still performs
well on bounded distributions when sample sizes are larger (in the thousands).
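To see how p and the sample size n enter the estimate, it can help to compute
the quantile levels of X that the two formulas query. The helper
order_stat_levels() below is hypothetical and only evaluates the expressions
1 - (1 - p)^(1/n) and p^(1/n) from the formulas above.

```r
# Hypothetical helper: quantile levels of X used for the estimated
# minimum ("lower") and maximum ("upper") order statistics.
order_stat_levels <- function(n, p) {
  c(lower = 1 - (1 - p)^(1 / n), upper = p^(1 / n))
}

# Same sample size, two choices of p: a smaller p lowers both levels
order_stat_levels(n = 100, p = 0.5)
order_stat_levels(n = 100, p = 0.01)

# Larger sample: the lower level moves toward 0 and the upper toward 1
order_stat_levels(n = 5000, p = 0.01)
```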
A length-2 numeric vector giving an estimate of the minimum and maximum bounds
of the distribution that x came from.
The bounder argument to density_bounded().
Other bounds estimators: bounder_cooke(), bounder_range()
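A short usage sketch, assuming the ggdist package (which provides
bounder_cdf() and density_bounded()) is installed. The simulated Beta sample
and the variable names are illustrative; the calls are guarded so that the
code does nothing when the package is unavailable.

```r
set.seed(1234)
x <- rbeta(5000, 1, 3)  # simulated sample; the true bounds are 0 and 1

if (requireNamespace("ggdist", quietly = TRUE)) {
  # Direct call: a length-2 vector of estimated lower and upper bounds
  print(ggdist::bounder_cdf(x, p = 0.01))

  # Partial application: omitting x returns a function with p fixed
  bounder_median <- ggdist::bounder_cdf(p = 0.5)
  print(bounder_median(x))

  # Supply the bounder to density_bounded() via its bounder argument
  d <- ggdist::density_bounded(x, bounder = bounder_median)
}
```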