bounder_cdf: Estimate bounds of a distribution using the CDF of its order...
In ggdist: Visualizations of Distributions and Uncertainty

bounder_cdf

R Documentation

Estimate bounds of a distribution using the CDF of its order statistics

Description

Estimate the bounds of the distribution a sample came from using the CDF of the order statistics of the sample. Use with the bounder argument to density_bounded().

Supports automatic partial function application with waived arguments.

Usage

bounder_cdf(x, p = 0.01)

Arguments

`x`	<numeric> Sample to estimate the bounds of.
`p`	<scalar numeric> in `[0,1]`: Percentile of the order statistic distribution to use as the estimate. `p = 1` will return `range(x)`; `p = 0.5` will give the median estimate; `p = 0` will give a very wide estimate (effectively treating the distribution as unbounded when used with `density_bounded()`).

Details

bounder_cdf() uses the distribution of the order statistics of X to estimate where the first and last order statistics (i.e. the min and max) of this distribution would be, treating the empirical distribution of the sample x as the distribution of X. Then, it adjusts each boundary outwards from min(x) (or max(x)) by the distance between min(x) (or max(x)) and the corresponding estimated order statistic.

Taking X = x, the distributions of the first and last order statistics are:

\begin{array}{rcl} F_{X_{(1)}}(x) &=& 1 - \left[1 - F_X(x)\right]^n\\ F_{X_{(n)}}(x) &=& F_X(x)^n \end{array}

Re-arranging, we can get the inverse CDFs (quantile functions) of each order statistic in terms of the quantile function of X (which we can estimate from the data), giving us an estimate for the minimum and maximum order statistic:

\begin{array}{rcrcl} \hat{x_1} &=& F_{X_{(1)}}^{-1}(1 - p) &=& F_X^{-1}\left[1 - p^{1/n}\right]\\ \hat{x_n} &=& F_{X_{(n)}}^{-1}(p) &=& F_X^{-1}\left[p^{1/n}\right] \end{array}
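
The rearrangement step can be spelled out. As a sketch, write q for an arbitrary probability level (a symbol introduced here for illustration) and solve each order-statistic CDF for F_X(x):

\begin{array}{rcl} q = 1 - \left[1 - F_X(x)\right]^n &\Rightarrow& F_X(x) = 1 - (1 - q)^{1/n}\\ q = F_X(x)^n &\Rightarrow& F_X(x) = q^{1/n} \end{array}

Applying F_X^{-1} to both sides gives F_{X_{(1)}}^{-1}(q) = F_X^{-1}\left[1 - (1 - q)^{1/n}\right] and F_{X_{(n)}}^{-1}(q) = F_X^{-1}\left[q^{1/n}\right].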

Then the estimated bounds are:

\left[2\min(x) - \hat{x_1}, 2\max(x) - \hat{x_n} \right]
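
The steps above can be sketched in base R. This is an illustration, not the ggdist implementation: `bounder_cdf_sketch()` is a hypothetical name, the default `quantile()` estimator stands in for the quantile function of X, and the lower estimate is assumed to use quantile level `1 - p^(1/n)` (the mirror image of the level used for the maximum), so that `p = 1` reproduces `range(x)` on both sides as described under Arguments.

```r
# Hypothetical helper (not part of ggdist): base-R sketch of the estimate.
bounder_cdf_sketch <- function(x, p = 0.01) {
  stopifnot(is.numeric(x), length(x) >= 1, p >= 0, p <= 1)
  n <- length(x)
  # Quantile level for the maximum; assumed mirrored (1 - level) for the minimum.
  level <- p^(1 / n)
  # Empirical quantiles stand in for the quantile function of X.
  x_hat <- quantile(x, c(1 - level, level), names = FALSE)
  # Push each bound outwards by the gap between the extreme and its estimate.
  2 * range(x) - x_hat
}

x <- c(0.2, 0.5, 1.1, 1.7, 2.4, 3.0)
bounder_cdf_sketch(x, p = 1)    # identical to range(x)
bounder_cdf_sketch(x, p = 0.5)  # median-based estimate
bounder_cdf_sketch(x, p = 0)    # widest: c(2 * min(x) - max(x), 2 * max(x) - min(x))
```

Because each bound is `2 * extreme - estimate`, the result is never narrower than `range(x)`, and it widens as `p` decreases.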

These bounds depend on p, the percentile of the distribution of the order statistic used to form the estimate. While p = 0.5 (the median) might be a reasonable choice (and gives results similar to bounder_cooke()), this tends to be a bit too aggressive in "detecting" bounded distributions, especially in small sample sizes. Thus, we use a default of p = 0.01, which tends to be very conservative in small samples (in that it usually gives results roughly equivalent to an unbounded distribution), but which still performs well on bounded distributions when sample sizes are larger (in the thousands).
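
To see why a small `p` is conservative in small samples, it can help to look at the quantile level `p^(1/n)` used for the maximum. The snippet below is plain arithmetic in base R; the variable and column names are made up for this illustration.

```r
# Quantile level p^(1/n) used for the maximum, for several sample sizes.
n <- c(10, 100, 1000, 10000)
q_levels <- data.frame(
  n = n,
  level_p50 = 0.5^(1 / n),   # level when p = 0.5
  level_p01 = 0.01^(1 / n)   # level when p = 0.01
)
print(q_levels, digits = 4)
```

With n = 10, `p = 0.01` gives a level of about 0.63, so the estimate sits far inside the sample and the bound is pushed well beyond `max(x)`, whereas `p = 0.5` gives about 0.93. With n = 1000 both levels are above 0.995, so either choice places the bound close to the observed extreme.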

Value

A length-2 numeric vector giving an estimate of the minimum and maximum bounds of the distribution that x came from.
