In vincenzocoia/distionary: Create and Evaluate Probability Distributions

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(distionary)

This vignette covers the second goal of distionary: to evaluate probability distributions, even when that property is not specified in the distribution's definition.

Distributional Representations

A distributional representation is a function that fully describes the distribution, such that any property can be calculated from it. Here is a list of representations recognised by distionary, and the functions for accessing them.

| Representation | distionary Functions | |----------------------------------|-----------------------------------------| | Cumulative Distribution Function | eval_cdf(), enframe_cdf() | | Survival Function | eval_survival(), enframe_survival() | | Quantile Function | eval_quantile(), enframe_quantile() | | Hazard Function | eval_hazard(), enframe_hazard() | | Cumulative Hazard Function | eval_chf(), enframe_chf() | | Probability density Function | eval_density(), enframe_density() | | Probability mass Function (PMF) | eval_pmf(), enframe_pmf() | | Odds Function | eval_odds(), enframe_odds() | | Return Level Function | eval_return(), enframe_return() |

All representations can either be accessed by the eval_*() family of functions, providing a vector of the evaluated representation.

d1 <- dst_geom(0.6)
eval_pmf(d1, at = 0:5)

Alternatively, the enframe_*() family of functions provides the results in a tibble or data frame paired with the inputs, useful in a data wrangling workflow.

enframe_pmf(d1, at = 0:5)

The enframe_*() functions allow for insertion of multiple distributions, placing a column for each distribution. The column names can be changed in three ways:

The input column .arg can be renamed with the arg_name argument.
The pmf prefix on the evaluation columns can be changed with the fn_prefix argument.
The distribution names can be changed by assigning name-value pairs for the input distributions.

Let's practice this with the addition of a second distribution.

d2 <- dst_geom(0.4)
enframe_pmf(
  model1 = d1, model2 = d2, at = 0:5,
  arg_name = "num_failures", fn_prefix = "probability"
)

Drawing a random sample

To draw a random sample from a distribution, use the realise() or realize() function:

set.seed(42)
realise(d1, n = 5)

You can read this call as "realise distribution d five times". By default, n is set to 1, so that realising converts a distribution to a numeric draw:

realise(d1)

While random sampling falls into the same family as the p*/d*/q*/r* functions from the stats package (e.g., rnorm()), this function is not a distributional representation, hence does not have a eval_*() or enframe_*() counterpart. This is because it's impossible to perfectly describe a distribution based on a sample.

Properties of Distributions

distionary refers to a distribution property as any value that can be calculated from a distribution, such as the mean and variance. Whereas a distributional representation must fully define a distribution, a property need not.

Below is a table of the properties incorporated in distionary, and the corresponding functions for accessing them.

| Property | distionary Function | |----------|---------------------| | Mean | mean() | | Median | median() | | Variance | variance() | | Standard Deviation | sd() | | Skewness | skewness() | | Excess Kurtosis | kurtosis_exc() | | Kurtosis | kurtosis() |

Here's the mean and variance of our original distribution.

mean(d1)
variance(d1)

Some properties are easy to make yourself. Here is an example of a function that calculates interquartile range.

# Make a function that takes a distribution as input, and returns the
# interquartile range.
iqr <- function(distribution) {
  diff(eval_quantile(distribution, at = c(0.25, 0.75)))
}

Apply the function.

iqr(d2)

For properties that are not handled by distionary (e.g., extreme value index, or moment generating function), one option is to build these properties into your own distribution. A future version of distionary will make user-defined properties easier to work with.

vincenzocoia/distionary documentation built on April 5, 2025, 5:20 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com