library(distionary)
This vignette covers the second goal of distionary: to evaluate properties of probability distributions, even when a property is not specified in the distribution's definition.
A distributional representation is a function that fully describes the distribution, such that any property can be calculated from it. Here is a list of representations recognised by distionary, and the functions for accessing them.
| Representation                   | distionary Functions                      |
|----------------------------------|-------------------------------------------|
| Cumulative Distribution Function | `eval_cdf()`, `enframe_cdf()`             |
| Survival Function                | `eval_survival()`, `enframe_survival()`   |
| Quantile Function                | `eval_quantile()`, `enframe_quantile()`   |
| Hazard Function                  | `eval_hazard()`, `enframe_hazard()`       |
| Cumulative Hazard Function       | `eval_chf()`, `enframe_chf()`             |
| Probability Density Function     | `eval_density()`, `enframe_density()`     |
| Probability Mass Function (PMF)  | `eval_pmf()`, `enframe_pmf()`             |
| Odds Function                    | `eval_odds()`, `enframe_odds()`           |
| Return Level Function            | `eval_return()`, `enframe_return()`       |
Every representation can be accessed with the `eval_*()` family of functions, which return a vector of the evaluated representation.

    d1 <- dst_geom(0.6)
    eval_pmf(d1, at = 0:5)

Alternatively, the `enframe_*()` family of functions provides the results in a tibble or data frame paired with the inputs, which is useful in a data wrangling workflow.

    enframe_pmf(d1, at = 0:5)

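As a rough check on how the representations relate to one another, the sketch below compares a few `eval_*()` results with their counterparts in the stats package. It assumes that `dst_geom()` uses the same parameterisation as `stats::dgeom()` (the number of failures before the first success); the names `d` and `x` are only for illustration.

```r
library(distionary)

# Geometric distribution with success probability 0.6, as in the text.
d <- dst_geom(0.6)
x <- 0:5

pmf <- eval_pmf(d, at = x)
cdf <- eval_cdf(d, at = x)
surv <- eval_survival(d, at = x)

# The PMF and CDF should agree with the stats package equivalents
# (assuming the same parameterisation).
all.equal(as.numeric(pmf), dgeom(x, prob = 0.6))
all.equal(as.numeric(cdf), pgeom(x, prob = 0.6))

# The survival function is the complement of the CDF.
all.equal(as.numeric(surv), 1 - as.numeric(cdf))
```

Each comparison should return `TRUE` if the assumption about the parameterisation holds.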
The `enframe_*()` functions accept multiple distributions, placing a column for each distribution. The column names can be changed in three ways:

1. The input column, `.arg`, can be renamed with the `arg_name` argument.
2. The `pmf` prefix on the evaluation columns can be changed with the `fn_prefix` argument.
3. The distribution part of each column name can be set by passing the distributions as named arguments (for example, `model1 = d1`).

Let's practice this with the addition of a second distribution.

    d2 <- dst_geom(0.4)
    enframe_pmf(
      model1 = d1, model2 = d2, at = 0:5,
      arg_name = "num_failures", fn_prefix = "probability"
    )

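The same pattern should carry over to the other representations. The sketch below uses `enframe_survival()` to place the survival functions of two geometric distributions side by side; the object name `surv_tbl` is only for illustration, and columns are accessed by position so that nothing depends on the exact column names.

```r
library(distionary)

d1 <- dst_geom(0.6)
d2 <- dst_geom(0.4)

# One row per evaluation point, one column per distribution.
surv_tbl <- enframe_survival(model1 = d1, model2 = d2, at = 0:3)
surv_tbl

# The first column holds the inputs; the remaining columns hold the
# evaluated survival function for each distribution, in the order supplied.
surv_tbl[[2]] - surv_tbl[[3]]
```

Because the second distribution has the smaller success probability, its survival probabilities are larger, so the differences in the last line should all be negative.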
To draw a random sample from a distribution, use the `realise()` or `realize()` function:

    set.seed(42)
    realise(d1, n = 5)

You can read this call as "realise distribution `d1` five times". By default, `n` is set to 1, so that realising converts a distribution to a single numeric draw:

    realise(d1)

While random sampling falls into the same family as the `p*`/`d*`/`q*`/`r*` functions from the stats package (e.g., `rnorm()`), it is not a distributional representation, and hence does not have an `eval_*()` or `enframe_*()` counterpart. This is because a finite sample cannot perfectly describe a distribution.
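To illustrate the point, the sketch below draws a large sample and compares sample-based estimates with the exact values computed from the distribution. The sample size and seed are arbitrary choices; the estimates come close to the exact values but are not expected to equal them.

```r
library(distionary)

set.seed(1)
d <- dst_geom(0.6)

# Draw a large sample from the distribution.
draws <- realise(d, n = 10000)

# Sample estimate versus exact value of the mean.
mean(draws)
mean(d)

# Sample proportion of zeros versus the exact probability of zero.
mean(draws == 0)
eval_pmf(d, at = 0)
```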
In distionary, a distribution property is any value that can be calculated from a distribution, such as the mean or variance. Whereas a distributional representation must fully define a distribution, a property need not.
Below is a table of the properties incorporated in distionary, and the corresponding functions for accessing them.
| Property           | distionary Function |
|--------------------|---------------------|
| Mean               | `mean()`            |
| Median             | `median()`          |
| Variance           | `variance()`        |
| Standard Deviation | `sd()`              |
| Skewness           | `skewness()`        |
| Excess Kurtosis    | `kurtosis_exc()`    |
| Kurtosis           | `kurtosis()`        |
Here are the mean and variance of our original distribution.

    mean(d1)
    variance(d1)

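As a sanity check, these values can be compared with the textbook formulas for a geometric distribution that counts failures before the first success: mean $(1 - p) / p$ and variance $(1 - p) / p^2$. The sketch below assumes `dst_geom()` follows that parameterisation; `p` and `d` are illustrative names.

```r
library(distionary)

p <- 0.6
d <- dst_geom(p)

# Closed-form mean and variance for the number of failures before
# the first success.
all.equal(as.numeric(mean(d)), (1 - p) / p)
all.equal(as.numeric(variance(d)), (1 - p) / p^2)
```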
Some properties are easy to make yourself. Here is an example of a function that calculates the interquartile range.

    # Make a function that takes a distribution as input, and returns the
    # interquartile range.
    iqr <- function(distribution) {
      diff(eval_quantile(distribution, at = c(0.25, 0.75)))
    }

Apply the function.

    iqr(d2)

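The same approach works for properties built from other properties. As a minimal sketch, the hypothetical helper `coef_variation()` below (not part of distionary) combines `mean()` and `variance()` into a coefficient of variation.

```r
library(distionary)

# Hypothetical helper (not part of distionary): the coefficient of
# variation, i.e. the standard deviation divided by the mean.
coef_variation <- function(distribution) {
  sqrt(variance(distribution)) / mean(distribution)
}

d <- dst_geom(0.4)
coef_variation(d)

# For a geometric distribution counting failures, this simplifies to
# 1 / sqrt(1 - p).
1 / sqrt(1 - 0.4)
```

The helper only makes sense for distributions with a nonzero mean and a finite variance.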
For properties that are not handled by distionary (e.g., extreme value index, or moment generating function), one option is to build these properties into your own distribution. A future version of distionary will make user-defined properties easier to work with.
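Another option, when only a numerical value is needed, is to compute the property directly from a representation. The sketch below approximates the moment generating function of a geometric distribution by summing over a truncated support. This is a workaround written for illustration, not a feature of distionary; the helper name `mgf_approx()` and the truncation point `max_x` are assumptions.

```r
library(distionary)

# Hypothetical helper (not part of distionary): approximate the moment
# generating function E[exp(t * X)] of a distribution supported on the
# non-negative integers by truncating the sum at `max_x`.
mgf_approx <- function(distribution, t, max_x = 200) {
  x <- 0:max_x
  sum(exp(t * x) * eval_pmf(distribution, at = x))
}

p <- 0.6
d <- dst_geom(p)

# Compare with the closed form p / (1 - (1 - p) * exp(t)), which is valid
# for t < -log(1 - p), assuming the "failures before first success"
# parameterisation.
mgf_approx(d, t = 0.1)
p / (1 - (1 - p) * exp(0.1))
```

The truncation is only reasonable when the terms beyond `max_x` are negligible, which is the case here because the PMF decays geometrically.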