    knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
    library(distionary)
One purpose of `distplyr` is to handle the menial distribution-related calculations for you. Specify a distribution once, and you no longer need to manage its components.
For example, suppose you want to compute the variance of a Uniform(-1, 1) distribution, get its 0.25- and 0.75-quantiles, and generate a sample of size 10.
Without `distplyr`:
    a <- -1
    b <- 1
    # Look up formula for variance:
    (b - a) ^ 2 / 12
    # Get quantiles:
    qunif(c(0.25, 0.75), min = a, max = b)
    # Get sample of size 10:
    runif(10, min = a, max = b)
With `distplyr`:
    d <- dst_unif(-1, 1)
    variance(d)
    eval_quantile(d, at = c(0.25, 0.75))
    realise(d, 10)
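As a sanity check on the hand calculation, the closed-form variance and quartiles of a Uniform(a, b) distribution can be compared against numerical results. This is a minimal base-R sketch that does not use `distplyr`; the variable names (`var_formula`, `var_numeric`, `quartiles`) are chosen here for illustration.

```r
# Base-R cross-check for Uniform(a, b); nothing here uses distplyr.
a <- -1
b <- 1

# Closed-form variance of a Uniform(a, b) distribution.
var_formula <- (b - a)^2 / 12

# The same quantity by numerical integration against the density.
mu <- integrate(function(x) x * dunif(x, min = a, max = b), a, b)$value
var_numeric <- integrate(
  function(x) (x - mu)^2 * dunif(x, min = a, max = b), a, b
)$value

# Quartiles from the closed-form quantile function a + p * (b - a).
p <- c(0.25, 0.75)
quartiles <- a + p * (b - a)

all.equal(var_formula, var_numeric)
all.equal(quartiles, qunif(p, min = a, max = b))
```

Both comparisons should agree up to numerical tolerance, which is the bookkeeping a distribution object is meant to take off your hands.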
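A distribution can be represented by different functions, such as a density function, a cumulative distribution function, and others. In `distplyr`, you can:

- evaluate a representation at given points with `eval_*`;
- evaluate it and return the results in a data frame alongside the arguments with `enframe_*`; or
- get the function itself with `get_*`.

Here are the representations and the corresponding `distplyr` functions: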
| Quantity                         | `distplyr` Functions                                     |
|----------------------------------|----------------------------------------------------------|
| Cumulative Distribution Function | `eval_cdf()`, `get_cdf()`, `enframe_cdf()`               |
| Survival Function                | `eval_survival()`, `get_survival()`, `enframe_survival()` |
| Quantile Function                | `eval_quantile()`, `get_quantile()`, `enframe_quantile()` |
| Hazard Function                  | `eval_hazard()`, `get_hazard()`, `enframe_hazard()`      |
| Cumulative Hazard Function       | `eval_chf()`, `get_chf()`, `enframe_chf()`               |
| Probability Density Function     | `eval_density()`, `get_density()`, `enframe_density()`   |
| Probability Mass Function        | `eval_pmf()`, `get_pmf()`, `enframe_pmf()`               |
These functions all take a distribution object as their first argument. `eval_*` and `enframe_*` take a second argument named `at`, indicating where to evaluate the function. The `at` argument is vectorized.
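For intuition about how these representations relate to one another, here is a base-R sketch for a Uniform(-1, 1) distribution, evaluated at a vector of points. It uses only `stats::punif()` and `stats::dunif()` together with the standard identities for a continuous distribution; it is not how `distplyr` computes these quantities, and the variable names are illustrative.

```r
# Base-R sketch (not distplyr) relating representations of Uniform(-1, 1).
at <- c(-0.5, 0, 0.5)

cdf      <- punif(at, min = -1, max = 1)
density  <- dunif(at, min = -1, max = 1)
survival <- 1 - cdf              # S(x) = 1 - F(x)
hazard   <- density / survival   # h(x) = f(x) / S(x)
chf      <- -log(survival)       # H(x) = -log(S(x))

# One row per evaluation point, one column per representation.
data.frame(at, cdf, survival, density, hazard, chf)
```

Because every step is vectorized over `at`, the result has one row per evaluation point, which mirrors the idea of evaluating a representation at several points in one call.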
Here is an example of evaluating the hazard function of a Uniform(-1, 1) distribution and enframing its density:
    eval_hazard(d, at = 0:10)
    enframe_density(d, at = 0:10)
    set.seed(10)
The `enframe_*()` functions work particularly well with tibbles and `tidyr::unnest()`:
    # half_marathon <- tribble(
    #   ~ person, ~ race_time_min,
    #   "Vincenzo", dst_norm(130, 25),
    #   "Colleen", dst_norm(110, 13),
    #   "Regina", dst_norm(115, 20)
    # )
    # half_marathon %>%
    #   mutate(quartiles = map(race_time_min, enframe_quantile, at = 1:3 / 4)) %>%
    #   unnest(quartiles)
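Since the pipeline above is commented out, the following base-R sketch shows the kind of long table it is meant to produce: one row per person and quartile. It replaces the distribution objects with plain normal parameters and `stats::qnorm()`; the data frame `runners` and the column names `.arg` and `quantile` are assumptions made for this illustration.

```r
# Base-R sketch of the long table the pipeline above is meant to produce.
# The column names ".arg" and "quantile" are assumptions for illustration.
runners <- data.frame(
  person = c("Vincenzo", "Colleen", "Regina"),
  mean   = c(130, 110, 115),
  sd     = c(25, 13, 20)
)
at <- 1:3 / 4

# Build one small data frame of quartiles per person, then stack them.
per_person <- lapply(seq_len(nrow(runners)), function(i) {
  data.frame(
    person   = runners$person[i],
    .arg     = at,
    quantile = qnorm(at, mean = runners$mean[i], sd = runners$sd[i])
  )
})
quartiles_long <- do.call(rbind, per_person)
quartiles_long
```

The stacking step plays the role of `tidyr::unnest()`: each person's small table of quartiles is expanded into rows of the overall table.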
To draw a random sample from a distribution, use the `realise()` or `realize()` function:
    realise(d, n = 5)
You can read this call as "realise distribution `d` five times". By default, `n` is set to 1, so that realizing a distribution converts it to a numeric draw:
    realise(d)
This default is especially useful when working with distributions in a tibble:
    # half_marathon %>%
    #   mutate(actual_time_min = map_dbl(race_time_min, realise))
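To see why a default of `n = 1` is convenient in this setting, consider the base-R sketch below. The helper `realise_sketch()` is hypothetical: it draws by inverse-transform sampling (applying a quantile function to Uniform(0, 1) draws), which is one standard way to sample and is not a claim about how `distplyr` samples internally. Each distribution is stood in for by its quantile function.

```r
# Hypothetical stand-in for realise(): inverse-transform sampling, i.e.
# apply a quantile function to Uniform(0, 1) draws. Illustration only.
realise_sketch <- function(quantile_fn, n = 1) {
  quantile_fn(runif(n))
}

# Each "distribution" is represented here by its quantile function.
race_time_min <- list(
  Vincenzo = function(p) qnorm(p, mean = 130, sd = 25),
  Colleen  = function(p) qnorm(p, mean = 110, sd = 13),
  Regina   = function(p) qnorm(p, mean = 115, sd = 20)
)

set.seed(10)
# With n defaulting to 1, each call returns a single number, so the results
# collapse to a named numeric vector (the base-R analogue of map_dbl()).
actual_time_min <- vapply(race_time_min, realise_sketch, numeric(1))
actual_time_min
```

If the default were any other sample size, `vapply(..., numeric(1))` would fail, just as `map_dbl()` requires each element to map to a single number.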
Perhaps surprisingly, `distplyr` does not consider `realise()` a functional representation of a distribution, even though random sampling falls into the same family as the `stats::p*/d*/q*/r*` functions. This is because it's impossible to perfectly describe a distribution based on a sample.
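The point about samples can be illustrated in base R by comparing the true cdf of a Uniform(-1, 1) distribution with the empirical cdf of a sample of size 10. This is only a sketch using `stats::ecdf()` and `stats::punif()`; the names `comparison` and `max_gap` are illustrative.

```r
# Compare the true cdf of Uniform(-1, 1) with the empirical cdf of a sample.
set.seed(10)
x <- runif(10, min = -1, max = 1)
empirical_cdf <- ecdf(x)

at <- seq(-1, 1, by = 0.25)
comparison <- data.frame(
  at            = at,
  true_cdf      = punif(at, min = -1, max = 1),
  empirical_cdf = empirical_cdf(at)
)
comparison

# Largest discrepancy between the two curves on this grid.
max_gap <- max(abs(comparison$true_cdf - comparison$empirical_cdf))
max_gap
```

With 10 observations the empirical cdf can only take values that are multiples of 0.1, so it cannot match the true cdf everywhere (for example, the true value at -0.75 is 0.125).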
Distributions have various numeric properties. Common examples are the mean and variance, but there are many others as well.
Below is a table of the properties incorporated in `distplyr`:
| Property                   | `distplyr` Function |
|----------------------------|---------------------|
| Mean                       | `mean()`            |
| Median                     | `median()`          |
| Variance                   | `variance()`        |
| Standard Deviation         | `stdev()`           |
| Skewness                   | `skewness()`        |
| Excess Kurtosis            | `kurtosis_exc()`    |
| Kurtosis                   | `kurtosis_raw()`    |
| Extreme Value (Tail) Index | `evi()`             |
Here are some properties of our original Uniform(-1, 1) distribution:
    mean(d)
    stdev(d)
    evi(d)
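For reference, several of these properties can be checked for Uniform(-1, 1) by numerically integrating against the density. The sketch below uses base R only; the helper `moment()` is hypothetical and simply computes the expectation of a function of the random variable. The extreme value index is left out because it is not a moment.

```r
# Base-R sketch: moment-based properties of Uniform(-1, 1) by integration.
f <- function(x) dunif(x, min = -1, max = 1)

# Hypothetical helper: expectation of g(X) under the density f.
moment <- function(g) {
  integrate(function(x) g(x) * f(x), lower = -1, upper = 1)$value
}

mu       <- moment(function(x) x)
sigma2   <- moment(function(x) (x - mu)^2)
sigma    <- sqrt(sigma2)
skew     <- moment(function(x) ((x - mu) / sigma)^3)
kurt_raw <- moment(function(x) ((x - mu) / sigma)^4)
kurt_exc <- kurt_raw - 3

c(mean = mu, variance = sigma2, stdev = sigma,
  skewness = skew, kurtosis_raw = kurt_raw, kurtosis_exc = kurt_exc)
```

For this distribution the known closed-form values are a mean of 0, a variance of 1/3, a skewness of 0, and an excess kurtosis of -6/5, which the numerical results should reproduce up to integration error.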