tidy_distribution_comparison: Compare Empirical Data to Distributions

View source: R/utils-distribution-comparison.R

tidy_distribution_comparison R Documentation

Compare Empirical Data to Distributions

Description

Compare an empirical data set against several candidate distributions to help identify the one that may fit best.

Usage

tidy_distribution_comparison(
  .x,
  .distribution_type = "continuous",
  .round_to_place = 3
)

Arguments

.x

The data set being passed to the function

.distribution_type

The type of data. Must be one of "continuous" or "discrete". The default is "continuous"

.round_to_place

The number of decimal places to which the parameter estimates are rounded for distribution construction. The default is 3

Details

This function takes the data set provided and tries to find a distribution that may fit it best. The .distribution_type parameter must be set to either continuous or discrete so that the function tries the appropriate types of distributions.

The following distributions are used:

Continuous:

  • tidy_beta

  • tidy_cauchy

  • tidy_exponential

  • tidy_gamma

  • tidy_logistic

  • tidy_lognormal

  • tidy_normal

  • tidy_pareto

  • tidy_uniform

  • tidy_weibull

Discrete:

  • tidy_binomial

  • tidy_geometric

  • tidy_hypergeometric

  • tidy_poisson

The function returns a list of tibbles. The following tibbles are returned:

  • comparison_tbl

  • deviance_tbl

  • total_deviance_tbl

  • aic_tbl

  • kolmogorov_smirnov_tbl

  • multi_metric_tbl

The comparison_tbl is a long tibble that lists the values of the density function against the given data.

The deviance_tbl and the total_deviance_tbl give the simple difference between the actual (empirical) density and the estimated density for each estimated distribution.
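The idea of a density difference can be illustrated with a minimal base R sketch. This is not the package's internal code: the grid size, the choice of a normal candidate, the variable names, and the use of a sum of absolute differences as the total are all assumptions made for illustration.

```r
# Hypothetical illustration of a density "deviance"; not TidyDensity internals.
x <- mtcars$mpg

# Empirical kernel density evaluated on a 50-point grid (grid size is an assumption).
emp <- density(x, n = 50)

# Candidate distribution: a normal with moment-based parameter estimates,
# evaluated at the same grid points.
est <- dnorm(emp$x, mean = mean(x), sd = sd(x))

# Point-wise difference between the empirical and the estimated density.
deviance <- emp$y - est

# One possible aggregate (assumption): the sum of absolute differences.
total_deviance <- sum(abs(deviance))
total_deviance
```

A smaller total suggests that the candidate density tracks the empirical density more closely on this grid.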

The aic_tbl provides the AIC of an lm model of the estimated density against the empirical density.
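As a rough sketch of what such an AIC could look like for a single candidate, the following base R code regresses an estimated density on the empirical density and extracts the AIC. The direction of the regression, the normal candidate, and the 50-point grid are assumptions; the package may construct the densities differently.

```r
# Hypothetical illustration of an AIC from an lm of densities; not TidyDensity internals.
x <- mtcars$mpg

# Empirical kernel density and a normal candidate on the same grid (assumptions).
emp <- density(x, n = 50)
dens <- data.frame(
  empirical = emp$y,
  estimated = dnorm(emp$x, mean = mean(x), sd = sd(x))
)

# Linear model of the estimated density against the empirical density.
fit <- lm(estimated ~ empirical, data = dens)

# Akaike information criterion of that model; lower values indicate a better fit.
aic_value <- AIC(fit)
aic_value
```

Repeating this for each candidate distribution and comparing the values gives a simple ranking.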

The kolmogorov_smirnov_tbl currently provides a two.sided ks.test of the estimated density against the empirical density.
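A two-sided Kolmogorov-Smirnov comparison of this kind can be sketched in base R as follows. The inputs here (a kernel density and a normal candidate on a 50-point grid) are assumptions chosen for illustration and may differ from what the package passes to ks.test.

```r
# Hypothetical illustration of a two-sided KS test on densities; not TidyDensity internals.
x <- mtcars$mpg

# Empirical kernel density and a normal candidate on the same grid (assumptions).
emp <- density(x, n = 50)
est <- dnorm(emp$x, mean = mean(x), sd = sd(x))

# Two-sample, two-sided Kolmogorov-Smirnov test of the estimated
# density values against the empirical density values.
ks <- ks.test(est, emp$y, alternative = "two.sided")

# The test statistic (D) and the p-value.
ks_statistic <- unname(ks$statistic)
ks_p_value <- ks$p.value
```

A small statistic with a large p-value gives no evidence that the two sets of density values differ.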

The multi_metric_tbl will summarise all of these metrics into a single tibble.
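Because the result is a list, individual tibbles can be pulled out by name. The sketch below assumes that the TidyDensity package is installed and that the list element names match the tibble names listed above. The object name out is hypothetical.

```r
# Assumes TidyDensity is installed and that the list elements are named
# as listed in this section (an assumption).
library(TidyDensity)

out <- tidy_distribution_comparison(mtcars$mpg, "continuous")

# Inspect which elements the list contains.
names(out)

# Extract single tibbles by name.
aic_tbl <- out[["aic_tbl"]]
multi_metric_tbl <- out[["multi_metric_tbl"]]
multi_metric_tbl
```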

Value

An invisible list object. A tibble is printed.

Author(s)

Steven P. Sanderson II, MPH

Examples

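# Continuous data: compare against the continuous distributions
xc <- mtcars$mpg
output_c <- tidy_distribution_comparison(xc, "continuous")

# Discrete data: truncate to whole numbers and compare against the discrete distributions
xd <- trunc(xc)
output_d <- tidy_distribution_comparison(xd, "discrete")

# Print the continuous comparison
output_c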


TidyDensity documentation built on Nov. 2, 2023, 5:38 p.m.