hai_distribution_comparison_tbl: Compare Data Against Distributions

View source: R/tbl-hai-distribution-comparison.R

hai_distribution_comparison_tblR Documentation

Compare Data Against Distributions

Description

This function will attempt to get some key information on the data you pass to it. It will also automatically normalize the data from 0 to 1. This will not change the distribution just it's scale in order to make sure that many different types of distributions can be fit to the data, which should help identify what the distribution of the passed data could be.

The resulting output has attributes added to it that get used in other functions that are meant to compliment each other.

This function will automatically pass the .x parameter to hai_skewness_vec() and hai_kurtosis_vec() in order to help create the random data from the distributions.

The distributions that can be chosen from are:

Distribution R stats::dist
normal rnorm
uniform runif
exponential rexp
logistic rlogis
beta rbeta
lognormal rlnorm
gamma rgamma
weibull weibull
chisquare rchisq
cauchy rcauchy
hypergeometric rhyper
f rf
poisson rpois

Usage

hai_distribution_comparison_tbl(
  .x,
  .distributions = c("gamma", "beta"),
  .normalize = TRUE
)

Arguments

.x

The numeric vector to analyze.

.distributions

A character vector of distributions to check. For example, c("gamma","beta")

.normalize

A boolean value of TRUE/FALSE, the default is TRUE. This will normalize the data using the hai_scale_zero_one_vec function.

Details

Get information on the empirical distribution of your data along with generated densities of other distributions. This information is in the resulting tibble that is generated. Three columns will generate, Distribution, from the ā param .distributionsā , dist_data which is a list vector of density values passed to the underlying stats r distribution function, and density_data, which is the dist_data column passed to list(stats::density(unlist(dist_data)))

This has the effect of giving you the desired vector that can be used in resultant plots (dist_data) or you can interact with the density object itself.

If the skewness of the distribution is negative, then for the gamma and beta distributions the skew is set equal to the kurtosis and the kurtosis is set equal to sqrt((skew)^2)

Value

A tibble.

Author(s)

Steven P. Sanderson II, MPH

See Also

Other Distribution Functions: hai_get_density_data_tbl(), hai_get_dist_data_tbl()

Examples

x_vec <- hai_scale_zero_one_vec(mtcars$mpg)
df <- hai_distribution_comparison_tbl(
  .x = x_vec,
  .distributions = c("beta", "gamma")
)
df


healthyR.ai documentation built on April 3, 2023, 5:24 p.m.