calculateSelfInformation_Grassberger: calculate self information of a discrete value (X) using a...

View source: R/tidyDiscreteSelfInformation.R

calculateSelfInformation_Grassberger R Documentation

Calculate the self information of a discrete value (X) using a histogram approach and the following method

Description

P. Grassberger, “Entropy Estimates from Insufficient Samplings,” arXiv [physics.data-an], 29-Jul-2003 [Online]. Available: http://arxiv.org/abs/physics/0307138

Usage

calculateSelfInformation_Grassberger(df, groupVars, countVar = NULL, ...)

Arguments

df

- a dataframe; may be grouped, in which case each group is interpreted as a different type of discrete variable

groupVars

- the columns of the discrete value, quoted with the vars() function (as in ggplot's facet_wrap)

countVar

- (optional) if this dataframe represents summary counts, the column of the summary count variable.
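
As an illustration of the two input shapes described above, the following sketch builds a grouped dataframe of raw observations and the equivalent summary-count dataframe with dplyr. The column names (feature, value, n) and the example data are made up for illustration. The commented-out calls follow the Usage signature, but passing countVar as a bare column name is an assumption rather than something this page confirms.

```r
library(dplyr)

# Hypothetical raw observations: one row per observed discrete value.
raw <- tibble(
  feature = c("colour", "colour", "colour", "shape", "shape"),
  value   = c("red", "red", "blue", "round", "square")
)

# Grouping marks each feature as a separate discrete variable.
grouped <- raw %>% group_by(feature)

# The same data as summary counts: one row per (feature, value) pair.
summarised <- grouped %>% count(value, name = "n")

# Calls as suggested by the Usage section (not run here; the form of
# countVar is an assumption):
# calculateSelfInformation_Grassberger(grouped, vars(value))
# calculateSelfInformation_Grassberger(summarised, vars(value), countVar = n)
```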

Details

This follows the method cited in the Description, but with a digamma-based function (rather than harmonics) detailed in eqns 31 & 35. For our purposes we fix l=0 to give the form in eqn 27. The error of this method is supposedly smaller in undersampled cases (where the number of bins is similar to the number of samples).

This is a bit of a cheat, as it works out the overall entropy and then scales that to get the self information, but it seems to produce the right answer.
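
The digamma-based entropy estimate mentioned above can be sketched in base R as follows. This uses the commonly quoted form of the Grassberger (2003) estimator, H = ln N - (1/N) * sum(n_i * G(n_i)), with G(n) = psi(n) + 0.5 * (-1)^n * (psi((n + 1) / 2) - psi(n / 2)). The helper names grassbergerG and grassbergerEntropy are hypothetical, and whether this matches the package's internal calculation (including the scaling step to self information) is an assumption.

```r
# Sketch only: these helpers are hypothetical and not part of the package.

# G(n) for positive integer counts, written with the digamma function psi.
grassbergerG <- function(n) {
  digamma(n) + 0.5 * (-1)^n * (digamma((n + 1) / 2) - digamma(n / 2))
}

# Entropy estimate (in nats) from a vector of per-value counts.
grassbergerEntropy <- function(counts) {
  counts <- counts[counts > 0]  # empty bins contribute nothing
  N <- sum(counts)
  log(N) - sum(counts * grassbergerG(counts)) / N
}

# Made-up example: 10 samples spread over 4 bins (an undersampled case).
counts <- c(a = 5, b = 3, c = 1, d = 1)
H_grassberger <- grassbergerEntropy(counts)

# Naive plug-in (maximum likelihood) estimate for comparison.
p <- counts / sum(counts)
H_plugin <- -sum(p * log(p))
```

For these counts the Grassberger estimate is larger than the plug-in estimate, which is the expected direction of the correction, since the plug-in estimator tends to underestimate entropy when samples are scarce.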

Value

a dataframe containing the distinct values of the groups of df, and for each group an entropy value (H). If df was not grouped this will be a single entry.


terminological/tidy-info-stats documentation built on Nov. 19, 2022, 11:23 p.m.