entropy: Normalized entropy

View source: R/discrete-summaries.R

entropy    R Documentation

Normalized entropy

Description

Normalized entropy, for measuring dispersion in draws from categorical distributions.

Usage

entropy(x)

## Default S3 method:
entropy(x)

## S3 method for class 'rvar'
entropy(x)

Arguments

x

(multiple options) A vector to be interpreted as draws from a categorical distribution, such as:

  • A factor

  • A numeric (should be integer or integer-like)

  • An rvar, rvar_factor, or rvar_ordered

Details

Calculates the normalized Shannon entropy of the draws in x. This value is the entropy of x divided by the maximum entropy of a distribution with n categories, where n is length(unique(x)) for numeric vectors and length(levels(x)) for factors:

-\frac{\sum_{i = 1}^{n} p_i \log(p_i)}{\log(n)}

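Here p_i is the proportion of draws in x that fall in category i. Dividing by \log(n) scales the output to lie between 0 (all probability in one category) and 1 (uniform over the n categories). This form of normalized entropy is referred to as H_\mathrm{REL} in Wilcox (1967).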
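The formula above can be sketched in base R as follows. This is an illustration only, not the package's implementation: the helper name normalized_entropy_sketch is hypothetical, and two conventions are assumed here rather than taken from the documentation, namely that 0 * log(0) counts as 0 and that a vector with a single category has entropy 0.

```r
# Hypothetical helper for illustration; not part of the posterior package.
normalized_entropy_sketch <- function(x) {
  # table() counts every level of a factor (including unused levels)
  # and every distinct value of a numeric vector.
  counts <- as.numeric(table(x))
  n <- length(counts)

  # Assumption: with only one category, log(n) is 0, so return 0 directly.
  if (n <= 1) return(0)

  p <- counts / sum(counts)
  p <- p[p > 0]  # assumption: treat 0 * log(0) as 0

  -sum(p * log(p)) / log(n)
}

# Draws spread evenly over four levels
normalized_entropy_sketch(factor(c("a", "b", "c", "d")))

# All draws in one of two declared levels
normalized_entropy_sketch(factor(c("a", "a", "a"), levels = c("a", "b")))
```

The first call sits at the upper end of the scale and the second at the lower end, matching the two extremes described above.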

Value

If x is a factor or numeric, returns a length-1 numeric vector with a value between 0 and 1 (inclusive) giving the normalized Shannon entropy of x.

If x is an rvar, returns an array of the same shape as x, where each cell is the normalized Shannon entropy of the draws in the corresponding cell of x.
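The per-cell behaviour for an rvar can be pictured with a plain matrix whose rows are draws and whose columns are cells. The sketch below is a base R analogy under that assumption; it does not use rvar, and the helper norm_entropy and the objects lv and draws are hypothetical names introduced for illustration.

```r
# Hypothetical helper for illustration; not part of the posterior package.
# Normalized entropy of x over a fixed set of levels lv
# (assumes lv has at least two levels, so log(length(lv)) is not 0).
norm_entropy <- function(x, lv) {
  p <- as.numeric(table(factor(x, levels = lv))) / length(x)
  p <- p[p > 0]  # assumption: treat 0 * log(0) as 0
  -sum(p * log(p)) / log(length(lv))
}

lv <- c("a", "b")

# Rows are draws, columns are cells (a stand-in for a length-2 rvar).
draws <- cbind(
  cell1 = c("a", "b", "a", "b"),  # draws split evenly between the levels
  cell2 = c("a", "a", "a", "a")   # every draw in a single level
)

# One value per cell, so the result has the same shape as the cells.
per_cell <- apply(draws, 2, norm_entropy, lv = lv)
per_cell
```

Each column is summarized independently, so the result has one entry per cell rather than a single overall value.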

References

Allen R. Wilcox (1967). Indices of Qualitative Variation (No. ORNL-TM-1919). Oak Ridge National Lab., Tenn.

Examples

library(posterior)

set.seed(1234)

levels <- c("a", "b", "c", "d", "e")

# a uniform distribution: high normalized entropy
x <- factor(
  sample(levels, 4000, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2)),
  levels = levels
)
entropy(x)

# a unimodal distribution: low normalized entropy
y <- factor(
  sample(levels, 4000, replace = TRUE, prob = c(0.95, 0.02, 0.015, 0.01, 0.005)),
  levels = levels
)
entropy(y)

# both together, as an rvar
xy <- c(rvar(x), rvar(y))
xy
entropy(xy)

posterior documentation built on Nov. 2, 2023, 5:56 p.m.