# discrete_entropy: Shannon entropy for a discrete pmf

In ForeCA: Forecastable Component Analysis

## Description

Computes the Shannon entropy

\mathcal{H}(p) = -∑_{i=1}^{n} p_i \log p_i

of a discrete random variable X taking values in \lbrace x_1, …, x_n \rbrace with probability mass function (pmf) P(X = x_i) = p_i, where p_i ≥ 0 for all i and ∑_{i=1}^{n} p_i = 1.

## Usage

```r
discrete_entropy(
  probs,
  base = 2,
  method = c("MLE"),
  threshold = 0,
  prior.probs = NULL,
  prior.weight = 0
)
```

## Arguments

- `probs`: numeric; probabilities (empirical frequencies). Must be non-negative and sum to 1.
- `base`: logarithm base; entropy is measured in "nats" if `base = exp(1)` and in "bits" if `base = 2` (default).
- `method`: string; method used to estimate entropy; see Details below.
- `threshold`: numeric; frequencies below `threshold` are set to 0. Default: `threshold = 0`, i.e., no thresholding. If `prior.weight > 0`, thresholding is applied before smoothing.
- `prior.probs`: optional; only used if `prior.weight > 0`. Prior probability distribution mixed with `probs`. By default a uniform distribution, putting equal probability on each outcome.
- `prior.weight`: numeric; weight of the prior distribution in the mixture of data and prior. Must be between 0 and 1. Default: 0 (no prior).

## Details

`discrete_entropy` uses a plug-in estimator (`method = "MLE"`):

\widehat{\mathcal{H}}(p) = - ∑_{i=1}^{n} \widehat{p}_i \log \widehat{p}_i.

If `prior.weight > 0`, then it mixes the observed proportions \widehat{p}_i with a prior distribution

\widehat{p}_i \leftarrow (1-λ) \cdot \widehat{p}_i + λ \cdot prior_i, \quad i=1, …, n,

where λ \in [0, 1] is the `prior.weight` parameter. By default the prior is a uniform distribution, i.e., prior_i = \frac{1}{n} for all i.

Note that this plug-in estimator is biased. See References for an overview of alternative methods.
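As a sketch of the smoothing step above, the mixture and the plug-in entropy can be computed by hand in base R (this mirrors the formulas in this section; it is not the ForeCA implementation itself, and `entropy_bits` is an illustrative helper, not a package function):

```r
# Illustrative only: reproduces the smoothing formula from the Details
# section with base R; names below are not part of the ForeCA API.
p.hat <- c(0.7, 0.2, 0.1)        # observed proportions, sum to 1
n <- length(p.hat)
prior <- rep(1 / n, n)           # default: uniform prior
lambda <- 0.5                    # plays the role of prior.weight

# mixture of data and prior: (1 - lambda) * p.hat + lambda * prior
p.smooth <- (1 - lambda) * p.hat + lambda * prior

# plug-in entropy in bits (base = 2); 0 * log(0) is treated as 0
entropy_bits <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p, base = 2))
}

entropy_bits(p.hat)     # entropy of the raw proportions
entropy_bits(p.smooth)  # smoothing pulls the estimate toward log2(n)
```

Because the uniform pmf maximizes entropy, mixing with a uniform prior always moves the estimate toward the maximum log2(n); larger `lambda` means stronger shrinkage.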

## Value

numeric; non-negative real value.

## References

Archer E., Park I. M., Pillow J.W. (2014). “Bayesian Entropy Estimation for Countable Discrete Distributions”. Journal of Machine Learning Research (JMLR) 15, 2833-2868. Available at http://jmlr.org/papers/v15/archer14a.html.

## See Also

continuous_entropy

## Examples

```r
probs.tmp <- rexp(5)
probs.tmp <- sort(probs.tmp / sum(probs.tmp))

unif.distr <- rep(1 / length(probs.tmp), length(probs.tmp))

matplot(cbind(probs.tmp, unif.distr), pch = 19,
        ylab = "P(X = k)", xlab = "k")
matlines(cbind(probs.tmp, unif.distr))
legend("topleft", c("non-uniform", "uniform"), pch = 19,
       lty = 1:2, col = 1:2, box.lty = 0)

discrete_entropy(probs.tmp)
# uniform has largest entropy among all bounded discrete pmfs
# (here = log(5))
discrete_entropy(unif.distr)
# no uncertainty if one element occurs with probability 1
discrete_entropy(c(1, 0, 0))
```