entropy.empirical: Empirical Estimators of Entropy and Mutual Information and Related Quantities


View source: R/entropy.empirical.R

Description

freqs.empirical computes the empirical frequencies from counts y.

entropy.empirical estimates the Shannon entropy H of the random variable Y from the corresponding observed counts y by plug-in of the empirical frequencies.

KL.empirical computes the empirical Kullback-Leibler (KL) divergence from counts y1 and y2.

chi2.empirical computes the empirical chi-squared divergence from counts y1 and y2.

mi.empirical computes the empirical mutual information from a table of counts y2d.

chi2indep.empirical computes the empirical chi-squared divergence of independence from a table of counts y2d.
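
All of these functions are plug-in estimators built on the empirical frequencies. A minimal sketch of this relationship (assuming only that entropy.empirical agrees with entropy.plugin applied to the output of freqs.empirical):

library("entropy")
y = c(4, 2, 3, 1, 6, 4)
# plug-in principle: empirical estimate = plug-in estimate at empirical frequencies
all.equal(entropy.empirical(y), entropy.plugin(freqs.empirical(y)))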

Usage

freqs.empirical(y)
entropy.empirical(y, unit=c("log", "log2", "log10"))
KL.empirical(y1, y2, unit=c("log", "log2", "log10"))
chi2.empirical(y1, y2, unit=c("log", "log2", "log10"))
mi.empirical(y2d, unit=c("log", "log2", "log10"))
chi2indep.empirical(y2d, unit=c("log", "log2", "log10"))

Arguments

y

vector of counts.

y1

vector of counts.

y2

vector of counts.

y2d

matrix of counts.

unit

the unit in which entropy is measured. The default "log" gives entropy in nats (natural units); set unit="log2" for entropy in bits, or unit="log10" for entropy in dits.
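
A quick sketch of the unit conversion (assuming the entropy package is loaded; entropy in bits equals entropy in nats divided by log(2)):

y = c(4, 2, 3, 0, 2, 4, 0, 0, 2, 1, 1)
entropy.empirical(y, unit="log2")          # entropy in bits
entropy.empirical(y, unit="log") / log(2)  # same value via nats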

Details

The empirical entropy estimator is a plug-in estimator: in the definition of the Shannon entropy the bin probabilities are replaced by the respective empirical frequencies.

The empirical entropy estimator is the maximum likelihood estimator. If there are many zero counts and the sample size is small, it is very inefficient and strongly biased.
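
A minimal sketch of the plug-in computation (zero-count bins contribute 0*log(0) = 0 and are dropped):

y = c(4, 2, 3, 0, 2, 4, 0, 0, 2, 1, 1)  # counts with several empty bins
p = freqs.empirical(y)                   # empirical frequencies y/sum(y)
p = p[p > 0]                             # drop zero bins
-sum(p * log(p))                         # matches entropy.empirical(y)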

Value

freqs.empirical returns the empirical frequencies.

entropy.empirical returns an estimate of the Shannon entropy.

KL.empirical returns an estimate of the KL divergence.

chi2.empirical returns the empirical chi-squared divergence.

mi.empirical returns an estimate of the mutual information.

chi2indep.empirical returns the empirical chi-squared divergence of independence.

Author(s)

Korbinian Strimmer (https://strimmerlab.github.io).

See Also

entropy, entropy.plugin, KL.plugin, chi2.plugin, mi.plugin, chi2indep.plugin, Gstat, Gstatindep, chi2stat, chi2statindep, discretize.

Examples

# load the entropy library
library("entropy")


## a single variable: entropy

# observed counts for each bin
y = c(4, 2, 3, 0, 2, 4, 0, 0, 2, 1, 1)  

# empirical frequencies
freqs.empirical(y)

# empirical estimate of entropy
entropy.empirical(y)


## examples with two variables: KL and chi-squared divergence

# observed counts for the first random variable
y1 = c(4, 2, 3, 1, 6, 4)
n = sum(y1) # 20

# counts for the second random variable (expected)
freqs.expected = c(0.10, 0.15, 0.35, 0.05, 0.20, 0.15)
y2 = n*freqs.expected

# empirical Kullback-Leibler divergence
KL.div = KL.empirical(y1, y2)
KL.div

# empirical chi-squared divergence
cs.div = chi2.empirical(y1, y2)
cs.div 
0.5*cs.div  # approximates KL.div
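
# manual check of the underlying formulas (a sketch, assuming the standard
# definitions KL(p1||p2) = sum(p1*log(p1/p2)) and chi2(p1,p2) = sum((p1-p2)^2/p2);
# the factor 1/2 above comes from the second-order Taylor expansion of KL)
p1 = freqs.empirical(y1)
sum(p1 * log(p1/freqs.expected))             # matches KL.div
sum((p1-freqs.expected)^2 / freqs.expected)  # matches cs.div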

## note: see also Gstat and chi2stat


## joint distribution of two discrete random variables

# contingency table with counts for two discrete variables
y.mat = matrix(c(4, 5, 1, 2, 4, 4), ncol = 2)  # 3x2 example matrix of counts
n.mat = sum(y.mat) # 20

# empirical estimate of mutual information
mi = mi.empirical(y.mat)
mi

# empirical chi-squared divergence of independence
cs.indep = chi2indep.empirical(y.mat)
cs.indep
0.5*cs.indep # approximates mi
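
# manual check (a sketch): mutual information equals the KL divergence between
# the joint frequencies and the product of the marginal frequencies
f = freqs.empirical(y.mat)               # joint empirical frequencies
f.indep = outer(rowSums(f), colSums(f))  # product of the marginals
sum(f * log(f/f.indep))                  # matches mi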

## note: see also Gstatindep and chi2statindep
