mutualinfo: Mutual Information of continuous and discrete variables.

View source: R/mutualinfo.R

mutualinfo                                                  R Documentation

Mutual Information of continuous and discrete variables.

Description

Returns the mutual information of a pair of jointly observed variables. The variables can be both continuous, both discrete, or a mixture of the two. Densities are estimated wherever necessary (i.e., for the continuous variables) via Pareto density estimation with subsequent Gaussian kernel smoothing.

Usage

mutualinfo(x, y, isXDiscrete = FALSE, isYDiscrete = FALSE,
           eps = .Machine$double.eps * 1000, useMPMI = FALSE, na.rm = FALSE)

Arguments

x

[1:n] a numeric vector (not necessarily continuous)

y

[1:n] a numeric vector (not necessarily continuous)

isXDiscrete

Boolean; set to TRUE if the first numeric vector is a discrete measurement rather than a continuous one

isYDiscrete

Boolean; set to TRUE if the second numeric vector is a discrete measurement rather than a continuous one

eps

Scalar; threshold below which a summand of the mutual information is ignored. The summand tends to 0 as the joint probability tends to 0, but the logarithm alone diverges to -Inf, so near-zero terms must be skipped (see the short illustration after this list)

useMPMI

Boolean; if TRUE, the mpmi package is used for the calculation (useful as a baseline)

na.rm

Boolean; if TRUE, only complete observations (pairs without NA) are used
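
As a quick illustration of the numerical issue behind eps (a plain-R sketch, not package code):

p <- 0
p * log(p)                       # NaN: 0 * (-Inf), although the limit is 0
p <- .Machine$double.eps * 1000
p * log(p)                       # tiny but finite; summands at or below eps are skipped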

Details

Mutual information is >= 0 and symmetric in x and y. You can think of it as a measure of how much of the information in x is contained in y, or, put more simply: how well does y predict x. Mutual information can be compared for pairs that share one variable, e.g. (x,y) and (y,z): if MI(x,y) > MI(y,z), then x and y are more closely linked than y and z. However, given pairs that do not share a variable, e.g. (x,y) and (u,v), the values MI(x,y) and MI(u,v) cannot be reasonably compared. In particular, MI defines only a partial ordering on the column pairs of a matrix rather than a total ordering (which correlation, for example, does). This is mainly because MI is not bounded above and therefore cannot reasonably be put on a scale from 0 to 1. A direct computation for the discrete-discrete case is sketched below.
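
The following is a minimal sketch of the discrete-discrete case computed directly from the definition MI(X, Y) = sum_{i,j} p(i,j) * log(p(i,j) / (p(i) * p(j))). The helper name discreteMI is hypothetical; this is an illustration of the definition, not the package's internal implementation:

discreteMI <- function(x, y, eps = .Machine$double.eps * 1000) {
  pxy <- table(x, y) / length(x)         # joint probabilities p(i, j)
  px  <- rowSums(pxy)                    # marginal p(i)
  py  <- colSums(pxy)                    # marginal p(j)
  s   <- pxy * log(pxy / outer(px, py))  # summands p(i,j) * log(...)
  sum(s[pxy > eps])                      # skip near-zero cells whose limit contribution is 0
}

set.seed(1)
a <- sample(1:3, 1000, replace = TRUE)
b <- ifelse(runif(1000) < 0.5, a, sample(1:3, 1000, replace = TRUE))
discreteMI(a, b)    # > 0: b carries information about a
discreteMI(b, a)    # identical value: MI is symmetric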

Value

mutualinfo

Scalar; the mutual information of the two variables

Note

This function requires that either the packages DataVisualizations and ScatterDensity (each of version 0.1.1 or higher) are installed, or that the mpmi package is installed.

Author(s)

Julian Märte, Michael Thrun

References

Claude E. Shannon: A Mathematical Theory of Communication, 1948

Examples

# Mixture of three Gaussian components, labelled by a discrete variable
x = c(rnorm(1000), rnorm(2000) + 8, rnorm(1000) * 2 - 8)
y = c(rep(1, 1000), rep(2, 2000), rep(3, 1000))

# Density-estimate based calculation (requires DataVisualizations and ScatterDensity)
if(requireNamespace("DataVisualizations", quietly = TRUE) &&
   requireNamespace("ScatterDensity", quietly = TRUE) &&
   packageVersion("ScatterDensity") >= "0.1.1" &&
   packageVersion("DataVisualizations") >= "1.1.5"){

  mutualinfo(x, y, isXDiscrete = FALSE, isYDiscrete = TRUE)
}

# Baseline calculation via the mpmi package
if(requireNamespace("mpmi", quietly = TRUE)) {

  mutualinfo(x, y, isXDiscrete = FALSE, isYDiscrete = TRUE, useMPMI = TRUE)
}

