There already exists a great package for information theory measures (Cover and Thomas 2001), called "infotheo" (Meyer 2014). tidyinftheo wraps a few of the functions in the "infotheo" package so that they fit 'tidy-style' data manipulation in R. The package exports the following functions:
shannon_entropy(.data, ..., na.rm=FALSE)
shannon_cond_entropy(.data, ..., na.rm=FALSE)
mutual_info(.data, ..., normalized=FALSE, na.rm=FALSE)
mutual_info_matrix(.data, ..., normalized=FALSE, na.rm=FALSE)
mutual_info_heatmap(mi_matrix, title=NULL, font_sizes=c(12,12))
You can install tidyinftheo from GitHub with:
devtools::install_github("pohlio/tidyinftheo")
then load:
library(tidyinftheo)
Calculate (in bits) the Shannon entropy of the eye color variable in the starwars dataset (from the dplyr package):
starwars %>% shannon_entropy(eye_color)
#> [1] 3.117176
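For intuition about what this number measures, here is a minimal base-R sketch of empirical Shannon entropy in bits, i.e. $H(X) = -\sum_x p(x)\log_2 p(x)$ over observed frequencies. The helper name `entropy_bits` is hypothetical and is not part of tidyinftheo or infotheo; its handling of missing values is an assumption and may differ from the package's.

```r
# Hypothetical helper (not part of tidyinftheo): empirical Shannon entropy in bits.
entropy_bits <- function(x) {
  counts <- table(x)            # frequency of each distinct value (NA dropped by default)
  p <- counts / sum(counts)     # empirical probabilities
  p <- p[p > 0]                 # skip unused levels, treating 0 * log2(0) as 0
  -sum(p * log2(p))
}

entropy_bits(c("a", "b", "a", "b"))   # two equally likely values: 1 bit
entropy_bits(c("a", "a", "a", "a"))   # a single repeated value: 0 bits
```

A variable with eight equally frequent values would give 3 bits, so an entropy slightly above 3 suggests a distribution spread over somewhat more than eight effective categories.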
With the classic mtcars dataset, choose some columns to make a matrix of pairwise mutual information comparisons. In particular, the cyl, vs, am, gear, and carb columns hold whole numbers that act as category labels. The other columns are continuous and are better suited to correlation comparisons unless they are discretized. Here are the first few rows of mtcars:
mtcars %>% select(cyl, vs, am, gear, carb) %>% head()
|                   |  cyl|   vs|   am|  gear|  carb|
|-------------------|----:|----:|----:|-----:|-----:|
| Mazda RX4         |    6|    0|    1|     4|     4|
| Mazda RX4 Wag     |    6|    0|    1|     4|     4|
| Datsun 710        |    4|    1|    1|     4|     1|
| Hornet 4 Drive    |    6|    1|    0|     3|     1|
| Hornet Sportabout |    8|    0|    0|     3|     2|
| Valiant           |    6|    1|    0|     3|     1|
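The continuous columns left out of this selection would need discretizing before they could be compared the same way. A minimal sketch using base R's `cut()` is shown here; the choice of three equal-width bins and the name `mpg_bin` are illustrative assumptions, not a recommendation from the package.

```r
# Illustrative only: bin the continuous mpg column into 3 equal-width intervals.
mpg_bin <- cut(mtcars$mpg, breaks = 3)

table(mpg_bin)   # counts per bin; the binned values can now be treated as categories
```

The number and placement of bins affect any entropy or mutual information computed afterwards, so the binning scheme should be chosen deliberately.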
And here is our comparison table. There should be 5-choose-2 = 10 different combinations. NMI stands for Normalized Mutual Information, so the mutual information, normally given in bits, is scaled between 0 and 1:
mi_matr <- as_tibble(mtcars) %>%
    mutate_if(is_double, as.character) %>%
    mutual_info_matrix(cyl, vs, am, gear, carb, normalized=TRUE)
mi_matr
| V1   | V2   |         MI|
|:-----|:-----|----------:|
| cyl  | vs   |  0.4937932|
| cyl  | am   |  0.1672528|
| cyl  | gear |  0.3504372|
| cyl  | carb |  0.3983338|
| vs   | am   |  0.0208314|
| vs   | gear |  0.2397666|
| vs   | carb |  0.2861119|
| am   | gear |  0.5173527|
| am   | carb |  0.1149038|
| gear | carb |  0.1905054|
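To make the MI column more concrete, the sketch below computes unnormalized empirical mutual information in bits for a single pair of columns from their contingency table, using $I(X;Y) = \sum_{x,y} p(x,y)\log_2 \frac{p(x,y)}{p(x)\,p(y)}$. The function name `mi_bits` is hypothetical and not part of tidyinftheo. The sketch does not attempt to reproduce the package's normalization, so its values are not expected to match the table above.

```r
# Hypothetical helper (not part of tidyinftheo): empirical mutual information in bits.
mi_bits <- function(x, y) {
  counts <- table(x, y)                            # contingency table of the pair
  joint  <- counts / sum(counts)                   # joint probabilities p(x, y)
  indep  <- outer(rowSums(joint), colSums(joint))  # p(x) * p(y), the independence baseline
  nz     <- joint > 0                              # skip empty cells (0 * log 0 treated as 0)
  sum(joint[nz] * log2(joint[nz] / indep[nz]))
}

mi_bits(mtcars$cyl, mtcars$vs)   # unnormalized MI between cyl and vs, in bits
```

Two independent variables give 0 bits, and a variable compared with itself gives its own entropy, which is why dividing by an entropy term is a natural way to scale the result into the 0 to 1 range.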
The matrix is already in a convenient format to plot:
p <- mutual_info_heatmap(mi_matr)
print(p)
NOTE: The above SVG may or may not render 100% correctly. Sometimes the legend lacks the color swatch. This may be a problem with ggplot2 or web browsers.
Cover, Thomas M., and Joy A. Thomas. 2001. Elements of Information Theory. 2nd ed. 10th Ser. New York, NY: John Wiley & Sons, Inc.
Meyer, Patrick E. 2014. Infotheo: Information-Theoretic Measures. https://CRAN.R-project.org/package=infotheo.