tidyinftheo

Overview

There already exists a great package for information theory measures (Cover and Thomas 2001), called "infotheo" (Meyer 2014). tidyinftheo wraps a few of the functions in the "infotheo" package for 'tidy-style' data manipulation in R. Some key differences are that this package:

Functions

Installation

You can install tidyinftheo from GitHub with:

devtools::install_github("pohlio/tidyinftheo")

then load:

library(tidyinftheo)

Examples

Calculate (in bits) the Shannon Entropy of the eye color variable in the starwars dataset:

library(dplyr)  # provides the starwars dataset, %>%, and select()
starwars %>% shannon_entropy(eye_color)
#> [1] 3.117176
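
For intuition about what a figure in bits means, here is a minimal base-R sketch of the standard plug-in (empirical) entropy estimate. The helper name `entropy_bits` is hypothetical and not part of tidyinftheo, and the package's own estimator may differ in details such as how missing values are handled.

```r
# Plug-in Shannon entropy, in bits, of a categorical vector.
# `entropy_bits` is an illustrative helper, not a tidyinftheo function.
entropy_bits <- function(x) {
  x <- x[!is.na(x)]             # assumption: missing values are dropped
  p <- table(x) / length(x)     # empirical probability of each category
  p <- p[p > 0]                 # skip unused factor levels (0 * log2(0))
  -sum(p * log2(p))
}

# Two equally likely categories carry exactly one bit.
entropy_bits(c("blue", "blue", "brown", "brown"))
```

A variable with more categories, or a more even spread across them, gives a larger value.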

With the classic mtcars dataset, choose some columns to make a matrix of mutual information pairwise comparisons. In particular, the cyl, vs, am, gear, and carb columns are all whole numbers indicating they belong to a category. The other columns are continuous and are better suited to correlation comparisons, unless they're discretized. Here are the first few rows of mtcars:

mtcars %>% select(cyl, vs, am, gear, carb) %>% head()

|                   | cyl | vs | am | gear | carb |
|-------------------|----:|---:|---:|-----:|-----:|
| Mazda RX4         |   6 |  0 |  1 |    4 |    4 |
| Mazda RX4 Wag     |   6 |  0 |  1 |    4 |    4 |
| Datsun 710        |   4 |  1 |  1 |    4 |    1 |
| Hornet 4 Drive    |   6 |  1 |  0 |    3 |    1 |
| Hornet Sportabout |   8 |  0 |  0 |    3 |    2 |
| Valiant           |   6 |  1 |  0 |    3 |    1 |
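
The continuous columns can be brought into the same comparison by discretizing them first. The following is a minimal sketch using base R's `cut()` for equal-width binning. The choice of three bins, the labels, and the names `mpg_binned` and `mtcars_binned` are assumptions made for illustration, not something tidyinftheo prescribes.

```r
# Equal-width binning of a continuous column with base R's cut().
# Three bins is an arbitrary choice for illustration.
mpg_binned <- cut(mtcars$mpg, breaks = 3, labels = c("low", "mid", "high"))
table(mpg_binned)

# Store the bins as character so the column reads as categorical,
# like the whole-number columns shown above.
mtcars_binned <- mtcars
mtcars_binned$mpg <- as.character(mpg_binned)
```

The number of bins affects any entropy or mutual information computed afterwards, so it is worth trying a few values.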

And here is our comparison table. There should be 5-choose-2 = 10 different combinations. NMI stands for Normalized Mutual Information, so the mutual information, normally given in bits, is scaled between 0 and 1:

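# mtcars stores every column as double; convert them to character
# so the values are treated as categories
mi_matr <- as_tibble(mtcars) %>%
    mutate_if(is.double, as.character) %>%
    mutual_info_matrix(cyl, vs, am, gear, carb, normalized=TRUE)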
mi_matr

| V1   | V2   |        MI |
|:-----|:-----|----------:|
| cyl  | vs   | 0.4937932 |
| cyl  | am   | 0.1672528 |
| cyl  | gear | 0.3504372 |
| cyl  | carb | 0.3983338 |
| vs   | am   | 0.0208314 |
| vs   | gear | 0.2397666 |
| vs   | carb | 0.2861119 |
| am   | gear | 0.5173527 |
| am   | carb | 0.1149038 |
| gear | carb | 0.1905054 |
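
The pair count and the unnormalized quantity behind the table can be checked by hand in base R. This sketch enumerates the ten pairs with `combn()` and computes plug-in mutual information in bits from the identity I(X;Y) = H(X) + H(Y) - H(X,Y). The helpers `entropy_from_counts` and `mutual_info_bits` are hypothetical names, not tidyinftheo functions, and the sketch deliberately stops short of normalizing, since several normalization conventions exist and the one the package applies is not stated here.

```r
# Enumerate the unordered pairs of the five categorical columns.
cols <- c("cyl", "vs", "am", "gear", "carb")
col_pairs <- t(combn(cols, 2))   # one row per pair
nrow(col_pairs)                  # 10, i.e. choose(5, 2)

# Plug-in entropy in bits from a vector or table of counts.
entropy_from_counts <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}

# Plug-in mutual information in bits: H(X) + H(Y) - H(X, Y).
mutual_info_bits <- function(x, y) {
  joint <- table(x, y)           # contingency table of joint counts
  entropy_from_counts(rowSums(joint)) +
    entropy_from_counts(colSums(joint)) -
    entropy_from_counts(joint)
}

# Unnormalized mutual information between two mtcars columns.
mutual_info_bits(mtcars$cyl, mtcars$vs)
```

Mutual information is zero when the two columns are independent and equals the entropy of a column when compared with itself.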

The matrix is already in a convenient format to plot:

p <- mutual_info_heatmap(mi_matr)
print(p)

NOTE: The above SVG may or may not render 100% correctly. Sometimes the legend lacks the color swatch. This may be a problem with ggplot2 or web browsers.

References

Cover, Thomas M., and Joy A. Thomas. 2001. Elements of Information Theory. 2nd ed. 10th Ser. New York, NY: John Wiley & Sons, Inc.

Meyer, Patrick E. 2014. Infotheo: Information-Theoretic Measures. https://CRAN.R-project.org/package=infotheo.




tidyinftheo documentation built on Dec. 1, 2017, 1:01 a.m.