d3po: D3 Popular Outputs

The r2d3 package provides a general framework for producing D3 documents from R objects. Its capabilities are very general, but a handful of commonly used D3 visualizations must still be built by hand. The d3po package provides some of these common visualizations in an easier-to-use format, leveraging r2d3 to produce the D3 documents.

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("tkmckenzie/d3po")

Several common D3 visualizations are included with d3po. Examples are shown below.

Chord diagrams are often used to show relationships within a set, with the strength of each relationship represented by the size of its chord. The d3po::chord function accepts relationships either as an edgelist or as an adjacency matrix. Symmetric adjacency matrices with a zero diagonal (or equivalent edgelists) are advised because they are easier to interpret, but neither property is required.

labels = letters[1:3]

adjacency.matrix = matrix(c(0, 1, 2,
                            1, 0, 3,
                            2, 3, 0),
                          nrow = 3, byrow = TRUE)

chord(adjacency.matrix = adjacency.matrix, labels = labels)
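
For readers starting from an edgelist, the following base R sketch shows one way to build the equivalent symmetric, zero-diagonal adjacency matrix. The column names source, target and value and the object names edges, edge.labels and adj are illustrative assumptions, not part of the d3po API.

```r
# Hypothetical edgelist; column names are illustrative, not d3po's API.
edges = data.frame(source = c("a", "a", "b"),
                   target = c("b", "c", "c"),
                   value = c(1, 2, 3),
                   stringsAsFactors = FALSE)

edge.labels = sort(unique(c(edges$source, edges$target)))

# Start from an all-zero matrix so the diagonal stays zero.
adj = matrix(0, nrow = length(edge.labels), ncol = length(edge.labels),
             dimnames = list(edge.labels, edge.labels))

# Fill both directions so the matrix is symmetric.
adj[cbind(edges$source, edges$target)] = edges$value
adj[cbind(edges$target, edges$source)] = edges$value

# Check the two properties recommended above.
stopifnot(isSymmetric(adj), all(diag(adj) == 0))
```

The resulting adj holds the same values as the adjacency matrix in the example above.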

The d3po package includes the ability to make choropleth maps at the U.S. county or state level. The vector c("District of Columbia", datasets::state.name) can be used to construct state-level data, and d3po::us.counties can be used to construct county-level data.

require(dplyr)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df.state = data.frame(state = c("District of Columbia", state.name),
                      value.state = rnorm(51, sd = 3))
df.county = us.counties %>%
  mutate(value.county = rnorm(n(), sd = 1)) %>%
  left_join(df.state, by = "state") %>%
  mutate(value = value.state + value.county)

choropleth.state(df.state, value.column = "value.state")

choropleth.county(df.county)
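
Because state-level data is keyed by name, it can help to compare a data frame's names against the 51-element vector before plotting. The sketch below uses only base R. The data frame df.check and its deliberate misspelling are hypothetical, and how d3po itself treats unmatched names is not covered here.

```r
# The 51 names used for state-level data.
expected.states = c("District of Columbia", datasets::state.name)

# Hypothetical input in which the last name ("Wyoming") is misspelled.
df.check = data.frame(state = c(head(expected.states, -1), "Wyomin"),
                      value = seq_len(51),
                      stringsAsFactors = FALSE)

# Names that should be present but are not.
missing.states = setdiff(expected.states, df.check$state)

# Names that are present but not recognized.
unknown.states = setdiff(df.check$state, expected.states)

print(missing.states)
print(unknown.states)
```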

Word clouds can be created with d3po::cloud by supplying a list of words and relative sizes. Words can also be individually colored or colored by group.

require(dplyr)
require(rcorpora)
#> Loading required package: rcorpora

words = corpora("words/oprah_quotes")$oprahQuotes
words = tolower(unlist(strsplit(words, " ")))
words = gsub("[.,;-]+", "", words) # Hyphen placed last so it is literal, not a range
words = gsub("’", "'", words)
words = words[nchar(words) > 0]

stopwords = corpora("words/stopwords/en")$stopWords

word.df = data.frame(text = words)
word.df = word.df %>%
  filter(!(text %in% stopwords)) %>%
  group_by(text) %>%
  summarize(value = n(), .groups = "drop") %>%
  arrange(desc(value)) %>%
  slice(1:25) %>%
  mutate(value = 3 * value) # To make words a little larger in final image

cloud(word.df, text.color = "word", color.scheme = "RdYlGn")
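
The same text/value layout can also be produced without dplyr or rcorpora. The following is a minimal base R sketch on a made-up sentence. The object names sample.text, sample.words, counts and sample.df are illustrative, and the column names simply mirror the ones used in the example above.

```r
# Made-up text; any character vector would do.
sample.text = "The quick fox jumps; the lazy dog sleeps. The fox runs, the dog waits."

# Split on spaces, lower-case, and strip punctuation.
sample.words = tolower(unlist(strsplit(sample.text, " ")))
sample.words = gsub("[.,;-]+", "", sample.words) # Hyphen last, so it is literal
sample.words = sample.words[nchar(sample.words) > 0]

# Count occurrences and sort from most to least frequent.
counts = sort(table(sample.words), decreasing = TRUE)

sample.df = data.frame(text = names(counts),
                       value = as.vector(counts),
                       stringsAsFactors = FALSE)
print(head(sample.df, 3))
```

A stopword filter, as in the dplyr version, could be applied to sample.words before counting.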

Marimekko charts are used to show both proportions amongst groups and total numbers relative to the overall population. These charts are especially useful for comparing similar variables across groups to show how those variables change proportional to both group size and the total population.

num.groups = 5
num.vars = 3
df = data.frame(group = rep(LETTERS[1:num.groups], each = num.vars),
                var = rep(letters[1:num.vars], times = num.groups),
                value = runif(num.groups * num.vars, 0, 10))
marimekko(df, x.column = "group", y.column = "var", color.scheme = "Plasma")
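
To make the two kinds of proportion concrete, the sketch below computes them by hand for a small made-up data set. It assumes the usual Marimekko convention, in which each group's share of the grand total sets its width and each variable's share within its group sets the segment size. Whether d3po::marimekko computes them exactly this way is an assumption.

```r
# Small made-up data set with two groups and two variables.
df.example = data.frame(group = rep(c("A", "B"), each = 2),
                        var = rep(c("a", "b"), times = 2),
                        value = c(1, 3, 2, 4),
                        stringsAsFactors = FALSE)

# Each group's share of the grand total (assumed to drive column width).
group.total = tapply(df.example$value, df.example$group, sum)
group.share = group.total / sum(group.total)

# Each variable's share within its own group (assumed to drive segment size).
df.example$within.share = df.example$value /
  ave(df.example$value, df.example$group, FUN = sum)

print(group.share)
print(df.example)
```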

Sankey diagrams are useful for showing relationships between sets, with the strength of each relationship represented by the size of its link. The function d3po::sankey creates a Sankey diagram from an edgelist.

num.targets = 5
num.sources = 3
df = data.frame(source = rep(LETTERS[1:num.sources], each = num.targets),
                target = rep(letters[1:num.targets], times = num.sources),
                value = runif(num.sources * num.targets, 0, 10))
sankey(df)
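
As a rough check on an edgelist before plotting, the total flow leaving each source and arriving at each target can be tallied in base R. The data below is made up and the object names are illustrative. That node sizes in the rendered diagram follow these totals is the usual Sankey convention and an assumption here, not something taken from d3po's documentation.

```r
# Made-up edgelist with the same column names as the example above.
edges.example = data.frame(source = c("A", "A", "B"),
                           target = c("a", "b", "b"),
                           value = c(2, 3, 5),
                           stringsAsFactors = FALSE)

# Total flow leaving each source and arriving at each target.
outflow = tapply(edges.example$value, edges.example$source, sum)
inflow = tapply(edges.example$value, edges.example$target, sum)

# Every link has one source and one target, so the two totals must agree.
stopifnot(isTRUE(all.equal(sum(outflow), sum(inflow))))

print(outflow)
print(inflow)
```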
