knitr::opts_chunk$set(collapse = TRUE, comment = "#>") options(dplyr.print_min = 4L, dplyr.print_max = 4L) library(streamgraph)
The streamgraph
pacakge is an htmlwidget
^[http://www.htmlwidgets.org/] that is based on the D3.js
^[http://d3js.org/] JavaScript library.
"Streamgraphs are a generalization of stacked area graphs where the baseline is free. By shifting the baseline, it is possible to minimize the change in slope (or _wiggle) in individual series, thereby making it easier to perceive the thickness of any given layer across the data. Byron & Wattenberg describe several streamgraph algorithms in 'Stacked Graphs—Geometry & Aesthetics^[http://www.leebyron.com/else/streamgraph/]'"_^[Bostock. http://bl.ocks.org/mbostock/4060954]
Even though streamgraphs can be controversial^[Kirk. http://www.visualisingdata.com/index.php/2010/08/making-sense-of-streamgraphs/], they make for very compelling visualizations, especially when displaying very large datasets. They work even better when there is an interactive component involved that enables the following of each "flow" or allow filtering the view in some way. This makes R a great choice for streamgraph creation & exploration given that it excels at data manipulation and has libraries such as Shiny^[http://shiny.rstudio.com/] that reduce the complexity of the creation of interactive interfaces.
The first example replicates the streamgraphs in the Name Voyager^[http://www.bewitched.com/namevoyager.html] project. We'll use the R babynames
package^[Wickham. http://cran.r-project.org/package=babynames] as the data source and use the streamgraph
package to see the ebb & flow of "Kr-
" and "I-
" names in the United States over the years (1880-2013).
library(dplyr) library(babynames) library(streamgraph) babynames %>% filter(grepl("^Kr", name)) %>% group_by(year, name) %>% tally(wt=n) %>% streamgraph("name", "n", "year")
You create streamgraphs with the streamgraph
function. This first example uses the default values for the aesthetic properties of the streamgraph, but we have passed in "name
", "n
" and "year
" for the key
, value
and date
parameters. If your data already has column names in the expected format, you do not need to specify any values for those parameters.
The current version of streamgraph
requires a date-based x-axis, but is smart enough to notice if the values for the date
column are years and automatically performs the necessary work under the covers to convert the data into the required format for the underlying D3 work.
The default behavior of the streamgraph
function is to have the graph centered in the y-axis, with smoothed "streams".
library(dplyr) library(babynames) library(streamgraph) babynames %>% filter(grepl("^I", name)) %>% group_by(year, name) %>% tally(wt=n) %>% streamgraph("name", "n", "year", offset="zero", interpolate="linear")
This example changes the baseline for the streamgraph
to 0
and uses a linear interpolation (making the graph more "pointy").
The data to use for a streamgraph
should be in "long format"^[http://blog.rstudio.org/2014/07/22/introducing-tidyr/]. The following example shows how to produce a streamgraph
from the ggplot2
movies
data set.
ggplot2movies::movies %>% select(year, Action, Animation, Comedy, Drama, Documentary, Romance, Short) %>% tidyr::gather(genre, value, -year) %>% group_by(year, genre) %>% tally(wt=value) %>% ungroup %>% streamgraph("genre", "n", "year") %>% sg_axis_x(20) %>% sg_colors("PuOr")
We first select the columns we want to be in "streams", then gather them up and count them by year. We make one change to the aesthetics by using year ticks every 20 years. We also select a different ColorBrewer^[http://colorbrewer2.org/] palette for the graph.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.