knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This document serves as an introduction to using heddlr, a set of tools to
help dynamically build R Markdown documents based on repeating sections, for
somewhat more complicated reports. Through the course of this vignette, we'll
be going over the core functions which make up heddlr and how they interact
to help make report generation faster, with the end goal of creating
this flexdashboard based on the nycflights13 dataset as our final product.
One of the fundamental concepts of this package is that heddlr is content
agnostic -- that is, it should be able to cooperate with any and all R
(and therefore, R Markdown) extensions. As such, making our dashboard is going
to require some non-heddlr code to create our actual content. To start with,
we'll load a few basic libraries:
library(heddlr)
library(dplyr) # For our data manipulation
library(ggplot2) # For graphing utilities
library(nycflights13) # Provides the flight dataset
library(ggalluvial) # Helps build one of our charts
You'll also need to have purrr, tidyr, forcats, and lubridate installed to run
this document completely -- however, we won't load them outright in this
document, and will call their functions with the package:: prefix instead.
heddlr aims to let you approach R Markdown documents like any other R script,
being able to abstract away repeated sections into individual patterns the
same way you'd normally abstract repeated code into functions. To do so,
heddlr encourages users to split their R Markdown documents into pieces,
separating their data import, wrangling, and function creation into a
"generator" file while moving repeated report sections into various "pattern"
files. This document itself serves as a generator for our example dashboard --
it does all the hard work of loading the dataset and creating functions, uses
those to spin the dashboard together, and then renders the dashboard to create
an output.
We've already handled our data import and wrangling just by loading the
nycflights13 package, which means it's time to move on to building the
functions which will be responsible for creating most of our dashboard.
You'll notice that our target dashboard repeats the same four main components
multiple times -- three graphs are repeated for each airport, while one graph
is repeated for each month and each airport in the dataset. Our first step to
reduce code duplication is to create the actual functions which will produce
those graphics in our dashboard:
# The polar coordinate plot of flights per hour
graph_flights_per_hour <- function(origin_code) {
  flights %>%
    filter(origin == origin_code) %>%
    count(time_hour, hour) %>%
    group_by(hour) %>%
    summarise(n = mean(n)) %>%
    ggplot(aes(hour, n, fill = n)) +
    geom_col(color = "black") +
    scale_x_continuous(breaks = seq(2, 24, 2)) +
    scale_fill_viridis_c(begin = 0, end = 0.7, name = "") +
    theme_minimal() %+replace%
    theme(axis.text.y = element_blank()) +
    coord_polar() +
    labs(
      title = "Average number of flights per hour",
      subtitle = paste(origin_code, "Airport"),
      x = "",
      y = ""
    )
}
graph_flights_per_hour("EWR")
# The graph of average flight delays per day for a month
graph_mean_delay <- function(origin_code, month_number) {
  flights %>%
    filter(origin == origin_code & month == month_number) %>%
    mutate(date = lubridate::as_date(time_hour)) %>%
    group_by(date) %>%
    summarise(delay = mean(dep_delay, na.rm = TRUE)) %>%
    ggplot(aes(date, delay)) +
    geom_col(color = "black") +
    # expansion() replaces the deprecated expand_scale() (ggplot2 >= 3.3.0)
    scale_y_continuous(expand = expansion(mult = c(0, 0.07))) +
    theme_classic() +
    labs(
      x = "Day of month",
      y = "Delay (minutes)",
      title = "Average flight departure delays per day",
      subtitle = paste(origin_code, "Airport,", month.name[[month_number]])
    )
}
graph_mean_delay("EWR", 1)
# The sankey/flow chart of where flights head from each airport
graph_sankey <- function(origin_code) {
  flights %>%
    count(origin, dest) %>%
    mutate(highlight = ifelse(origin == origin_code, 1, 0)) %>%
    ggplot(aes(axis1 = origin, axis2 = dest, y = n)) +
    geom_alluvium(aes(fill = highlight)) +
    geom_stratum(width = 1 / 10) +
    geom_label(stat = "stratum", infer.label = TRUE) +
    scale_y_continuous(expand = expansion(mult = c(0, 0.07))) +
    theme_classic() %+replace%
    theme(
      legend.position = "none",
      axis.text = element_blank()
    ) +
    labs(
      x = "",
      y = "",
      # Use the argument rather than a hard-coded "EWR"
      title = paste("Flights out of", origin_code)
    )
}
graph_sankey("EWR")
# The barplot of delay by carrier
graph_carriers <- function(origin_code) {
  flights %>%
    filter(origin == origin_code) %>%
    group_by(carrier) %>%
    summarise(dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
    mutate(
      carrier = factor(carrier),
      carrier = forcats::fct_reorder(carrier, dep_delay)
    ) %>%
    ggplot(aes(carrier, dep_delay)) +
    geom_col() +
    scale_y_continuous(expand = expansion(mult = c(0, 0.07))) +
    theme_classic() +
    labs(
      x = "Carrier",
      y = "Average departure delay (minutes)",
      title = "Mean departure delay by carrier",
      # Use the argument rather than a hard-coded "EWR"
      subtitle = paste(origin_code, "Airport")
    )
}
graph_carriers("EWR")
So now we have the functions that will produce our actual content for the dashboard. Now we need to repeat them with the proper arguments that will produce the 45 different graphics included in our dashboard. Additionally, we'll have to include the markdown syntax that will properly create menus and tabs for our dashboard.
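As a quick sanity check on that count, here's a small base R sketch that enumerates every graphic the dashboard needs. The object names (airports, per_airport_graphs, and so on) are illustrative only and aren't used anywhere else in this document:

```r
# Illustrative only: enumerate the graphics the dashboard has to contain.
airports <- c("EWR", "JFK", "LGA") # the three origins in nycflights13::flights
per_airport_graphs <- c("graph_flights_per_hour", "graph_sankey", "graph_carriers")

# One row per (airport, graph) pair for the three airport-level charts
airport_level <- expand.grid(
  origin = airports,
  graph = per_airport_graphs,
  stringsAsFactors = FALSE
)

# One row per (airport, month) pair for the monthly delay chart
month_level <- expand.grid(origin = airports, month = 1:12)

total_graphics <- nrow(airport_level) + nrow(month_level)
total_graphics
```

That's 9 airport-level graphics plus 36 monthly ones, which is where the 45 comes from.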
This is where we'll start using functions from heddlr. Imagine that the top
of our dashboard looks something like the following:
---
title: 2013 NYC Flights Dashboard
author: Mike Mahoney
output:
  flexdashboard::flex_dashboard:
    vertical_layout: fill
    orientation: rows
---

# EWR {data-navmenu="Airport Overviews"}

##

###

.```r
graph_flights_per_hour("EWR")
.```

###

.```r
graph_sankey("EWR")
.```

###

.```r
graph_carriers("EWR")
.```

## {.tabset}

### January

.```r
graph_mean_delay("EWR", 1)
.```

{...and so on, for 11 more months and 2 more airports}
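As far as heddlr is concerned, this dashboard is composed of three different
sections:
1. The YAML header, which appears once at the top of the document
2. The airport-level section, which is repeated once per airport
3. The month-level section, which is repeated once per month for each airport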
In heddlr terminology, each one of those repeated sections is what we'll call a
pattern. We typically store those patterns in their own files -- that makes
it a bit easier to reason about what each one needs to include and how they'll
piece together. For this dashboard, we've saved the second two sections off as
two pattern files -- we'll come back to the YAML header in a minute.
The file for the second section, repeated once per airport, is called
flights_site_level.Rmd, and it looks like this:
# AIRCODE {data-navmenu="Airport Overviews"}

##

###

.```r
airportcode <- "AIRCODE"
graph_flights_per_hour(airportcode)
.```

###

.```r
graph_sankey(airportcode)
.```

###

.```r
graph_carriers(airportcode)
.```

## {.tabset}
Note that I've replaced the multiple "EWR" references with just two "AIRCODE" placeholders, storing the value in an airportcode variable that the rest of the code reuses -- this isn't required, but feels cleaner to me.
The pattern for the monthly-repeated sections is called
flights_month_level.Rmd, and it looks like this:
### `.r month.name[[MONTHNUM]]`

.```r
graph_mean_delay(airportcode, MONTHNUM)
.```
Note that here I've swapped the "EWR" reference with the airportcode
variable that we create in flights_site_level.Rmd, and replaced
our explicit January with a quick snippet of code that will print the correct
month name once MONTHNUM is replaced with an integer value.
We have a few ways we can then import those pattern files into our R session to
work with. The first and simplest of these is import_pattern, which just
takes a file path as its argument:
pattern <- import_pattern("flights_site_level.Rmd")
pattern
Of course, we usually have more than one pattern to work with, and it can be
nice to keep them all in one place. heddlr helps you do this as well via the
import_draft function, which takes named filepaths and stores them in a list:
draft <- import_draft(
  "site_pattern" = "flights_site_level.Rmd",
  "month_pattern" = "flights_month_level.Rmd"
)
draft
There's one last function in heddlr to help us create pattern objects, named
create_yaml_header. If we wanted, we could include a pattern in our
import_draft call that contains the YAML needed to render our R Markdown
document. However, create_yaml_header also gives us the option of
creating that header natively in our generator document by passing it a list:
dashboard_header <- list(
  "title" = "2013 NYC Flights Dashboard",
  "author" = "Mike Mahoney",
  params = list(
    "flights" = "NA",
    "graph_flights_per_hour" = "NA",
    "graph_mean_delay" = "NA",
    "graph_sankey" = "NA",
    "graph_carriers" = "NA"
  ),
  "output" = list("flexdashboard::flex_dashboard" = list(
    "vertical_layout" = "fill",
    "orientation" = "rows"
  ))
)
dashboard_header <- create_yaml_header(dashboard_header)
dashboard_header
You'll notice that I've included a params argument here, even though our
original YAML header above didn't have anything of the sort. One of the awesome
features of R Markdown is the ability to pass objects to your created dashboard
via the params field, no matter what the object is. This means that if you're
creating dashboards powered by expensive SQL queries or other processes that
take a long time to complete, you can pull your data once, use it to build
your report, and then pass it as a parameter to that report to actually create
your graphics and other outputs. Similarly, we're able to define and test our
functions outside of the dynamic report and then pass them in just like any
other object -- our dashboard will be able to use them without any trouble.
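As a rough illustration of the params mechanism: rmarkdown::render() takes its parameters as a named list, and a parameterised document reads them back from an object called params. The sketch below only simulates that second half in a plain R session, using made-up objects (toy_data, toy_summary, and render_params are hypothetical names, not part of the dashboard):

```r
# Hypothetical stand-ins for an expensive dataset and a helper function
toy_data <- data.frame(origin = c("EWR", "JFK", "LGA"), n = c(10, 20, 30))
toy_summary <- function(df) sum(df$n)

# The named list we would hand to rmarkdown::render(..., params = render_params)
render_params <- list(flights = toy_data, summarise_flights = toy_summary)

# Simulate what code inside the rendered document sees: a list called `params`
params <- render_params
result <- params$summarise_flights(params$flights)
result
```

Every name passed this way has to be declared in the document's YAML header first, which is why the header list above includes a placeholder entry for each object.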
Now that we have our patterns in R, it's time to use them to generate our
dashboard. In each of our patterns, I've gone ahead and used placeholder values
in both the code and markdown to mark where the real values should go.
We can then go ahead and use the heddle function to replace those placeholders
appropriately.
The heddle function takes three key arguments -- our data, the pattern we're
working with, and the placeholder values to replace. If the data we pass is
a vector (like in a basic mutate call), that last argument can just be all
the placeholders we want to replace -- heddlr will automatically assume you
want to replace them with the values in the vector. If we pass it a dataframe
instead, we need to name our arguments in the format "Placeholder" = Variable
-- which allows us to use multiple different variables to replace the
placeholders.
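To make the vector case concrete, here's a conceptual sketch in base R of what "one filled-in copy of the pattern per value" means. This is not heddlr's implementation, and toy_pattern and fill_placeholder are hypothetical names invented for the illustration:

```r
# Conceptual sketch only -- heddle() itself may work quite differently.
toy_pattern <- "# AIRCODE\n\nFlights leaving AIRCODE\n\n"

fill_placeholder <- function(values, pattern, placeholder) {
  # Return one copy of the pattern per value, with the placeholder swapped out
  vapply(
    as.character(values),
    function(value) gsub(placeholder, value, pattern, fixed = TRUE),
    character(1),
    USE.NAMES = FALSE
  )
}

filled <- fill_placeholder(c("EWR", "JFK", "LGA"), toy_pattern, "AIRCODE")
cat(filled, sep = "")
```

Each element of filled is a complete section for one airport, which is the shape we want before stitching sections together.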
dashboard_body <- flights %>%
  distinct(origin, month) %>%
  arrange(month) %>%
  tidyr::nest(month = month) %>%
  mutate(
    sitepattern = heddle(origin, draft$site_pattern, "AIRCODE"),
    monthpattern = purrr::map(month, heddle, draft$month_pattern, "MONTHNUM" = month)
  )
dashboard_body
You'll notice that even though we made both our site and month patterns with
heddle, "sitepattern" came out as a single text string, while "monthpattern"
is currently saved as a list. That brings us to our next function,
make_template. This function can be used in a few different ways:
- Given a dataframe and a column, it can be used to collapse that column into a single text string
- When used with mutate and purrr::map, it can be used to flatten those list vectors into text strings, which is necessary for exporting our dashboard
- Given several standalone template objects, it can be used to combine them into one
We'll use the first two right now to get the bulk of our dashboard into one object:
dashboard_body <- dashboard_body %>%
  mutate(
    monthpattern = purrr::map_chr(monthpattern, make_template),
    template = paste0(sitepattern, monthpattern)
  ) %>%
  make_template(template)
substr(dashboard_body, 1, 40)
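If the flattening step feels abstract, here's a rough base R analogue of the same two moves: turning each list element into one string, then collapsing the whole column into a single string. It's a sketch with made-up inputs (toy_site and toy_months are hypothetical), and it isn't meant to describe how make_template works internally:

```r
# Conceptual base R sketch; not heddlr's implementation.
toy_site <- c("# EWR\n", "# JFK\n")
toy_months <- list(
  c("### January\n", "### February\n"), # monthly pieces for the first airport
  c("### January\n", "### February\n")  # monthly pieces for the second airport
)

# Step 1: flatten each list element into a single string
flat_months <- vapply(toy_months, paste, character(1), collapse = "")

# Step 2: glue site and month text together, then collapse into one string
toy_template <- paste(paste0(toy_site, flat_months), collapse = "")
cat(toy_template)
```

The result is one long string with each airport's header followed immediately by its monthly sections.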
So now we have two template objects -- the YAML header we
created earlier, and the main dashboard body we just finished. We'll use the
third way of calling make_template to combine the two of those -- note that
make_template can't combine separate objects when used in a pipe, due to some
limitations in how it determines which method it should use.
We're also going to add in the last major heddlr function: export_template.
By providing a template object and a file path to export_template, you'll
wind up with a .Rmd document ready to be knit into its final output.
make_template(dashboard_header, dashboard_body) %>%
  export_template("flights_dashboard.Rmd")
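Conceptually, the export step amounts to writing one long string out as a text file. A minimal base R sketch of that idea, assuming nothing about how export_template is implemented (toy_template and out_file are hypothetical names), looks like this:

```r
# Conceptual sketch only: write a template string to disk and read it back.
toy_template <- "---\ntitle: Demo\n---\n\n# EWR\n"

out_file <- tempfile(fileext = ".Rmd") # a throwaway path for the example
cat(toy_template, file = out_file)

exported_lines <- readLines(out_file)
exported_lines
```

Because the template is plain text, the resulting file can be opened and checked by hand before it is rendered.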
We're then able to call rmarkdown::render on our new .Rmd document from the
same document we used to generate it. Because we've already gone ahead and
defined our datasets and functions, we're able to knit the document with
parameters to speed up processing time -- this is especially helpful if your
data is large or coming from a SQL connection, so you only need
to import the data once. I included the parameters in our create_yaml_header
step, and will include them in this render call as well:
rmarkdown::render("flights_dashboard.Rmd",
  params = provide_parameters(
    flights,
    graph_flights_per_hour,
    graph_mean_delay,
    graph_sankey,
    graph_carriers
  )
)
This creates the flexdashboard we've been working towards from the start.
We've now fully automated the building and rendering of our dashboard, using about 100 lines of code to quickly piece together an R Markdown report which can be arbitrarily long, with sections defined by our dataset. This becomes extremely powerful when paired with changing datasets -- for instance, if we regularly needed to include or drop airports from the dashboard, or incorporate additional months -- or longer datasets that aren't practical to develop reports for by hand.