In mikemahoney218/heddlr: Dynamic R Markdown Document Generation

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This document serves as an introduction to using heddlr, a set of tools to help dynamically build R Markdown documents based on repeating sections, for somewhat more complicated reports. Through the course of this vignette, we'll be going over the core functions which make up heddlr and how they interact to help make report generation faster, with the end goal of creating this flexdashboard based on the nycflights13 dataset as our final product.

One of the fundamental concepts of this package is that heddlr is content agnostic -- that is, it should be able to cooperate with any and all R (and therefore, R Markdown) extensions. As such, making our dashboard is going to require some non-heddlr code to create our actual content. To start with, we'll load a few basic libraries:

library(heddlr)
library(dplyr) # For our data manipulation
library(ggplot2) # For graphing utilities
library(nycflights13) # Provides the flight dataset
library(ggalluvial) # Helps build one of our charts

You'll also need to have purrr, tidyr, and forcats installed to run this document completely -- however, we won't load them outright in this document.

heddlr aims to let you approach R Markdown documents like any other R script, being able to abstract away repeated sections into individual patterns the same way you'd normally abstract repeated code into functions. To do so, heddlr encourages users to split their R Markdown documents into pieces, separating their data import, wrangling, and function creation into a "generator" file while moving repeated report sections into various "pattern" files. This document itself serves as a generator for our example dashboard -- it does all the hard work of loading the dataset and creating functions, uses those to spin the dashboard together, and then renders the dashboard to create an output.

We've already handled our data import and wrangling just by loading the nycflights13 package, which means it's time to move onto building the functions which will be responsible for creating most of our dashboard. You'll notice that our target dashboard repeats the same four main components multiple times -- three graphs are repeated for each airport, while one graph is repeated for each month and each airport in the dataset. Our first step to reduce code duplication is to create the actual functions which will produce those graphics in our dashboard:

# The polar coordinate plot of flights per hour
graph_flights_per_hour <- function(origin_code) {
  flights %>%
    filter(origin == origin_code) %>%
    count(time_hour, hour) %>%
    group_by(hour) %>%
    summarise(n = mean(n)) %>%
    ggplot(aes(hour, n, fill = n)) +
    geom_col(color = "black") +
    scale_x_continuous(breaks = seq(2, 24, 2)) +
    scale_fill_viridis_c(begin = 0, end = 0.7, name = "") +
    theme_minimal() %+replace%
    theme(axis.text.y = element_blank()) +
    coord_polar() +
    labs(
      title = "Average number of flights per hour",
      subtitle = paste(origin_code, "Airport"),
      x = "",
      y = ""
    )
}
graph_flights_per_hour("EWR")

# The graph of average flight delays per day for a month
graph_mean_delay <- function(origin_code, month_number) {
  flights %>%
    filter(origin == origin_code &
      month == month_number) %>%
    mutate(date = lubridate::as_date(time_hour)) %>%
    group_by(date) %>%
    summarise(delay = mean(dep_delay, na.rm = T)) %>%
    ggplot(aes(date, delay)) +
    geom_col(color = "black") +
    scale_y_continuous(expand = expand_scale(mult = c(0, 0.07))) +
    theme_classic() +
    labs(
      x = "Day of month",
      y = "Delay (minutes)",
      title = "Average flight departure delays per day",
      subtitle = paste(origin_code, "Airport,", month.name[[month_number]])
    )
}

graph_mean_delay("EWR", 1)

# The sankey/flow chart of where flights head from each airport
graph_sankey <- function(origin_code) {
  flights %>%
    count(origin, dest) %>%
    mutate(highlight = ifelse(origin == origin_code, 1, 0)) %>%
    ggplot(aes(axis1 = origin, axis2 = dest, y = n)) +
    geom_alluvium(aes(fill = highlight)) +
    geom_stratum(width = 1 / 10) +
    geom_label(stat = "stratum", infer.label = TRUE) +
    scale_y_continuous(expand = expand_scale(mult = c(0, 0.07))) +
    theme_classic() %+replace%
    theme(
      legend.position = "none",
      axis.text = element_blank()
    ) +
    labs(
      x = "",
      y = "",
      title = paste("Flights out of", "EWR")
    )
}

graph_sankey("EWR")

# The barplot of delay by carrier
graph_carriers <- function(origin_code) {
  flights %>%
    filter(origin == origin_code) %>%
    group_by(carrier) %>%
    summarise(dep_delay = mean(dep_delay, na.rm = T)) %>%
    mutate(
      carrier = factor(carrier),
      carrier = forcats::fct_reorder(carrier, dep_delay)
    ) %>%
    ggplot(aes(carrier, dep_delay)) +
    geom_col() +
    scale_y_continuous(expand = expand_scale(mult = c(0, 0.07))) +
    theme_classic() +
    labs(
      x = "Carrier",
      y = "Average departure delay (minutes)",
      title = "Mean departure delay by carrier",
      subtitle = paste("EWR", "Airport")
    )
}

graph_carriers("EWR")

So now we have the functions that will produce our actual content for the dashboard. Now we need to repeat them with the proper arguments that will produce the 45 different graphics included in our dashboard. Additionally, we'll have to include the markdown syntax that will properly create menus and tabs for our dashboard.

This is where we'll start using functions from heddlr. Imagine that the top of our dashboard looks something like the following:

---
title: 2013 NYC Flights Dashboard
author: Mike Mahoney
output:
  flexdashboard::flex_dashboard:
    vertical_layout: fill
    orientation: rows
---

# EWR {data-navmenu="Airport Overviews"}

##

###

.```r
graph_flights_per_hour("EWR")
.```

###
.```r
graph_sankey("EWR")
.```

###
.```r
graph_carriers("EWR")
.```

## {.tabset}

### January

.```r
graph_mean_delay("EWR", 1)
.```

{...and so on, for 11 more months and 2 more airports}

As far as heddlr is concerned, this dashboard is composed of three different sections:

The YAML header is the first section, as it's never repeated anywhere else in the document
Everything after the YAML header and up to the ## {.tabset} line is the second section, as this will be repeated once per airport
Everything else is the third section, which will be repeated once per month per airport

In heddlr terminology, each one of those repeated sections is what we'll call a pattern. We typically store those patterns in their own files -- that makes it a bit easier to reason about what each one needs to include and how they'll piece together. For this dashboard, we've saved the second two sections off as two pattern files -- we'll come back to the YAML header in a minute.

The file for the second section repeated once per airport is called flights_site_level.Rmd, and it looks like this:

# AIRCODE {data-navmenu="Airport Overviews"}

##

###

.```r
airportcode <- "AIRCODE"

graph_flights_per_hour(airportcode)
.```

###
.```r
graph_sankey(airportcode)
.```

###
.```r
graph_carriers(airportcode)
.```

## {.tabset}

Note that I've replaced the multiple "EWR" references with two "AIRCODE" placeholder -- this really doesn't matter, but feels cleaner to me.

The pattern for the monthly-repeated sections is called flights_month_level.Rmd, and it looks like this:

### `.r month.name[[MONTHNUM]]`

.```r
graph_mean_delay(airportcode, MONTHNUM)
.```

Note that here I've swapped the "EWR" reference with the airportcode variable that we'll create in flights_site_level.Rmd, and replaced our explicit January with a quick snippet of code that will print the correct month name when we replace it with an integer value.

We have a few ways we can then import those pattern files into our R session to work with. The first and simplest of these is import_pattern, which just takes as argument a file path:

pattern <- import_pattern("flights_site_level.Rmd")
pattern

Of course, we usually have more than one pattern to work with, and it can be nice to keep them all in one place. heddlr helps you do this as well via the import_draft function, which takes named filepaths and stores them in a list:

draft <- import_draft(
  "site_pattern" = "flights_site_level.Rmd",
  "month_pattern" = "flights_month_level.Rmd"
)
draft

There's one last function in heddlr to help us create pattern objects, named create_yaml_header. If we wanted, we can include a pattern in our import_draft call that contains the YAML needed to render our R Markdown document. However, create_yaml_header also provides us the option of creating that header natively in our generator document by passing it a list:

dashboard_header <- list(
  "title" = "2013 NYC Flights Dashboard",
  "author" = "Mike Mahoney",
  params = list(
    "flights" = "NA",
    "graph_flights_per_hour" = "NA",
    "graph_mean_delay" = "NA",
    "graph_sankey" = "NA",
    "graph_carriers" = "NA"
  ),
  "output" = list("flexdashboard::flex_dashboard" = list(
    "vertical_layout" = "fill",
    "orientation" = "rows"
  ))
)

dashboard_header <- create_yaml_header(dashboard_header)
dashboard_header

You'll notice that I've included a params argument here, even though our original YAML header above didn't have anything of the sort. One of the awesome features of R Markdown is the ability to pass objects to your created dashboard via the params field, no matter what the object is. This means that if you're creating dashboards powered by expensive SQL queries or other processes that take a long amount of time to complete, you can easily pull your data once, use it to create your report, and then pass it as a parameter to your report to actually create your graphics and other outputs. Similarly, we're able to define and test our functions outside of the dynamic report and then pass them in just like any other object -- our dashboard will be able to use them without any trouble.

Now that we have our patterns in R, it's time to use them to generate our dashboard. In each of our patterns, I've gone ahead and used placeholder values in both the code and markdown to indicate where I want to use values instead. We can then go ahead and use the heddle function to replace those values appropriately.

The heddle function takes three key arguments -- our data, the pattern we're working with, and the placeholder values to replace. If the data we pass is a vector (like in a basic mutate call), that last argument can just be all the placeholders we want to replace -- heddlr will automatically assume you want to replace it with the values in the vector. If we pass it a dataframe instead, we need to name our arguments in the format "Placeholder" = Variable -- which allows us to use multiple different variables to replace the placeholders.

dashboard_body <- flights %>%
  distinct(origin, month) %>%
  arrange(month) %>%
  tidyr::nest(month = month) %>%
  mutate(
    sitepattern = heddle(origin, draft$site_pattern, "AIRCODE"),
    monthpattern = purrr::map(month, heddle, draft$month_pattern, "MONTHNUM" = month)
  )

dashboard_body

You'll notice that even though we made both our site and month patterns with heddle, "sitepattern" came out as a single text string, while "monthpattern" is currently saved as a list. That brings us to our next function, make_template. This function can be used in a few different ways:

If used with mutate and purrr::map, it can be used to flatten those list vectors into text strings, which is necessary for exporting our dashboard
If used after a pipe step, it collapses a template vector into a single string
If used on its own, it can combine several different template objects into one

We'll use the first two right now to get the majority of our dashboard into one object, which will form the bulk of our dashboard:

dashboard_body <- dashboard_body %>%
  mutate(
    monthpattern = purrr::map_chr(monthpattern, make_template),
    template = paste0(sitepattern, monthpattern)
  ) %>%
  make_template(template)

substr(dashboard_body, 1, 40)

So now we have two template objects -- the YAML header we created earlier, and the main dashboard body we just finished. We'll use the last use of make_template to combine the two of those -- note that make_template can't combine separate objects when used in a pipe, due to some limitations in how it determines which method it should use.

We're also going to add in the last major heddlr function: export_template. By providing a template object and a file path to export_template, you'll wind up with a .Rmd document ready to be knit into its final output.

make_template(dashboard_header, dashboard_body) %>%
  export_template("flights_dashboard.Rmd")

We're then able to call rmarkdown::render on our new .Rmd document from the same document we used to generate it. Because we've already gone ahead and defined our datasets and functions, we're able to knit the document with parameters to speed up processing time -- this is particularly helpful if your data is particularly large or coming from a SQL connection, so you only need to import the data once. I included the parameters in our create_yaml_header step, and will include them in this render call as well:

rmarkdown::render("flights_dashboard.Rmd",
  params = provide_parameters(
    flights,
    graph_flights_per_hour,
    graph_mean_delay,
    graph_sankey,
    graph_carriers
  )
)

This creates the flexdashboard we've been working towards from the start.

We've now fully automated the building and rendering of our dashboard, using about 100 lines of code to quickly piece together an R Markdown report which can be arbitrarily long, with sections defined by our dataset. This becomes extremely powerful when paired with changing datasets -- for instance, if we regularly needed to include or drop airports from the dashboard, or incorporate additional months -- or longer datasets that aren't practical to develop reports for by hand.

mikemahoney218/heddlr documentation built on June 20, 2021, 10:22 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com