knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette serves as a basic introduction to the heddlr
package, a set of
utilities to make it easier to write R Markdown documents with sections that
repeat or which might need to add or remove sections based on an underlying
data source. In order to demonstrate the essentials of how the package works,
let's imagine we have a super cool R Markdown document, which looks something
like this:
--- title: "My cool report!" author: "Captain heddlr" output: html_document --- # Let's talk about irises! ## Iris setosa This species of flower is great! It has a mean sepal length of `.r mean(iris[iris$Species == "setosa", "Sepal.Length"])`, and a mean sepal width of `.r mean(iris[iris$Species == "setosa", "Sepal.Width"])`. That looks like this on a graph! .```r iris %>% filter(Species == "setosa") %>% ggplot(aes(Sepal.Length, Sepal.Width)) + geom_point() .``` ## Iris virginica This species of flower is great! It has a mean sepal length of `.r mean(iris[iris$Species == "virginica", "Sepal.Length"])`, and a mean sepal width of `.r mean(iris[iris$Species == "virginica", "Sepal.Width"])`. That looks like this on a graph! .```r iris %>% filter(Species == "virginica") %>% ggplot(aes(Sepal.Length, Sepal.Width)) + geom_point() .```
This is a great report, and it probably didn't take that long to create. However, one day Joe down the hall points out that there are actually more types of irises than the ones you're always talking about - and your database already has information about one called versicolor!
If you wanted, you could go ahead and copy and paste the species section again, making sure to change all the species names to versicolor. However, there's some level of risk associated with that -- you can wind up with errors in your reporting if you miss replacing a value. More importantly, though, is that copying and pasting by hand doesn't scale to reports which are put out often and need multiple sections added or removed. It can save a lot of time and energy to instead automate that task away.
That's the motivation behind heddlr
^[It's a play on the
loom component, since we're trying to
automate the Sweave/knitr process and someone already took the name loomr
. ]:
to reduce that repetitive work and, as a side effect, simplify your code. To do
so, heddlr
looks at reports as a collection of components, which we'll call
patterns. As it happens, our super cool report above is made up of two patterns --
first, the setup material:
--- title: "My cool report!" author: "Captain heddlr" output: html_document --- .```r library(dplyr) library(ggplot2) .``` # Let's talk about irises!
And secondly the species-specific section, which gets repeated for each species in our dataset -- I've swapped the specific name out for a placeholder value:
## Iris SPECIES_NAME This species of flower is great! It has a mean sepal length of `.r mean(iris[iris$Species == "SPECIES_NAME", "Sepal.Length"])`, and a mean sepal width of `.r mean(iris[iris$Species == "SPECIES_NAME", "Sepal.Width"])`. That looks like this on a graph! .```r iris %>% filter(Species == "SPECIES_NAME") %>% ggplot(aes(Sepal.Length, Sepal.Width)) + geom_point() .```
When using heddlr
, we'll usually go ahead and save each of those patterns in
their own files -- for our example report, we'll name those files
setup_pattern.Rmd
and species_pattern.Rmd
respectively^[In order for the
vignettes for this package to build correctly, I haven't actually saved these
files off separately -- if you look at the source code for this vignette,
you'll notice I'm not actually using the heddlr
functions, but rather using
some workarounds to get the same exact results. ].
We then have a few ways we can import them into our R session. Let's load
heddlr
and then walk through them:
library(heddlr)
The first and most straightforward method is to use heddlr::import_pattern()
,
which does more or less what you'd expect and imports a single pattern into a
single R object.
setup_pattern <- "---\ntitle: \"My cool report!\"\nauthor: \"Captain heddlr\"\noutput: html_document\n---\n\n```r\nlibrary(dplyr)\nlibrary(ggplot2)\n```\n\n# Let's talk about irises!\n\n" species_pattern <- "## Iris SPECIES_NAME\n\nThis species of flower is great! It has a mean sepal length of \n`r mean(iris[iris$Species == \"SPECIES_NAME\", \"Sepal.Length\"])`, and \nmean sepal width of `r mean(iris[iris$Species == \"SPECIES_NAME\", \"Sepal.Width\"])`. \nThat looks like this on a graph!\n\n```r\niris %>%\n filter(Species == \"SPECIES_NAME\") %>%\n ggplot(aes(Sepal.Length, Sepal.Width)) + \n geom_point()\n```\n"
# These can be any sort of plaintext file -- I tend to save them as .Rmd, # so that I can see code highlighting in R Studio with them, but any extension # should work fine setup_pattern <- import_pattern("setup_pattern.Rmd") species_pattern <- import_pattern("species_pattern.Rmd")
This gives us objects that contain strings like this:
setup_pattern
However, it can be helpful in reports with multiple patterns to store
everything in one object, just to have fewer things floating around your
top-level environment. heddlr
provides heddlr::import_draft()
for this
purpose, wrapping an lapply
call which will return a single list object
holding all of your patterns:
iris_draft <- import_draft( "setup_pattern" = "setup_pattern.Rmd", "species_pattern" = "species_pattern.Rmd" )
iris_draft <- list( "setup_pattern" = setup_pattern, "species_pattern" = species_pattern )
iris_draft
Now that we've got our patterns into R, it's time to start working with them.
To demonstrate how we do that with heddlr
, we first need to load a few
libraries:
library(dplyr) library(tidyr) library(purrr)
Our first function that we'll use to work with patterns is heddlr::heddle()
.
On the most basic level, this is the function that will replace the
placeholders in our patterns with our data. If we have a vector containing
the values we want to use for each pattern, we're able to use this function
as follows and get a vector in return:
# heddle takes three arguments: data, pattern, placeholder to replace heddle(unique(iris$Species), "This is a pattern - CODE ", "CODE")
In the context of our example document, this function call would look like this:
heddle(unique(iris$Species), iris_draft$species_pattern, "SPECIES_NAME")[[1]]
It isn't super important, but I think of the outputs from heddle()
as being
components of a larger template, and will be using similar terminology
from here on out.
If we wanted to instead use heddle()
as part of a magrittr pipeline, we can
create new dataframe columns using dplyr::mutate()
:
iris %>% distinct(Species) %>% # exact same pattern of arguments: data, pattern, placeholder to replace mutate(component = heddle(Species, iris_draft$species_pattern, "SPECIES_NAME"))
We can also use heddle()
as its own step in a pipeline. When we pass
heddle()
a dataframe like this, instead of a vector, we have to specify which
column we want to replace the placeholder with:
iris %>% distinct(Species) %>% # the data argument is provided by %>% # so we just provide the pattern and placeholder # (format "PLACEHOLDER" = Variable) heddle("This is a pattern - CODE ", "CODE" = Species)
An advantage of this method is we can now replace multiple placeholders with our data in a single function call:
iris %>% distinct(Species) %>% heddle("This is a pattern - CODE ", "CODE" = Species, "This" = Species)
One last way we can use heddle()
is to use purrr::map()
to apply it to
nested columns made via tidyr::nest()
. To do so, we just provide arguments
in the same way as above:
iris %>% nest(nested = Species) %>% mutate(component = map(nested, heddle, "This is a pattern - CODE ", "CODE" = Species )) %>% head(2)
This is also the supported way to change multiple placeholder values while
saving a component as a column in a dataframe via mutate -- nest the columns
you're using to replace placeholders with and then use purrr
to replace the
placeholders in one step.
You'll notice that the output of this method is another list column, not the
same string column we're used to seeing as an output. Those familiar with
purrr:map()
might know that we can get a character output from many
dataframes with purrr::map_chr()
-- however, this doesn't work with
dataframes where you'll get more than one output from the map()
call:
iris %>% nest(nested = Species) %>% mutate(component = map_chr(nested, heddle, "This is a pattern - CODE ", "CODE" = Species )) # > Error: Result 102 must be a single string, not a character vector of length 2
Instead, we can turn to our next function, heddlr::make_template()
. There
are a few different ways that we can use this function. For our current
situation, for instance, we can use purrr::map_chr()
to apply
make_template()
to our new component list column, transforming it into the
normal set of strings we're used to:
iris %>% nest(nested = Species) %>% mutate( component = map(nested, heddle, "This is a pattern - CODE ", "CODE" = Species), component = map_chr(component, make_template) ) %>% head(2)
In addition to this, there are two other contexts we can use make_template()
.
If we pass it a dataframe and a vector, it will collapse that vector into a
single string:
iris %>% nest(nested = Species) %>% mutate( component = map(nested, heddle, "This is a pattern - CODE ", "CODE" = Species ), component = map_chr(component, make_template) ) %>% head(2) %>% make_template(component)
If instead the first argument we pass it is a vector, it will combine all the vectors you provide it into a single string:
make_template("Part one, ", "part two")
In the context of our example report, these steps would look something like this:
species_template <- iris %>% distinct(Species) %>% mutate(component = heddle( Species, iris_draft$species_pattern, "SPECIES_NAME" )) %>% make_template(component) report_template <- make_template(iris_draft$setup_pattern, species_template)
I refer to these output strings as templates, which are the second to last
step in the heddlr
pipeline. However, in order to actually create our
reports, we need to export these templates to .Rmd files. In order to do that,
heddlr
provides a helpful function, aptly named heddlr::export_template()
.
Normally, export_template
takes two arguments -- the template to write out,
and the file to write it out to. Here, I'm going to tell it to print to
stdout()
instead -- that will just output here what our sample report
would look like:
suppressWarnings(export_template(report_template, stdout()))
And there we have it! Repeating the above steps will always generate sections for each flower included in your dataset, whether or not Joe down the hall remembered to tell you about it.
To recap, the essential steps to make our report can be summed up as follows:
import_pattern()
or
import_draft()
heddle()
make_template()
export_template()
I'll usually save these steps into a "generator" file, which I can then run
every time I need to regenerate my report. I'll also often include a step in
that file that calls rmarkdown::render()
on the generated report file, so
that I can just run a single script in order to remake and rebuild my entire
report. Obviously, this can be something of an overkill for reports as simple
as our example here, but it can be a huge time saver for more complex reports
that have more repeating sections, built more often, or are based on more
dynamic data sources that require maintenance to add or remove components
between builds. It also can help make your code
DRYer^[Don't repeat yourself]
so you can make edits in a single place and see them replicated across your
entire report.
If you want to see this process applied to a somewhat more complicated example, check out how we can make this dashboard via this article.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.