\proglang{R} Markdown [@baumer2014; @rmarkdownbook] facilitates the construction of computationally reproducible documents by allowing authors to insert \proglang{R} code for data processing, exploration, analysis, table-making, and visualization directly into structured, electronic documents. The resulting documents are made up of these chunks of \proglang{R} code, which we will refer to as computational components since they are generated by computational means, as well as narrative components, which (in scientific writing) are prose intended to contextualize computational components, provide background, define goals, establish themes, and convey results. These documents are then rendered with the \pkg{knitr} package [@knitrbook] into output documents that users read as .html, .pdf, .doc, or other formats.
The integration of narrative and computational components was originally identified as "Literate Programming" by @knuth1984 and software tools, like Sweave [@leisch2002], have supported this functionality for almost two decades. However, more recently, \proglang{R} Markdown has become particularly popular with its success likely being driven by two factors. The first is the relative ease with which these documents can be constructed. While \LaTeX \ is more expressive, it is relatively technical and requires an investment in time to become proficient. By contrast \proglang{R} Markdown documents are easier to create and format and, when the document is used to create \LaTeX, formatting can be passed through to the underlying .tex file. The second factor driving adoption is likely its support for creating modifiable documents, namely Microsoft Word documents. Researchers and analysts, especially those creating applied statistical analyses, often collaborate with domain experts with less technical knowledge. In these cases, the analyst focuses on creating the computational components and narrative components related to results and interpretation. After this initial document is created, the domain expert is free to develop narrative components directly in the document without needing to go through the analyst.
Since computational components are, by definition, computationally derived objects and \proglang{R} Markdown is a well-defined standard, it is possible to programmatically create \proglang{R} Markdown documents with computational components, which is the focus of this paper. Generating documents in this manner has two appealing characteristics. First, it allows us to distinguish the presentation of analytical results from other steps in a data science or data processing pipeline. Other steps including cleaning and analysis often require their own environment and configuration with requirements very different than the computational needs of creating a presentation. By separating these components each can be developed independently. At the same time, by specifying a contract for the output of those objects, we can establish a consistent means by which processed data can be passed to systems for presenting those data in a structured way. The second reason for programmatic creation of R Markdown documents is convenience. In collaborative environments, especially in the early stages, large numbers of graphs and tables are generated and discussed. By collecting these artifacts and structuring them consistently, we can quickly iterate upon and restructure the resulting documents to more clearly present the data without needing to spend time on the creation of the presentation document.
The \pkg{listdown} package provides functions to programmatically create \proglang{R} Markdown files from named lists. It is intended for data analysis pipelines where the presentation of the results is separated from their creation. For this use case, a data processing step (or analysis) is performed and the results are provided in a single named list, organized hierarchically. List element names denote sections, subsections, subsubsections, etc., and the list elements contain the data structures to be presented, including graphs and tables. The package has native support for \pkg{workflowr} [@blischak2019], pdf, Word, and html documents, along with functions allowing a user to easily extend it to other supported document types. The goal of the package is to create a document with all tables and visualizations that will appear (computational components). This serves as a starting point from which a user can organize outputs, describe a study, discuss results, and provide conclusions (narrative components).
\pkg{listdown} therefore provides a reproducible means for producing a document with specified computational components. It is most compatible with data analysis pipelines where the data format is fixed but the analyses are either being updated, which may affect narrative components including the result discussion and conclusion, or where the experiment is different, which affects all narrative components, but the data format and processing is consistent. An example of the former is provided later in this paper.
One area where we have found \pkg{listdown} to be particularly useful is in the reporting and research of clinical trial data. These collaborations tend to be between (bio)statisticians and clinicians, either analyzing past trial data to formulate a new trial or in trial monitoring, where trial telemetry (enrollment, responses, etc.) is reported and initial analyses are conveyed to a clinician. The associated presentations require very little context, since clinicians often understand the collected data as well as the statistician does, and this often eliminates or significantly reduces the need for narrative components. At the same time, a large number of hierarchical, heterogeneous artifacts (tables and multiple types of plots) need to be conveyed, which makes the manual creation of \proglang{R} Markdown documents inconvenient.
In this case, data presentation can be fixed across trials. This is especially true in the initial stages, which focus on patient demographics and enrollment. This approach has made it convenient for our group to quickly generate standardized and complete reports for multiple trials concurrently. To date, we have used \pkg{listdown} to report on five clinical trials, with another two currently in process. Results are disseminated using the \pkg{workflowr} package, usually with nine tabs conveying aspects of the data from collection through several different analyses, and each tab containing approximately five to thirty tables, plots, or other artifacts including \pkg{trelliscopejs} [@trelliscopejs] environments, which may hold hundreds of graphs. By generating many presentation artifacts we are able to address data-driven questions and issues during collaborative sessions, and by carefully structuring these elements we allow all members to participate in the process.
The \pkg{listdown} package itself is relatively simple with 10 distinct methods that can be easily incorporated into existing analysis pipelines for automatically creating documents that can be used for data exploration and reviewing analysis results as well as a starting point for a more formal write up. These methods include:
\begin{itemize}
\item{\bf as_ld_yml() }{- turn a computational component list into YAML with class information.}
\item{\bf ld_cc_dendro() }{- create a dendrogram from a list of computational components.}
\item{\bf ld_chunk_opts() }{- apply chunk options to a presentation object.}
\item{\bf ld_ioslides_header() }{- create an ioslides presentation header.}
\item{\bf ld_make_chunks() }{- write a listdown object to a string.}
\item{\bf ld_rmarkdown_header() }{- create an R Markdown header.}
\item{\bf ld_workflowr_header() }{- create a workflowr header.}
\item{\bf ld_write_file() }{- write to an R Markdown file.}
\item{\bf listdown() }{- create a listdown object to create an \proglang{R} Markdown document.}
\item{\bf print.listdown()}{- print the listdown options for \proglang{R} Markdown document creation.}
\end{itemize}
The rest of this paper is structured as follows. The next section goes over basic usage and commentary. It is meant to convey the approach used by the package and shows how to describe an output document using \pkg{listdown}, create a document, and specialize the presentation of computational components using \pkg{listdown} decorators. With the user accustomed to the package's basic usage, Section 3 describes the design of the package. Section 4 goes over advanced usage of the package, including adding initialization code to an output document as well as how to control chunk-level options. Section 5 provides a simplified case study of how the package is currently being used in clinical trial reporting. Section 6 concludes the paper with a few final remarks on the general types of applications where \pkg{listdown} has been shown to be effective.
Suppose we have just completed an analysis and have collected all of the results into a list where the list elements are roughly in the order we would like to present them in a document. It may be noted that this is not always how computational components derived from data analyses are collated. Often individual components are stored in multiple locations on a single machine or across machines. However, it is important to realize that even for analyses on large-scale data, the digital artifacts to be presented are relatively small. Centralizing them makes it easier to access them, since they don't need to be found in multiple locations. Also, storing them as a list provides a hierarchical structure that translates directly to a document, as we will see below.
As a starting point, we will consider a list of visualizations from the Anscombe data set below. The list is composed of four \pkg{ggplot2} [@wickham2016] elements (named Linear, Non Linear, Outlier Vertical, and Outlier Horizontal), each containing a scatter plot from the Anscombe Quartet, which is made available in the \pkg{datasets} package [@R]. From the \code{computational_components} list, we would like to create a document with four sections with names corresponding to the list names, each containing their respective visualizations. The structure of a document derived from the \code{computational_components} list can be visualized using the \code{ld_cc_dendro()} function, and its output is below.
library("ggplot2")
library("listdown")

data(anscombe)

computational_components <- list(
  Linear = ggplot(anscombe, aes(x = x1, y = y1)) + geom_point(),
  `Non Linear` = ggplot(anscombe, aes(x = x2, y = y2)) + geom_point(),
  `Outlier Vertical` = ggplot(anscombe, aes(x = x3, y = y3)) + geom_point(),
  `Outlier Horizontal` = ggplot(anscombe, aes(x = x4, y = y4)) + geom_point())

ld_cc_dendro(computational_components)
Creating a document whose structure and content are described by \code{computational_components} requires two steps. First, we will create a \code{listdown} object specifying how the \newline \code{computational_components} object will be loaded into the document, which libraries and code need to be included, and how the list elements will be presented in the output R Markdown document. A human-readable \code{print} function is included in the package and is the default output of the object. It should be noted that the output shows options that will be described and illustrated later.
saveRDS(computational_components, "comp-comp.rds")

ld <- listdown(load_cc_expr = readRDS("comp-comp.rds"),
               package = "ggplot2")

ld
The \code{ld} object, along with the computational components in the \code{comp-comp.rds} file, is sufficient to create the sections, subsections, and \proglang{R} chunks of a document. The only other thing required to create the document is the header. The \pkg{listdown} package currently supports regular R Markdown and \pkg{workflowr} headers as \code{yml} objects from the \pkg{yaml} package [@yaml]. These objects are stored as named lists in \proglang{R} and are easily modified to accommodate document parameters. A complete document can then be written to a file for rendering using the \code{ld_write_file()} function, as in the code shown below.
ld_write_file(
  ld_rmarkdown_header("Anscombe's Quartet",
                      author = "Francis Anscombe",
                      date = "1973"),
  ld,
  "anscome-example.rmd")
The \code{listdown()} function provides document-wide R chunk options for displaying computational components. The chunk options are exactly the same as those in the R Markdown document and can be used to tailor the default presentation for a variety of needs. The complete set of options can be found in the R Markdown Reference Guide [@rmarkdownref]. As a concrete example, the code used to create the plots could be hidden in the output document using the following code.
ld <- listdown(load_cc_expr = readRDS("comp-comp.rds"),
               package = "ggplot2",
               echo = FALSE)

ld_make_chunks(ld)[1:7]
The first example is simple in part because the \code{ggplot} objects both contain the data we want to display and, at the same time, provide the mechanism for presenting them, namely rendering them in a visualization. However, this is not always the case. The objects being stored in the list of computational components may not translate directly to the presentation in a document. In these cases, a function is needed that takes the list component and returns an object to be displayed. For example, suppose that, along with showing graphs from the Anscombe Quartet, we would like to include the data themselves. We could add the data to the \code{computational_components} list and then create the document with:
computational_components$Data <- anscombe
saveRDS(computational_components, "comp-comp.rds")
ld_make_chunks(ld)[32:36]
In this case, the \pkg{listdown} package will show the entire data set, which is the default behavior. However, suppose we do not want to show the entire data set in the document. This is common, especially when the data set is large and requires too much vertical space in the output document, resulting in too much or irrelevant data being shown. Instead, we would like to output to an html document where the data is shown in a \code{datatable}, thereby controlling the amount of space needed to present the data and, at the same time, providing the user with interactivity to sort and search the data set.
In \pkg{listdown}, a function or method that implements the presentation of a computational component is referred to as a decorator since it follows the classic decorator pattern described in @gamma1995. A decorator takes the element that will be presented as an argument and returns an object for presentation in the output document. A decorator is specified using the \code{decorator} parameter of the \code{listdown()} function using a named list where the name corresponds to the type and the element corresponds to the function or method that will decorate an object of that type. For example, the \code{anscombe} data set can be decorated with the \code{DT::datatable()} function [@xie2020] as:
ld <- listdown(load_cc_expr = readRDS("comp-comp.rds"),
               package = c("ggplot2", "DT"),
               decorator = list(data.frame = datatable))

ld_make_chunks(ld)[33:37]
List names in the \code{decorator} argument provide a key to which a function or method is mapped. The underlying decorator resolution is implemented for a given computational component by going through decorator names sequentially to see if the component inherits from the name using the \code{inherits()} function. The function or method mapped to the first name from which the element inherits is selected. This means that when customizing the presentation of objects that inherit from a common class, the more abstract classes should appear toward the end of the list. This will ensure that specialized classes will be encountered first in the resolution process. It should be noted that an object's type is first checked against the decorator name list and then checked to see if it is a list. This allows a user to both decorate a list and retain \code{"list"} in its class attributes.
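The resolution order described above can be illustrated with a small base \proglang{R} sketch. This is not the package's implementation; \code{resolve_decorator()} and the \code{special_table} class are hypothetical names introduced only for illustration, and the sketch assumes that resolution amounts to a sequential scan of the decorator names with \code{inherits()}.

```r
# Hypothetical helper (not part of listdown): return the decorator mapped to
# the first name in `decorators` that the object inherits from, falling back
# to `default` when no name matches.
resolve_decorator <- function(x, decorators, default = identity) {
  for (class_name in names(decorators)) {
    if (inherits(x, class_name)) {
      return(decorators[[class_name]])
    }
  }
  default
}

# A data frame that also carries a more specialized (hypothetical) class.
obj <- data.frame(value = 1)
class(obj) <- c("special_table", class(obj))

decorators <- list(
  special_table = function(x) "special",  # specialized class listed first
  data.frame    = function(x) "generic"   # more abstract class listed last
)

# The specialized decorator is found first.
specialized <- resolve_decorator(obj, decorators)(obj)

# Reversing the order lets the abstract class shadow the specialized one.
shadowed <- resolve_decorator(obj, rev(decorators))(obj)

# An object matching no name falls through to the default decorator.
fallback <- resolve_decorator(1:3, decorators)(1:3)
```

Under these assumptions, \code{specialized} is \code{"special"}, \code{shadowed} is \code{"generic"}, and \code{fallback} is the unchanged integer vector, which is why abstract classes belong at the end of the decorator list.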
A separate argument, \code{default_decorator}, allows the user to specify the default decorator for an object whose type does not appear in the \code{decorator} list. This allows the user to specify any class name for the decorator and avoids a potential type name collision with a default decorator whose name is determined by convention. By default, this argument is set to \code{identity}, but it can be used to not display a computational component by default if the argument is set to \code{NULL}.
It should be noted that it is not possible to decorate a list and an attempt to do so results in an error. This is because, when generating a document, the \code{ld_make_chunks()} function recursively descends into the list of computational components. The decision to descend is made based on the type of the element being visited. If it is a list, then it descends; otherwise it presents the object. However, a list can't designate both an object to hold elements for presentation and an object to be presented. To present arbitrary list elements, including lists, a user may add class information to a list element and a corresponding decorator.
A \pkg{listdown} object specifies the location of a list of computational components and options for presenting those components in an R Markdown document. The list is a hierarchical data structure that also provides the structure of the output document. As an example, consider the \code{comp_comp2} list defined below. The corresponding document has two sections, "Iris" and "Sepal.Length". The latter has three subsections, "Sepal.Width", "Petal.Length", and "Colored". The "Colored" subsection has two subsubsections, "Sepal.Width" and "Petal.Length". The structure can once again be seen using the \code{ld_cc_dendro()} function.
comp_comp2 <- list(
  Iris = iris,
  Sepal.Length = list(
    Sepal.Width = ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
      geom_point(),
    # y is Petal.Length here so the plot matches its element name
    Petal.Length = ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
      geom_point(),
    Colored = list(
      Sepal.Width = ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                                     color = Species)) +
        geom_point(),
      Petal.Length = ggplot(iris, aes(x = Sepal.Length, y = Petal.Length,
                                      color = Species)) +
        geom_point())))

ld_cc_dendro(comp_comp2)
Both the \code{ld_cc_dendro()} and \code{ld_make_chunks()} functions work by recursively descending the computational components list depth-first. If a visited list element has a name, the name is written to the return string as a section, subsection, subsubsection, etc., according to its depth. If the visited list element is itself a list, then the same procedure is called on the child list through a recursive call. If the element is not a list, then it is written inside an R Markdown chunk in the return string using the appropriate decorator.
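The depth-first traversal can be sketched in a few lines of base \proglang{R}. The function \code{sketch_chunks()} below is a hypothetical illustration rather than the code used by \code{ld_make_chunks()}: it ignores decorators, chunk options, and headers, and it assumes that an element is descended into exactly when \code{inherits(x, "list")} is true, so that data frames and classed objects are treated as leaves.

```r
# Hypothetical sketch (not listdown's implementation) of turning a named,
# nested list into R Markdown section headings and chunks.
sketch_chunks <- function(cc, depth = 1, path = "cc") {
  fence <- strrep("`", 3)  # three backticks, built to keep this example tidy
  out <- character()
  for (i in seq_along(cc)) {
    nm <- names(cc)[i]
    # A named element becomes a heading whose level matches its depth.
    if (!is.null(nm) && nzchar(nm)) {
      out <- c(out, paste(strrep("#", depth), nm), "")
    }
    elem_path <- sprintf("%s[[%d]]", path, i)
    if (inherits(cc[[i]], "list")) {
      # Plain lists hold further structure: recurse one level deeper.
      out <- c(out, sketch_chunks(cc[[i]], depth + 1, elem_path))
    } else {
      # Anything else is a leaf: emit a chunk that displays the element.
      out <- c(out, paste0(fence, "{r}"), elem_path, fence, "")
    }
  }
  out
}

# Character strings stand in for tables and plots.
cc <- list(
  Iris = "a table",
  Sepal.Length = list(
    Sepal.Width = "a plot",
    Colored = list(Petal.Length = "another plot")))

rmd_lines <- sketch_chunks(cc)
cat(rmd_lines, sep = "\n")
```

In this sketch the top-level names become first-level headings, nested names gain one \code{#} per level, and each leaf is referenced by its position in the list (for example \code{cc[[2]][[2]][[1]]}).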
The \code{listdown()} function facilitates the insertion of setup and initialization code through the \code{setup_expr} and \code{init_expr} arguments. If an expression is provided to the \code{setup_expr} argument, then the first code chunk of the document will have the specified code inserted. This code chunk is named "setup" and the include parameter is set to \code{FALSE}. When the \code{init_expr} argument is specified, code is inserted immediately after the libraries are loaded in the R Markdown document. In general, it is suggested that the number of initial expressions be kept small so that the \proglang{R} Markdown document is easy to read. If a large number of functions are required by the target \proglang{R} Markdown document, then they can be put into a file and sourced using the initial expression. As an example, suppose we are creating an html document and presenting data using the \code{datatable()} function. However, we do not want to include the search capabilities provided. This can be easily accomplished by creating a new function, \code{datatable_no_search()}, using the \code{partial()} function from the \pkg{purrr} package [@purrr] to partially apply \code{list(dom = 't')} to the \code{options} argument of \code{datatable}.
saveRDS(comp_comp2, "comp-comp2.rds")

ld <- listdown(load_cc_expr = readRDS("comp-comp2.rds"),
               package = c("ggplot2", "DT", "purrr"),
               decorator = list(ggplot = identity,
                                data.frame = datatable_no_search),
               setup_expr = knitr::opts_chunk$set(echo = FALSE),
               init_expr = {
                 datatable_no_search <- partial(datatable,
                                                options = list(dom = 't'))
               })

ld_make_chunks(ld)[2:14]
The \pkg{listdown} package also supports further customization of the presentation by specifying \proglang{R} code chunk options in the \proglang{R} Markdown document in two distinct ways. The first is used when the options we would like to specify are tied to the type of the object being presented. This can be thought of as a chunk-option decorator. The second is used for changing the options for an individual chunk in an ad hoc manner.
With three different modes of chunk customization, it should be noted that the increasing priority of the chunk option specification is document-wide, decorator-wide, and ad hoc. That is, decorator-wide chunk options take priority over document-wide chunk options and ad hoc options take priority over decorator-wide options. In addition, it should be noted that the use of the lowest priority scheme that accomplishes the presentation goals is preferred because it lends itself to greater code and maintenance efficiency.
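The priority ordering can be made concrete with a short base \proglang{R} sketch. The helper \code{resolve_chunk_opts()} is a hypothetical name, and the sketch assumes that the effective options for a chunk behave like successive overrides of named lists; it is not a description of how \pkg{listdown} merges options internally.

```r
# Hypothetical helper (not part of listdown): later arguments override
# earlier ones, mirroring document-wide < decorator-wide < ad hoc priority.
resolve_chunk_opts <- function(document, decorator = list(), ad_hoc = list()) {
  utils::modifyList(utils::modifyList(document, decorator), ad_hoc)
}

document_opts  <- list(echo = FALSE, fig.width = 7)        # document-wide
decorator_opts <- list(fig.width = 100, fig.height = 200)  # tied to a class
ad_hoc_opts    <- list(echo = TRUE)                        # one element only

opts <- resolve_chunk_opts(document_opts, decorator_opts, ad_hoc_opts)
str(opts)
```

Here the ad hoc \code{echo = TRUE} overrides the document-wide \code{echo = FALSE}, the decorator-wide \code{fig.width = 100} overrides the document-wide value of 7, and \code{fig.height = 200} is added because no lower-priority level set it.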
The document-wide chunk option specification provides the default chunk options for output documents generated using \pkg{listdown}. However, the presentation of a data object often varies by type. For example, we may want to specify the height and width of a graph, but not a table. This is accomplished in the \pkg{listdown} package when a \code{listdown} object is created, using the \code{decorator_chunk_opts} option in the \code{listdown()} function. For example, associating all \code{ggplot} objects with \proglang{R} chunks having a width of 100 and a height of 200 can be accomplished with the following code, and it can be seen that only chunk options associated with a plot are modified.
ld <- listdown(load_cc_expr = readRDS("comp-comp2.rds"),
               package = c("ggplot2", "DT", "purrr"),
               decorator_chunk_opts =
                 list(ggplot = list(fig.width = 100,
                                    fig.height = 200)),
               init_expr = {
                 datatable_no_search <- partial(datatable,
                                                options = list(dom = 't'))
               },
               echo = FALSE)

ld_make_chunks(ld)[c(12:16, 19:24)]
Along with providing decorator-wide chunk options, it is also possible to control individual chunk options. The capability is distinct from the document-wide and decorator-wide specification of options in that it must be applied to the computational component list element whose associated options will be modified, rather than the \code{listdown} object. This is because the \code{listdown} object only specifies how classes of objects should be presented. To modify the chunk options associated with a specific list element, the element is given a set of attributes with the \code{ld_chunk_opts()} function, and these attributes are queried as the output document is being generated. Because of the ad hoc nature of this capability, its use is discouraged. A better solution that maintains the behavior is to add class information to the list element and specify decorator-wide chunk options for the new class. This maintains the separation of the computational component list, which holds the document structure and data for presentation, from the specification of how the document will be created and rendered.
comp_comp2$Iris <- ld_chunk_opts(comp_comp2$Iris, echo = TRUE)
saveRDS(comp_comp2, "comp-comp2.rds")
ld_make_chunks(ld)[12:16]
Since a base plot, unlike a \code{ggplot} object, does not encapsulate the state and ability to present the plot, it is not possible to assign a base graphic to a list element and present it. Instead, it is recommended that the corresponding data be held as a list element, with base graphic options, and that the element be given a class and a corresponding decorator. Using this approach, the decorator generates the visualization through a call to base graphics when the output document is rendered.
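A minimal sketch of this recommendation follows. The class name \code{base_scatter}, its constructor, and the decorator \code{plot_base_scatter()} are hypothetical names chosen for illustration. The object stores only data and plotting arguments, and the drawing is deferred to the decorator.

```r
# Hypothetical constructor: store the data and base graphics arguments in a
# classed object. Setting the class attribute means the object no longer
# inherits from "list", so it is treated as something to present.
base_scatter <- function(x, y, ...) {
  structure(list(x = x, y = y, args = list(...)), class = "base_scatter")
}

# Hypothetical decorator: draw the stored data with base graphics when called.
plot_base_scatter <- function(obj) {
  do.call(plot, c(list(x = obj$x, y = obj$y), obj$args))
  invisible(obj)
}

# The list element holds data and options, not a rendered plot.
cc_elem <- base_scatter(datasets::anscombe$x1, datasets::anscombe$y1,
                        xlab = "x1", ylab = "y1", main = "Linear")

# Calling the decorator is what produces the graphic.
plot_base_scatter(cc_elem)
```

Assuming the decorator mechanism described earlier, such a function would be made available to the output document (for example through \code{init_expr}) and registered as \code{decorator = list(base_scatter = plot_base_scatter)}.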
\label{sect:simple-example}
As mentioned before, we have found the \pkg{listdown} package particularly helpful for reporting the results of clinical trials thereby creating a basis for discussion and collaboration between (bio)statisticians and the clinicians running the trials. For this use case, the context and data collection procedures are well-understood and as a result very few narrative components are needed. It is also the case that the modes of presentation (tables, scatterplots, survival plots, consort diagrams, etc.) are standardized. The goal in presenting the trial characteristics is to identify problems in the data, monitor trial enrollment and response, quantify known relationships among the data, and test hypotheses about a therapy's efficacy.
In practice we generally separate the data cleaning, exploration, analysis, monitoring, and presentation components. The computational component list has tens of elements with thousands of visualizations. These facts, coupled with privacy constraints, make a complete example impractical for the purposes of this paper. So, below we provide a simple example of the types of documents we produce using the \code{gtsummary::trial} data set. While it is not complete, it does convey the types of reports we are currently generating for completed and ongoing clinical trials.
The code creates a computational component list containing a table of the patient characteristics along with survival plots by overall survival, survival by stage, and survival by grade. The table is an element of a named list called "Table 1" and then survival plots are elements of named lists indicating the conditioning variable. The structure can be seen in the dendrogram below.
library("gtsummary")
library("dplyr")
library("survival")
library("survminer")
library("rmarkdown")

make_surv_cc <- function(trial, treat, surv_cond_chars) {
  table_1 <- trial %>%
    tbl_summary(by = all_of(treat)) %>%
    gtsummary::as_flex_table()

  scs <- lapply(c("1", surv_cond_chars),
    function(sc) {
      sprintf("Surv(ttdeath, death) ~ %s + %s", treat, sc) %>%
        as.formula() %>%
        surv_fit(trial) %>%
        ggsurvplot()
    })
  names(scs) <- c("Overall", tools::toTitleCase(surv_cond_chars))
  list(`Table 1` = table_1, `Survival Plots` = scs)
}

surv_cc <- make_surv_cc(trial, treat = "trt",
                        surv_cond_chars = c("stage", "grade"))

ld_cc_dendro(surv_cc)
As shown before, the report is created by saving the computational components, creating a \code{listdown} object, writing the \proglang{R} Markdown document, and rendering it. The resulting document, trial-report.html, can then be placed in a shared space where it can be viewed and interpreted by stakeholders in the clinical trial. The \proglang{R} Markdown document created by this code is shown in supplementary materials.
class(surv_cc$`Survival Plots`$Overall) <-
  class(surv_cc$`Survival Plots`$Stage) <-
  class(surv_cc$`Survival Plots`$Grade) <- "list"

names(surv_cc$`Survival Plots`) <-
  paste(names(surv_cc$`Survival Plots`), "{.tabset}")

names(surv_cc$`Survival Plots`$`Overall {.tabset}`) <-
  names(surv_cc$`Survival Plots`$`Stage {.tabset}`) <-
  names(surv_cc$`Survival Plots`$`Grade {.tabset}`) <-
  c("Plot", "Data", "Table")

saveRDS(surv_cc, "surv-cc.rds")

ld_surv <- listdown(load_cc_expr = readRDS("surv-cc.rds"),
                    package = c("gtsummary", "flextable", "DT", "ggplot2"),
                    decorator_chunk_opts =
                      list(gg = list(fig.width = 8, fig.height = 6)),
                    decorator = list(data.frame = datatable),
                    echo = FALSE,
                    message = FALSE,
                    warning = FALSE,
                    fig.width = 7,
                    fig.height = 4.5)

writeLines(
  paste(c(
    as.character(ld_rmarkdown_header("Simple Trial Report")),
    ld_make_chunks(ld_surv))),
  "trial-report.rmd")

render("trial-report.rmd", quiet = TRUE)
browseURL("trial-report.html")
While the programmatic generation of reproducible documents has appealing qualities, it also has fundamental limitations that should be kept in mind when deciding if tools like \pkg{listdown} should be employed. First and foremost, without narrative components a document has very little context. Quantitative analyses require research questions, hypotheses, reviews, interpretations, and conclusions. Computational components are necessary but generally not sufficient for constructing an analysis. This means that if narrative components must be conveyed in a document, then \pkg{listdown} may make the generation of their presentation more convenient. Narrative components can even be created in \pkg{listdown} by including \code{character} elements with a chunk option decorator setting \code{results = "asis"}. However, it does not relieve the author of the burden of creating prose that develops a narrative.
Second, it is difficult if not impossible to construct computational components for arbitrary analyses. Analyses themselves have context and are built with a set of assumptions and goals. Our experience shows that \pkg{listdown} is easiest to use for a fixed data format. This means a standard set of tables and visualizations for similarly formatted data. In particular, data that is periodically updated, without changes to the format, is a case that is particularly amenable to document generation reuse.
Keeping these limitations in mind, \pkg{listdown} can be used to effectively reduce the difficulty of generating documents in a variety of contexts and fits readily in data processing and analysis pipelines.
\bibliography{../references}