Scope

This vignette demonstrates how you can start a report generation in your R code by creating the anrep class, and then use the methods of this class to add both structural and data elements to your report.

Using the API is very similar to using a typical logging library to emit some information from different places in your code. Only in this case, you are emitting rich content such as plots and tables, as well as organizing output into a hierarchical structure.

The report hierarchy is expressed through:

  1. generation of multi-level section numbering
  2. creation of multiple linked documents (called "subreports" in the anrep documentation).

Overall, the feature set in the anrepr package allows easily equipping any arbitrarily complex analysis pipeline with a graphical reporting facility that emits report elements within functions, loops or conditional branches.

The typical final output is an auto-generated static Web site that contains linked HTML pages, embedded and stand-alone versions of the plots as well as saved data files. The Web pages will contain dynamic Java Script elements if your code decides to report the types of R objects which implement the htmlwidgets interface. The generated Web site directory is portable - it can be copied to other locations or computers and viewed directly from disk, or it can be deployed to a Web server.

The reasons for creating this package

Popular R frameworks like Sweave and RMarkdown / Knitr packages greatly facilitate reproducible research and graphical report generation by embedding analysis code into the documents written in a structured markup languages such as LaTex, Markdown or HTML. When the document is "rendered" by Knitr, the code chunks written in R are replaced by the results of their evaluation. A large ecosystem of tools have been developed in the R community for automatically formatting various analysis objects into the markup representation.

One constraint imposed by such "document-driven" reporting approach is that the flow of the analysis is controlled by the linear structure of the template document. Generation of the reporting output from the embedded code will normally proceed from the top to the bottom of the document without branching or looping.

Recognizing a frequent need for a more flexible control over the flow of the report generation, knitr introduces both conditional evaluation tags and references between the chunks of code in the document. However, applying this advanced functionality imposes on the users an additional layer of programming that has to be done in a limited domain specific language (DSL) implicitly defined by the available knitr tags. The users have to learn this DSL and manage to express with it their reporting workflows. This has to happen despite the fact that they are already writing their code chunks in the powerful general purpose programming language such as R.

The situation is similar with the very popular Python-centric literate programming framework Jupyter Notebooks.

In our own everyday data analysis work in a bioinformatics research environment, we encounter a frequent pattern of activity where the report generation has to be embedded into a fairly complicated analysis workflow developed in a general purpose programming language like R or Python. As the analysis proceeds through multiple stages of data loading, data cleaning and the iterative application of the same or different algorithms to the subsets of data, we often need to:

We have developed the R package anrepr that meets all of the above requirements by adopting the "analysis-driven" reporting approach. In this approach, the analysis code is instrumented by including calls to our R package functions wherever the output has to be reported during the analysis execution. The reporting becomes an integral and unobtrusive part of the analysis workflow.

Our package was in part inspired by the existing R packages Nozzle and pander, and it relies heavily on the Markdown generation methods provided by pander.

For example, the Nozzle package has implemented the "analysis-driven" reporting concept by providing an R API that directly generates rich HTML reports. Nozzle has implemented many advanced and specialized features such as dedicated presentation of the results that were marked as "signficant" by the analysis pipeline and on-demand filtering-out of the sections marked as "private data". The polished and thought-out JavaScript user interface (UI) of the Nozzle reports comes at a price of having many decisions hard-wrired at the HTML presentation layer, such as relying on a combination of JQuery and Nozzle's own JavaScript libraries to control the UI elements. The Nozzle also creates a fixed top level HTML DOM structure that might fit perfectly the requirements of the sample processing pipeline at the Broad Institute where this package was created, but not necessarily fit well all external applications.

The Nozzle API provides a method for inserting report elements containing arbitrary HTML chunks for those users which have sufficient skills for manipulating the HTML code.

In our work, we have adopted the approach taken by the pander package that provided a class Pandoc for generating reports in Markdown format. Our anrep class can be viewed as a complete reimplementation of the pander::Pandoc class. The primary difference is that we provide automation for building a hierarchical structure for the reports, generating auto-numbered sections, captions and anchor links for all inserted objects. The consistency of the numbering is maintained even in the presence of error conditions during execution of the analysis code. In contrast, the pander::Pandoc does not provide any facilities for structuring the report beyond a generic ability for inserting header tags.

The pander::Pandoc saves the entire report into a single file, which makes the final HTML reports impossible to load into the Web browser when the number of plots or table rows inserted by the analysis pipeline becomes too large. The same single-file scalability problem will be faced by the Nozzle report, where the documentation advises the users to trim down the size of the tables that they are inserting from their analysis pipeline.

Our class API provides protection against such viewer scalability issues in two ways:

The most fundamental distinction of our work from Nozzle is that our anrep class (as well as the pander::Pandoc class) emits Markdown text instead of HTML. Using Markdown aims to create more separation between content generation and the final presentation format that is typically expected with the HTML output.

As a convenience, we provide infix operators for generating subsection and subreport hierarchy by wrapping blocks of the existing analysis code with curly braces. We found that feature a big help in the rapid development of multiple project-specific research analysis pipelines with actively changing structure.

Package-level functions from the pander package are used to automatically convert a multitude of R data types into Markdown. That includes extraction of vector and raster images from plot objects produced by the major R graphing libraries. The save() method of our report class calls the well-known external Pandoc converter utility to turn the Markdown output into any of multiple final presentation formats, such as HTML pages and slide shows, or Word and PDF documents. Custom style sheet files and JavaScript libraries can be supplied by the users to completely change the appearance and behaviour of the rendered HTML pages.

Our report object detects if it is being ran from a Knitr document, and returns the entire assembled report to properly show up under Knitr, including all inserted dynamic htmlwidget objects. Users can also insert their own arbitrary Markdown or HTML text into the report. Markdown is very easy to write considering the simplicity of its syntax relative to, for example, HTML.

We have implemented dedicated methods in our anrep class for adding to the report the most frequently used R datatypes such as data frames, vectors and plots with a single function call. That call generates Markdown content with a numbered caption and anchor, saves the full size element version on disk and links it from the caption, and seemlessly takes care of various edge cases that have to be taken into an account when directly using the conversion methods from the pander package. One of our class methods inserts R objects from a growing collection of packages which implement the htmlwidgets interface, thus allowing generating a rich set of dynamic JavaScript elements in the final HTML reports.

The minimal Hello World example

This code adds to the report two pairs of tables and plots using a for loop. You can see in the report output rendered by Knitr that tables and plots were decorated with auto-generated captions. Consider, for example, this caption:

"

(1.1) Table 2. Hello Table for rows 29, 30, 20, 19, 2. Full dataset is also saved in a delimited text file (click to download and open e.g. in Excel) data/Table.2-1.1-1527b2eb7a167.csv

"

The caption starts with the current hierarchical numbering of the section where the corresponding object is contained at (1.1). This is followed by the auto-generated object index Table 2 that runs through the entire report separately for each object type such as table or plot types. In the HTML output, the object index is an anchor for sharing links to that place in the report. After that, the user-provided descriptive text is inserted, followed by the link to an auto-generated data or image file that can be used in downstream processing.


Here is the example code. The rendered output follows the code.


For vignette output, we will encode the linked data files and high resultion images into the HTML output file.

self.contained.data = TRUE
library(anrepr)
report = anrep("Hello World",self.contained.data = self.contained.data)
set.seed(1)
for(i in 1:2) {
  rows = sample(nrow(mtcars),5)

  report$add.table(mtcars[rows,1:6],
                   caption=sprintf("Hello Table for rows %s",
                                   paste0(rows,collapse = ", ")),
                   show.row.names = T)

  report$add(with(mtcars[rows,],plot(mpg,hp)),
             caption=sprintf("Hello Plot for rows %s",
                             paste0(rows,collapse = ", ")),
             graph.unify=T,
             hi.res=T)
}

# Printing the return value from the save() method will show
# rendered Markdown if running under Knitr, else write HTML
# and data files in the current directory.
report$save()

End of the rendered output from the example code.


The raw generated Markdown from the Hello World example looks like this:

self.contained.data = FALSE

cat(report$save())
self.contained.data = TRUE

About running under Knitr

The report object detects that it is being executed from Knitr, and returns a string with the final Markdown from its save() method, automatically labeled to be passed for rendering asis (matching the effect of the result='asis' setting in the Knitr chunk options). This is why we had to use cat when we actually wanted to see the raw Markdown text.

The knitr output mode also turns off the generation of subreports (linked multiple documents) and collects everything into a single Markdown string. The links to saved data files would not work if devtools::build_vignettes() call was used to generate a vignette for the R package, due to the devtools expectations that the output must be a single file. Therefore, we pass self.contained.data = TRUE parameters to the report constructor in order to embed the linked files as data URLs into the resulting HTML document.

This vignette document was built with knitr for demonstrating the anrep API. The default intended mode for the anrep, however, is to let the anrep$save method call Pandoc conversion utility directly and generate a static Web site with multiple Web pages and data files, while running outside of the Knitr environment. We call this html mode.

The html mode would have been used automatically if we just called report = anrep("Hello World") outside of Knitr, or if we forced it with report = anrep("Hello World",out.formats="html") even when running under Knitr.

You will be able to view the results of html mode as a full auto-generated Web site following the link provided for a larger example script further in this document. For now, we will use one more Knitr-rendered example for the purposes of introducing anrep API features.

You can use other output formats supported by Pandoc as values for the out.formats argument, with a caveat that not all types of elements inserted into the generated Markdown are supported by every possible Pandoc output format.

Generating subsections and inserting htmlwidgets

The report hierarchy is represented by nested sections as well as by subreports (separate report files) linked from their parent reports. Creating another level of the hierarchy is done by using the appropriate infix operator after the call to a add.header method. In the example below, all reporting calls inside the %anrep>>% { } code block will be added to a new subsection one level deeper than what it was before the add.header call. The original section numbering is restored after the code block is finished. The default add.header call after that simply increments the section index at the current level.

At the end of this example, we will also add to the report a couple of dynamic plot widgets, after testing that the corresponding widget packages are available in this R instance.


Here is the example code. The rendered output follows the code.


report = anrep("Hello World",self.contained.data = TRUE)
report$add.header("First Hello header")
report$add.table(mtcars[1:5,1:6], caption="Hello Table", show.row.names = T)
# The operator %anrep>>% drops into a subsection:
report$add.header("Second header with subsections") %anrep>>% { 
  report$add.header("Subsection under the second header")
  report$add.descr("*Computing things here. Complex computations.*")
  fit = lm(mpg~hp,mtcars)
  report$add.descr("*Having done computing. Took a while.*")
  report$add(fit,caption = "Hello model")
  report$add.header("Another subsection under the second header")
  report$add(with(mtcars,plot(mpg,hp)), caption="Hello Plot",graph.unify=T)
# closing braces restores the previous section level
}
report$add.table(mtcars[5:10,], caption="We are back at the original section level. Hello Table Again")
if(requireNamespace("threejs", quietly = TRUE)) {
  z = seq(-10, 10, 0.01)
  x = cos(z)
  y = sin(z)
  sp3 = threejs::scatterplot3js(x,y,z, color=rainbow(length(z)))
  report$add.widget(sp3,
                    caption = "ThreeJS widget. You can rotate it with your mouse.")
}
report$add.header("Third header, auto-incremented at the same level as the second header")
report$add.printed(lm(mpg~hp,mtcars),caption = "Hello model again")
if(requireNamespace("plotly", quietly = TRUE)) {
  wd = plotly::plot_ly(cbind(Model=rownames(mtcars),mtcars),
                                    x = ~mpg, y = ~qsec, color = ~hp, mode="markers", marker = list(size = ~wt),
                                    text =~paste("Model:", Model, "<br>Weight:", wt))
  report$add.widget(wd,
                    caption = "Dynamic Plotly plot: hover, zoom and brush with your mouse")
}
report

End of the rendered output from the example code.


The larger example with Subreports

A worked example source file is included with the package, and can be located with this command:

library(anrepr)
example_code_file <- system.file("extdata", "example_sections.R", package = "anrepr",mustWork = TRUE)

Below is the code listing of that example. The code uses the anrep class API calls to create a report with multiple nested sections and subreports and to insert a few plots and tables. Look at the inline comments for the annotation of major steps.

If all the optional graphics packages are available, the output report becomes too large to distribute with the package, but you can browse the current copy at the code repository on GitHub.

## Workaround from this: https://github.com/yihui/knitr/issues/1647
rc <- knitr::read_chunk
rc(example_code_file)

After defining the functions above, we run the report generation pipeline expressed by these functions. In the call below, the generated Markdown code will be converted into HTML only when Pandoc executable is available.

make_example_sections_report("example_sections_report")


andreyto/anrepr documentation built on Feb. 24, 2020, 5:31 a.m.