This vignette demonstrates how you can start report generation in your R code by creating an object of the anrep class, and then use the methods of this class to add both structural and data elements to your report.
Using the API is very similar to using a typical logging library to emit some information from different places in your code. Only in this case, you are emitting rich content such as plots and tables, as well as organizing output into a hierarchical structure.
The report hierarchy is expressed through nested sections and through subreports, which are separate report files linked from their parent reports (see the anrep documentation).
Overall, the feature set in the anrepr package makes it easy to equip any arbitrarily complex analysis pipeline with a graphical reporting facility that emits report elements within functions, loops or conditional branches.
The typical final output is an auto-generated static Web site that contains linked HTML pages, embedded and stand-alone versions of the plots, as well as saved data files. The Web pages will contain dynamic JavaScript elements if your code reports R objects of types that implement the htmlwidgets interface. The generated Web site directory is portable: it can be copied to other locations or computers and viewed directly from disk, or it can be deployed to a Web server.
Popular R frameworks like Sweave and the RMarkdown / Knitr packages greatly facilitate reproducible research and graphical report generation by embedding analysis code into documents written in structured markup languages such as LaTeX, Markdown or HTML. When the document is "rendered" by Knitr, the code chunks written in R are replaced by the results of their evaluation. A large ecosystem of tools has been developed in the R community for automatically formatting various analysis objects into the markup representation.
One constraint imposed by such a "document-driven" reporting approach is that the flow of the analysis is controlled by the linear structure of the template document. Generation of the reporting output from the embedded code will normally proceed from the top to the bottom of the document without branching or looping.
Recognizing a frequent need for more flexible control over the flow of the report generation, knitr introduces both conditional evaluation tags and references between the chunks of code in the document. However, applying this advanced functionality imposes on the users an additional layer of programming that has to be done in a limited domain specific language (DSL) implicitly defined by the available knitr tags. The users have to learn this DSL and manage to express their reporting workflows in it. This has to happen despite the fact that they are already writing their code chunks in a powerful general-purpose programming language such as R.
The situation is similar with the very popular Python-centric literate programming framework Jupyter Notebooks.
In our own everyday data analysis work in a bioinformatics research environment, we encounter a frequent pattern of activity where the report generation has to be embedded into a fairly complicated analysis workflow developed in a general purpose programming language like R or Python. As the analysis proceeds through multiple stages of data loading, data cleaning and the iterative application of the same or different algorithms to the subsets of data, we often need to:
We have developed the R package anrepr that meets all of the above requirements by adopting the "analysis-driven" reporting approach.
In this approach, the analysis code is instrumented by including calls to our R package functions wherever the output has to be reported during the analysis execution. The reporting becomes an integral and unobtrusive part of the analysis workflow.
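To make the "analysis-driven" idea concrete, here is a minimal base R sketch. The toy report object and its add_text and markdown functions are hypothetical stand-ins invented for this illustration; they are not the anrepr implementation and only show how reporting calls can sit inside loops and conditional branches, much like logging calls.

```r
# Toy stand-in for a report object (hypothetical, NOT the anrepr class).
new_toy_report <- function(title) {
  lines <- paste("#", title)
  list(
    # Append one Markdown paragraph to the report.
    add_text = function(text) {
      lines <<- c(lines, text)
      invisible(NULL)
    },
    # Assemble everything emitted so far into a single Markdown string.
    markdown = function() paste(lines, collapse = "\n\n")
  )
}

# Analysis function instrumented with reporting calls.
analyze_groups <- function(data, report) {
  for (cyl in sort(unique(data$cyl))) {
    subset_rows <- data[data$cyl == cyl, ]
    if (nrow(subset_rows) < 10) {
      # Conditional branch: report that this group was skipped.
      report$add_text(sprintf("Skipped cyl = %d (only %d rows).",
                              cyl, nrow(subset_rows)))
      next
    }
    report$add_text(sprintf("cyl = %d: mean mpg = %.2f over %d rows.",
                            cyl, mean(subset_rows$mpg), nrow(subset_rows)))
  }
}

report <- new_toy_report("Toy analysis-driven report")
analyze_groups(mtcars, report)
cat(report$markdown(), "\n")
```

The structure of the resulting document follows the control flow of the analysis code rather than the layout of a template.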
Our package was in part inspired by the existing R packages Nozzle and pander, and it relies heavily on the Markdown generation methods provided by pander.
For example, the Nozzle package has implemented the "analysis-driven" reporting concept by providing an R API that directly generates rich HTML reports. Nozzle has implemented many advanced and specialized features, such as dedicated presentation of the results that were marked as "significant" by the analysis pipeline and on-demand filtering-out of the sections marked as "private data".
The polished and thought-out JavaScript user interface (UI) of the Nozzle reports comes at the price of having many decisions hard-wired at the HTML presentation layer, such as relying on a combination of JQuery and Nozzle's own JavaScript libraries to control the UI elements. Nozzle also creates a fixed top-level HTML DOM structure that might perfectly fit the requirements of the sample processing pipeline at the Broad Institute where this package was created, but does not necessarily fit all external applications well.
The Nozzle API provides a method for inserting report elements containing arbitrary HTML chunks for those users who have sufficient skills for manipulating the HTML code.
In our work, we have adopted the approach taken by the pander package, which provides a class Pandoc for generating reports in Markdown format. Our anrep class can be viewed as a complete reimplementation of the pander::Pandoc class. The primary difference is that we provide automation for building a hierarchical structure for the reports, generating auto-numbered sections, captions and anchor links for all inserted objects. The consistency of the numbering is maintained even in the presence of error conditions during execution of the analysis code.
In contrast, pander::Pandoc does not provide any facilities for structuring the report beyond a generic ability to insert header tags.
pander::Pandoc saves the entire report into a single file, which makes the final HTML reports impossible to load into the Web browser when the number of plots or table rows inserted by the analysis pipeline becomes too large. The same single-file scalability problem is faced by Nozzle reports, where the documentation advises the users to trim down the size of the tables that they are inserting from their analysis pipeline.
Our class API provides protection against such viewer scalability issues in two ways:
The most fundamental distinction of our work from Nozzle is that our anrep class (as well as the pander::Pandoc class) emits Markdown text instead of HTML. Using Markdown aims to create more separation between content generation and the final presentation format than is typical with direct HTML output.
As a convenience, we provide infix operators for generating subsection and subreport hierarchy by wrapping blocks of the existing analysis code with curly braces. We found that feature a big help in the rapid development of multiple project-specific research analysis pipelines with actively changing structure.
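The following base R sketch illustrates how an infix operator can wrap a braced block and keep section numbering consistent even when the block fails. All names here (section_state, add_header, %sub>>%) are hypothetical and chosen for this illustration only; the sketch relies on R's lazy argument evaluation and on.exit(), and it is not a description of how anrepr is implemented internally.

```r
# Hypothetical names throughout; a sketch of the idea, not anrepr's code.
section_state <- new.env()
section_state$counters <- 0L   # one counter per nesting level

# Increment the counter at the current level and print a numbered header.
add_header <- function(title) {
  depth <- length(section_state$counters)
  section_state$counters[depth] <- section_state$counters[depth] + 1L
  number <- paste(section_state$counters, collapse = ".")
  cat(sprintf("%s %s\n", number, title))
  invisible(number)
}

# Infix operator: the left side (the header call) is evaluated first, then
# the braced block on the right is evaluated lazily one level deeper.
`%sub>>%` <- function(lhs, block) {
  force(lhs)
  saved <- section_state$counters
  # Restore the previous level on exit, even if the block raises an error.
  on.exit(section_state$counters <- saved)
  section_state$counters <- c(saved, 0L)
  force(block)   # runs the braced code in the caller's environment
  invisible(NULL)
}

add_header("Data loading")
add_header("Modelling") %sub>>% {
  inner <- add_header("Linear fit")
  add_header("Diagnostics")
}

# A failing block must not corrupt the numbering of later sections.
try(add_header("Broken stage") %sub>>% {
  add_header("Step before failure")
  stop("analysis failed")
}, silent = TRUE)
after_error <- add_header("Summary")
```

Because the block is an ordinary R expression, existing analysis code can be wrapped without being moved into a separate function.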
Package-level functions from the pander package are used to automatically convert a multitude of R data types into Markdown. That includes extraction of vector and raster images from plot objects produced by the major R graphing libraries.
The save() method of our report class calls the well-known external Pandoc converter utility to turn the Markdown output into any of multiple final presentation formats, such as HTML pages and slide shows, or Word and PDF documents. Custom style sheet files and JavaScript libraries can be supplied by the users to completely change the appearance and behaviour of the rendered HTML pages.
Our report object detects if it is being run from a Knitr document, and returns the entire assembled report so that it shows up properly under Knitr, including all inserted dynamic htmlwidget objects. Users can also insert their own arbitrary Markdown or HTML text into the report.
Markdown is very easy to write considering the simplicity of its syntax relative to, for example, HTML.
We have implemented dedicated methods in our anrep class for adding the most frequently used R datatypes, such as data frames, vectors and plots, to the report with a single function call. That call generates Markdown content with a numbered caption and anchor, saves the full-size version of the element on disk and links it from the caption, and seamlessly takes care of various edge cases that have to be taken into account when directly using the conversion methods from the pander package.
One of our class methods inserts R objects from a growing collection of packages which implement the htmlwidgets interface, thus allowing the generation of a rich set of dynamic JavaScript elements in the final HTML reports.
Hello World example
This code adds to the report two pairs of tables and plots using a for loop. You can see in the report output rendered by Knitr that tables and plots were decorated with auto-generated captions. Consider, for example, this caption:
"(1.1) Table 2. Hello Table for rows 29, 30, 20, 19, 2. Full dataset is also saved in a delimited text file (click to download and open e.g. in Excel) data/Table.2-1.1-1527b2eb7a167.csv"
The caption starts with the current hierarchical number of the section that contains the corresponding object, (1.1). This is followed by the auto-generated object index, Table 2, that runs through the entire report separately for each object type, such as tables or plots.
In the HTML output, the object index is an anchor for sharing links to that place in the report.
After that, the user-provided descriptive text is inserted, followed by the link to an auto-generated data or image file that can be used in downstream processing.
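As a rough illustration of how such a caption can be assembled, the base R sketch below keeps a separate running index per object type and combines it with the section number, the user text and a file link. The helper make_captioner and its file naming scheme are hypothetical; in particular, the real file names carry an extra unique suffix that is not reproduced here.

```r
# Hypothetical helper; the file naming is illustrative only.
make_captioner <- function() {
  index_by_type <- list()   # separate running index per object type
  function(section, type, text, ext = "csv") {
    current <- index_by_type[[type]]
    if (is.null(current)) current <- 0L
    index_by_type[[type]] <<- current + 1L
    idx <- index_by_type[[type]]
    section_id <- paste(section, collapse = ".")
    file <- sprintf("data/%s.%d-%s.%s", type, idx, section_id, ext)
    # Section number, object index, user text, then the link target.
    sprintf("(%s) %s %d. %s Saved in file %s",
            section_id, type, idx, text, file)
  }
}

caption <- make_captioner()
first_table  <- caption(c(1, 1), "Table", "Hello Table for rows 1, 2.")
first_figure <- caption(c(1, 1), "Figure", "Hello Plot.", ext = "png")
second_table <- caption(c(1, 1), "Table", "Hello Table for rows 3, 4.")
cat(first_table, first_figure, second_table, sep = "\n")
```

Note that the table index advances to 2 for the second table even though a figure was added in between, because each object type has its own counter.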
Here is the example code. The rendered output follows the code.
For vignette output, we will encode the linked data files and high-resolution images into the HTML output file.
self.contained.data = TRUE
library(anrepr)
report = anrep("Hello World",self.contained.data = self.contained.data)
set.seed(1)
for(i in 1:2) {
  rows = sample(nrow(mtcars),5)
  report$add.table(mtcars[rows,1:6],
                   caption=sprintf("Hello Table for rows %s", paste0(rows,collapse = ", ")),
                   show.row.names = T)
  report$add(with(mtcars[rows,],plot(mpg,hp)),
             caption=sprintf("Hello Plot for rows %s", paste0(rows,collapse = ", ")),
             graph.unify=T, hi.res=T)
}
# Printing the return value from the save() method will show
# rendered Markdown if running under Knitr, else write HTML
# and data files in the current directory.
report$save()
End of the rendered output from the example code.
The raw generated Markdown from the Hello World example looks like this:
self.contained.data = FALSE
cat(report$save())
self.contained.data = TRUE
Knitr
The report object detects that it is being executed from Knitr, and returns a string with the final Markdown from its save() method, automatically labeled to be rendered asis (matching the effect of the results='asis' setting in the Knitr chunk options). This is why we had to use cat when we actually wanted to see the raw Markdown text.
The knitr output mode also turns off the generation of subreports (linked multiple documents) and collects everything into a single Markdown string. The links to saved data files would not work if a devtools::build_vignettes() call were used to generate a vignette for the R package, because devtools expects the output to be a single file. Therefore, we pass the self.contained.data = TRUE parameter to the report constructor in order to embed the linked files as data URLs into the resulting HTML document.
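For readers unfamiliar with data URLs, the sketch below shows the general technique in base R: the file bytes are base64-encoded and placed directly into the link target, so no separate file is needed. The functions base64_encode and to_data_url are hypothetical helpers written for this illustration and are not part of the anrepr API; in practice a package such as base64enc would normally do the encoding.

```r
# Minimal base64 encoder for a raw vector (illustrative helper only).
base64_encode <- function(bytes) {
  n <- length(bytes)
  if (n == 0L) return("")
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  pad <- (3L - n %% 3L) %% 3L
  # Pad to a multiple of 3 bytes and arrange one 3-byte group per column.
  groups <- matrix(c(as.integer(bytes), rep(0L, pad)), nrow = 3)
  value <- groups[1, ] * 65536 + groups[2, ] * 256 + groups[3, ]
  # Split each 24-bit value into four 6-bit indices.
  idx <- rbind(value %/% 262144,
               (value %/% 4096) %% 64,
               (value %/% 64) %% 64,
               value %% 64)
  out <- alphabet[idx + 1]
  if (pad > 0L) out[(length(out) - pad + 1L):length(out)] <- "="
  paste(out, collapse = "")
}

# Hypothetical helper: turn a file on disk into a data URL.
to_data_url <- function(path, mime_type) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  sprintf("data:%s;base64,%s", mime_type, base64_encode(bytes))
}

csv_file <- tempfile(fileext = ".csv")
writeBin(charToRaw("mpg,hp\n21,110\n"), csv_file)
url <- to_data_url(csv_file, "text/csv")
# A Markdown link whose target carries the file content itself.
cat(sprintf("[download](%s)\n", url))
```

The trade-off is size: embedded files make the single HTML document larger, which is acceptable for a vignette but is the reason the multi-file mode exists for large reports.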
This vignette document was built with knitr to demonstrate the anrep API. The default intended mode for anrep, however, is to let the anrep$save method call the Pandoc conversion utility directly and generate a static Web site with multiple Web pages and data files, while running outside of the Knitr environment. We call this the html mode.
The html mode would have been used automatically if we just called report = anrep("Hello World") outside of Knitr, or if we forced it with report = anrep("Hello World",out.formats="html") even when running under Knitr.
You will be able to view the results of the html mode as a full auto-generated Web site by following the link provided for a larger example script further in this document. For now, we will use one more Knitr-rendered example for the purposes of introducing anrep API features.
You can use other output formats supported by Pandoc as values for the out.formats argument, with the caveat that not all types of elements inserted into the generated Markdown are supported by every possible Pandoc output format.
htmlwidgets
The report hierarchy is represented by nested sections as well as by subreports (separate report files) linked from their parent reports. Creating another level of the hierarchy is done by using the appropriate infix operator after the call to an add.header method. In the example below, all reporting calls inside the %anrep>>% { } code block will be added to a new subsection one level deeper than the level in effect before the add.header call. The original section numbering is restored after the code block is finished. The default add.header call after that simply increments the section index at the current level.
At the end of this example, we will also add to the report a couple of dynamic plot widgets, after testing that the corresponding widget packages are available in this R instance.
Here is the example code. The rendered output follows the code.
report = anrep("Hello World",self.contained.data = TRUE)
report$add.header("First Hello header")
report$add.table(mtcars[1:5,1:6], caption="Hello Table", show.row.names = T)
# The operator %anrep>>% drops into a subsection:
report$add.header("Second header with subsections") %anrep>>% {
  report$add.header("Subsection under the second header")
  report$add.descr("*Computing things here. Complex computations.*")
  fit = lm(mpg~hp,mtcars)
  report$add.descr("*Having done computing. Took a while.*")
  report$add(fit,caption = "Hello model")
  report$add.header("Another subsection under the second header")
  report$add(with(mtcars,plot(mpg,hp)), caption="Hello Plot",graph.unify=T)
  # the closing brace restores the previous section level
}
report$add.table(mtcars[5:10,],
                 caption="We are back at the original section level. Hello Table Again")
if(requireNamespace("threejs", quietly = TRUE)) {
  z = seq(-10, 10, 0.01)
  x = cos(z)
  y = sin(z)
  sp3 = threejs::scatterplot3js(x,y,z, color=rainbow(length(z)))
  report$add.widget(sp3, caption = "ThreeJS widget. You can rotate it with your mouse.")
}
report$add.header("Third header, auto-incremented at the same level as the second header")
report$add.printed(lm(mpg~hp,mtcars),caption = "Hello model again")
if(requireNamespace("plotly", quietly = TRUE)) {
  wd = plotly::plot_ly(cbind(Model=rownames(mtcars),mtcars),
                       x = ~mpg, y = ~qsec, color = ~hp,
                       mode="markers", marker = list(size = ~wt),
                       text =~paste("Model:", Model, "<br>Weight:", wt))
  report$add.widget(wd, caption = "Dynamic Plotly plot: hover, zoom and brush with your mouse")
}
report
End of the rendered output from the example code.
A worked example source file is included with the package, and can be located with this command:
library(anrepr)
example_code_file <- system.file("extdata", "example_sections.R", package = "anrepr",mustWork = TRUE)
Below is the code listing of that example. The code uses the anrep class API calls to create a report with multiple nested sections and subreports and to insert a few plots and tables. Look at the inline comments for the annotation of major steps.
If all the optional graphics packages are available, the output report becomes too large to distribute with the package, but you can browse the current copy at the code repository on GitHub.
## Workaround from this: https://github.com/yihui/knitr/issues/1647
rc <- knitr::read_chunk
rc(example_code_file)
After defining the functions above, we run the report generation pipeline expressed by these functions. In the call below, the generated Markdown code will be converted into HTML only when the Pandoc executable is available.
make_example_sections_report("example_sections_report")