oFormat="html" knitr::opts_chunk$set(echo = FALSE,message = FALSE,warning = FALSE,error=TRUE,include = TRUE,cache=FALSE,fig.align="center", fig.width = 10, fig.height=8, dpi = 300, out.width = ifelse(oFormat=="pdf","480pt","800px"),tidy = TRUE) isDocx=TRUE library(wfr) sharedLibs <-c("knitr","pander","xtable","magrittr","flextable","officer","openxlsx") extraLibs = `if`(isDocx,c("captioner","stringr"),c("kableExtra")) loadLibs(c(sharedLibs,extraLibs))
r if(oFormat=="html") setHtmlHeaderProperty()
If you are reading this document in HTML format, please read the PDF document because some of the demonstrated formatting features are not displayed correctly in HTML format.
The development of this workflow is inspired by Dr. Wenfei Zhang's previous work in separating data analysis from document creation - saving analysis results in R script and including those results in a R markdown file for document generation.
This work was supervised by Dr. Donghui Zhang at Sanofi.
Thanks to the authors of those excellent R package flextable, kableExtra, knitr, and bookdown.
The primary purpose of this package is to help saving time for users of the following tasks:
In addition, it ensures consistent formatting of the tables and numbers in all such documents created by different people in an organization.
Significant amount of effort spent in a data analysis task is non-statistical.
A large data analysis project can produce many outputs, e.g. tables and figures, which are reviewed later. Annotating and managing all outputs are time-consuming but very important for future query, review, and understanding.
More importantly, in compiling reports for such projects, many of those output tables and figures are included. r tRef("tab1",isDocx)
is one such example. If a report contains two dozen such tables, compiling, formatting, and cross-referencing those tables take lots of time.
fileName=system.file("extdata", "example.table.1.csv", package = "wfr") df1 = read.csv(fileName) header = list(c('variable','stats','subject','subject','subject'),colnames(df1)) footer = list(c("Note:"),c("subject","Caucasian$^ref_1$ only","a","header"), c("Age (years)","One control has no record","b","body")) rmdTable(df1,header = header, footer=footer,rowHeaderInd = 2,isDocx = isDocx,caption = tCap("Age and Sex statistics","tab1",isDocx))
So the objective of this package is to help to reduce the portion of non-statistical work in this process; specifically:
Figure \@ref(fig:figWF) includes four pictures depicting the proposed workflow in conducting data analysis project:
knitr::include_graphics(system.file("extdata", "workflow.png", package = "wfr")) #knitr::include_graphics("../inst/extdata/workflow.png")
saveOutput()
to save each object you want to keep with a caption to explain what the object is. saveOutput
saves either the current R workspace or the rds file of the object, depending on the flag saveWorkspace
. A text or picture file, e.g. csv
or png
, of the object can also be saved via oFileName
. A number of formating options, if the object is a table, can also be saved through corresponding parameters of saveOutput
. saveOutput
also enters all those information into a global data.frame, OUTPUTS
, which is saved as an excel file at the end of R script.
The file explorer shows all the saved files and the excel file.
The content of the excel file can be manually edited. The oFileName
cells are imbedded with links to the file, and clicking opens the files directly.
Run function createRmd()
to produce the Rmd file for the report in either html or docx format. wfr
package provides a R markdown template file that can output docx, html, or pdf format depending by setting oFormat
in the file. Appending to this Rmd template file, createRmd
firstly adds a code chunk to read in the excel file, then one code chunk per row of the excel file for cross reference and display of the object in the row. If the object is a table, its numerical columns are automatically formatted based on the min, max, and median values of the column by [num2formattedStr]. Also see [showObj] for details on how it displays an object.
Based on this auto-generated Rmd file, users can focus on the text part of the report, saving significantly amount of time. A few lines of the Rmd file needs to be customized before knitting, i.e. oFormat, isDocx, outputFileName, title, author
.
It displays an object in knitr code chunk and set the caption with cross reference. Figure \@ref(fig:figSO) depicts how it works:
knitr::include_graphics(system.file("extdata", "showObj.PNG", package = "wfr")) #knitr::include_graphics("../inst/extdata/showObj.PNG")
objID
in the data.frame read in from the excel file.rImageFile
in that row.rmdTable
if a table, along with formatting info either passed in via showObj
or read from other columns in that row of the data.frame.Also note that the chunk labels are passed to showObj
as objID
whose default value is actually NULL
, meaning it is not a required parameter. In knitting the Rmd file, showObj
can grab the chunk label automatically if objID
is not supplied. But by supplying objID
, a chunk can be run alone, e.g. for debugging purpose if the display from showObj
is wrong. To debug, just type debugonce(showObj)
in R console, and read in necessary variables, e.g. df1
and isDocx
, and run the chunk.
This is a wrapper function to plot tables using either flextable
or kable
depending on a flag isDocx
. If isDocx
is TRUE
, the default, flextable
is used because kable
cannot produce complex tables for Word document. Below are some examples on how to use this function.
Figure \@ref(fig:figRmdT1) shows the original table, the final table we want to create, and the intermediate tables; note the name of different sections of a table, esp. row header, the columns in the table body used as the header of rows.
knitr::include_graphics(system.file("extdata", "rmdTable.1.PNG", package = "wfr")) #knitr::include_graphics("../inst/extdata/rmdTable.1.PNG")
The below code chunk shows how each of the table in Figure \@ref(fig:figRmdT1) are created, df1
contains the content forming the original table. The header of df1
has only one row, whereas the header of the final table have two rows. Therefore a new header needs to be created with two vectors, corresponding to the two rows in the final table. rmdTable
automatically merges neighboring cells in the header if their contents are identical, e.g. the four A
cells, and the two C
cells as illustrated in intermediate table 2.
```{#numCode .R .numberLines} df1=data.frame(A=c("a","a","b3"), B=c("b1","b2","b3"), C1=1:3, C2=2:4) flextable(df1) #table 1
columnHeader=list(c('A','A','C','C'), c('A','A','C1','C2')) rmdTable(df1,header = columnHeader) #2
rmdTable(df1,header = header, rowHeaderInd = 1) #3
rmdTable(df1,header = header, rowHeaderInd = 2) #4
footer=list(c("A","Arkansas$~ref$",'1','header'), c("C1","Kansas$^ref$",'x','header'), c('a',"Arizona",'2','body'))
rmdTable(df1,header = header, rowHeaderInd = 2, footer = footer, caption = "table 5: final table", colWidth = c(2,1,1,1), fontSize = 12) #5
colWidth = "2,1,1,1" header = "A | A | C | C || A | A | C1 | C2" footer = "A|Arkansas$~ref$|1|header || C1|Kansas$^ref$|x|header || a|Arizona|2|body"
`rowHeaderInd` is an integer and means column `1:rowHeaderInd` in the table body are going to be the row header. Setting `rowHeaderInd` has two effects on the row headers, as seen in intermediate table 2 and 4: * Just like in the column header, __neighboring cells of identical content are collapsed into one multi-row/column cell__. * Font becomes bold. There are a number of additions in the final table. Probably the most important one is footer; it's a list of character vector which contains _the content of cell to tag_, _the content of footer_, _the tag symbol_, _the section of the table to identify the cell_. Some parameters can also be set as string so that they can be recorded into the excel file in Figure \@ref(fig:figWF), either manually or via `saveOutput`. See the documentation of `rmdTable` for a detailed explanation on `footer` and how to construct the strings. ### Table theme Figure \@ref(fig:figTableTheme) lists all the themes available for docx output, which is based on `flextable` but with the addition of the light grey line inside the table. ```r knitr::include_graphics(system.file("extdata", "table.themes.png", package = "wfr")) #knitr::include_graphics("../inst/extdata/table.themes.png")
For html output, function kable
is used, and only zebra, box, and vanilla are available. In addition, I have not found a way, using kable
, to merge horizontal neighboring cells in the first row of header (the two A
) and in the table body (the two b3
), as figure \@ref(fig:figTableKable) shows. Although such needs for more table themes and merging those cells are rare, if really needed, flextable
can be used for html output by just setting isDocx = TRUE
instead of the isDocx = (oFormat != "html")
in the Rmd file produced by createRmd
. The only drawback is that the cross-reference of flextable-produced tables is not URL-linked, i.e. not clickable.
knitr::include_graphics(system.file("extdata", "table.kable.PNG", package = "wfr")) #knitr::include_graphics("../inst/extdata/table.kable.PNG")
When flextable
is used, that is, when the output format of the Rmd file is Word, the general way to do cross reference for tables, i.e. Table
\@ref(tab:label)
, does not work. Therefore, tRef
and tCap
are created to handle table cross reference in Rmd text and table caption, regardless of the output format.
`r '\x60r tRef("label")\x60'`
replaces Table
\@ref(tab:label)
in Rmd textrmdTable
, parameter caption
should be set as tCap(original.caption,"label",isDocx)
to enable cross reference in caption, e.g. caption = tCap("Age and Sex statistics","tab1",isDocx)
for r tRef("tab1", isDocx)
.Each numerical column of the dataDf
in rmdTable
is formatted automatically by function num2formattedSt
. This function makes decision on the following four aspects of formatting based on the min, median, max values of a numeric vector, so that the formatted numbers carry enough information and are in a length less than 10 characters. For example, for a numeric vector v
, if median(abs(v)) > 100
, it makes no sense to keep decimal digits.
To illustrate, r tRef("tabNoF",isDocx)
is before formatting, and r tRef("tabF",isDocx)
is after formatting.
fileName=system.file("extdata", "example.format.csv", package = "wfr") df1 = read.csv(fileName,check.names = FALSE) flextable(df1) %>% set_caption(caption=tCap("Auto-formatting on numbers using flextable","tabNoF",isDocx)) %>% autofit()
rmdTable(df1, caption = tCap("Auto-formatting on numbers using num2formattedStr","tabF",isDocx), isDocx = isDocx)
For non-html output, this function properly and automatically sets the width of each column so that the width of the whole table does not exceed the maximum width set by maxTableWidth
. The default value of maxTableWidth
is 7.0 inches, which corresponds to "PAGE LAYOUT" > "size" > "letter" and "Margins" > "Moderate" in MS Word. 6.5 inches corresponds to "Margins" > "Normal".
The width of the whole table is the sum of its column. However, computing such width can be very tricky:
The width of a table cell depends on the font family, font size, font type (bold, italic, etc), and table theme. For large and complex tables which barely fits the maxTableWidth
, change of any those factors may mess up some columns. function setWidths
is only tested for font Arial, size 12 - 7, normal and bold type, and theme zebra.
The width of a column is the maximum width among all cells in that column, including header and table body. If the sum of column widths does not fit the maxTableWidth
, there are a few options to reduce the column widths:
minFontSize
In majority cases, the column widths set by setWidths
does not need any manual adjustment. This function is called by [rmdTable] internally. See manual for its algorithm to determine the width of each column. Following are a few illustrations.
r tRef("tabSW1",isDocx)
sets maxTableWith = 7
. The default font size is 11. The algorithm chooses wrapping headers over reducing font size. Headers are wrapped in a way so that
fileName=system.file("extdata", "example.table.2.csv", package = "wfr") df1 = read.csv(fileName,check.names = FALSE) colnames(df1)=gsub('?','Δ',colnames(df1),fixed = TRUE) header=list(c("pathway","gene",rep("batch A",3),rep("batch B",3),rep("A vs B",3)),colnames(df1)) footer=list(c("ΔCt Mean","ΔCt is calulated using median value as reference gene for each visit and patient,i.e., ΔCt = Ct - Ct$~median$","a","header"), c("Gene 1", "Our target gene$^ref$","b","body")) rmdTable(df1, caption = tCap("maxTableWidth is set as 7.0 inches (default)","tabSW1",isDocx), footer = footer,header = header,rowHeaderInd = 1, isDocx = isDocx)
r tRef("tabSW2",isDocx)
sets maxTableWith = 6.5
. The algorithm reduces font size by 1 to fit the table. Note that the font size of the table title and footer is not affected. The former has a minimum size 10, and the latter is set by footerFontSize
in [rmdTable], which defaults at 9.
rmdTable(df1, caption = tCap("maxTableWidth is set as 6.5 inches","tabSW2",isDocx), maxTableWidth = 6.5,footer = footer,header = header,rowHeaderInd = 1, isDocx = isDocx)
r tRef("tabSW4",isDocx)
sets maxTableWith = 6.5
and minFontSize = 9
. Due to some long cells in table body, reducing the font size to 9 still does not fit 6.5 inches. The long columns of table body, i.e. pathway and gene, are wrapped in a way that
rowHeaderInds
, e.g. pathway, may be wrapped more than once if each of its cells spans multiple cells.fileName=system.file("extdata", "example.table.3.csv", package = "wfr") df1 = read.csv(fileName,check.names = FALSE) colnames(df1)=gsub('?','Δ',colnames(df1),fixed = TRUE) rmdTable(df1, caption = tCap("maxTableWidth is set as 6.5 inches ","tabSW4",isDocx), maxTableWidth = 6.5,footer = footer,header = header,rowHeaderInd = 1, minFontSize = 9, isDocx = isDocx)
r tRef("tabSW6",isDocx)
does not reduce font size since the algorithm sees column pathway can be wrapped multiple times.
df2=df1 df2[,2]=NULL footer2=footer[1] header2=list(c("pathway",rep("batch A",3),rep("batch B",3),rep("A vs B",3)),colnames(df2)) rmdTable(df2, caption = tCap("maxTableWidth is set as 6.5 inches ","tabSW6",isDocx), maxTableWidth = 6.5,footer = footer2,header = header2,rowHeaderInd = 1, minFontSize = 9, isDocx = isDocx)
In contrary, in r tRef("tabSW7",isDocx)
the algorithm prefers reducing font size over wrapping column gene since its table body cells are all single row.
df2=df1 df2[,1]=NULL footer2=footer header2=list(c("gene",rep("batch A",3),rep("batch B",3),rep("A vs B",3)),colnames(df2)) rmdTable(df2, caption = tCap("maxTableWidth is set as 6.5 inches and minFontSize as 9","tabSW7",isDocx), maxTableWidth = 6.5,footer = footer2,header = header2,minFontSize = 9, isDocx = isDocx)
For Word output, the formatting style is based on a template file system.file("extdata", "word.template.for.Rmd.dotx", package = "wfr")
. Reads its content to see how to know how its style corresponds to R markdown syntax and how to edit its style. See this article for more details.
flextable_0.5.5 is required.
For docx output from Rmd file, pandoc version >= 2.0 is needed. Rstudio is bundled with pandoc, but only version >= 1.2 has the needed pandoc version. So update your Rstudio if necessary.
For pdf output of Rmd file, a LaTeX engine is needed. If not available, run the following code and restart Rstudio after it finishes. However, this and most LaTeX engine do not accept unicode characters; it might be easier to produce MS Word document and then export it as pdf file.
```{#numCode .R .numberLines} install.packages(c('tinytex', 'webshot')) tinytex::install_tinytex() webshot::install_phantomjs()
## Testing Run the following code for a simple testing. ```{#numCode .R .numberLines} install.packages(c("flextable","knitr","kableExtra","captioner","officer", "openxlsx","bookdown", "pander","devtools"), dependencies=TRUE) # installation from source; add path to the file devtools::install_github("blueskypie/wfr",dependencies=TRUE) # testing library(wfr) library(ggplot2) OFCOUNTER=1 df1=data.frame(A=c("a","a","b3"), B=c("b1","b2","b3"), C1=c(1001.123,58.04,32.01), C2=c(-0.00321, 0.0121, 0.325)) # output directory; set your own directory if preferred. outDir = tempdir() ofn = file.path(outDir, "tab.1.csv") saveOutput(df1, oFileName=ofn, caption="testing table w/o formatting info") ofn = file.path(outDir, "fig.1.png") saveOutput(qplot(1:10,1:10), oFileName=ofn, caption="this is a testing plot") colWidth = "2,1,1,1" header = "A | A | C | C || A | A | C1 | C2" footer = "A|Arkansas$~ref$|1|header || C1|Kansas$^ref$|x|header || a|Arizona|2|body" rowHeaderInd = 2 ofn = file.path(outDir, "tab.2.csv") saveOutput(df1, oFileName = ofn, caption = "testing table w/ formatting info", header = header, footer = footer, colWidth = colWidth, rowHeaderInd = rowHeaderInd) wfrFile=system.file("extdata", "example.table.2.csv", package = "wfr") df1 = read.csv(wfrFile,check.names = FALSE) header="pathway | gene | batch A | batch A | batch A | batch B | batch B | batch B | A vs B | A vs B | A vs B || pathway | gene | N | ΔCt Mean | ΔCt SEM | N | ΔCt Mean | ΔCt SEM | LS mean | Fold change | p-value" footer="ΔCt Mean | ΔCt is calulated using median value as reference gene for each visit and patient,i.e., ΔCt = Ct - Ct$~median$ | a | header" ofn = file.path(outDir, "tab.3.csv") saveOutput(df1, oFileName="tab.3.csv", caption="testing a complex table", header=header, footer = footer, rowHeaderInd = 1, maxTableWidth = 6.5) exFn = file.path(outDir, "all.outputs.xlsx") # open the file to see its content. writeExcel(exFn) # Open the created Rmd files, change the output format "oFormat" to see if # all output formats work. Add a few headings so Table of Content can be # created and also see the effect of the docx reference template for docx # output. rmdFn1 = file.path(outDir, "all.outputs1.Rmd") createRmd(rmdFn1,exFn) rmdFn2 = file.path(outDir, "all.outputs2.Rmd") # add additional formatting to all tables createRmd(rmdFn2,exFn, tabPars="fontSize=12,theme='booktabs'")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.