The package cctu is designed to help run the analysis of clinical trials, which typically produce as output a large Word document with one page per table or figure, following on from a Statistical Analysis Plan, and which require substantial audit trails. Automating the trivialities of getting tables from R into Word, and capturing the meta-data and audit trails, is what the package achieves.
This document gives an overview of the tools provided in this package, explains the thinking behind them, and shows how to use them together. We start by explaining the rationale, and finish with a worked example. A complementary set of R scripts and outputs, which cannot easily be captured directly within a vignette, is also provided in the /doc directory.
Further functions and tools have been added to streamline the importing of MACRO data, to improve the basic tables, and to automate the reporting of missing data. These are explained more fully in the separate using-cttab vignette.
As part of writing the Statistical Analysis Plan (SAP), a structured meta-table with one row per table or figure needs to be produced. This provides titles, subtitles, numbering, populations, etc. See the help page ?meta_table for more details. The package relies heavily on the user providing this. While producing the tables/figures, the package updates the meta-table (for example, recording the name of the code file that created each output). When producing the final output it uses the meta-table to index all the outputs and pull out the meta-data needed.
Given all the interactions needed, which is somewhat contrary to the R convention of functions simply returning output rather than modifying the user environment, the meta_table is stored within an environment that is private to the package. So it needs to be set up using set_meta_table(), and get_meta_table() extracts a copy.
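For example, a minimal round-trip might look like the following sketch; the file path is an illustrative assumption, and ?meta_table documents the columns actually required:

```r
library(cctu)
# Read the meta-table prepared alongside the SAP (path is illustrative)
meta <- read.csv(file.path("Progs", "meta_table.csv"), stringsAsFactors = FALSE)
set_meta_table(meta)           # store it in the package's private environment
meta_copy <- get_meta_table()  # extract a copy whenever it is needed
```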
It is assumed that there is a data.frame with one row per subject, and one logical column per population defined in the SAP, indicating whether the subject is included in that population. The definition of the populations, their names, and how inclusion is to be determined are all important steps, but outside the scope of this package!
We provide a function create_popn_envir that takes the data.frame described above and a list of other data.frames. It then filters each of the listed data.frames, looping over the populations, and saves them within a set of environments, one per population.
During the analysis code, the function attach_pop() is repeatedly called with the number of a table/figure. The corresponding row of the meta-table is used to identify which population is associated with that table/figure, and the matching environment is attached. So the data.frames that were filtered in create_popn_envir can now be accessed, safe in the knowledge that the right population is used.
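A sketch of how these functions fit together; the data.frame names, population column names, and table number are all illustrative assumptions, so see ?create_popn_envir and ?attach_pop for the exact calling conventions:

```r
# popn has one row per subject plus logical columns (e.g. safety, full_analysis)
# flagging membership of each population -- the names here are assumptions
create_popn_envir(c("baseline", "outcome"), popn)

# Later, in the analysis code, say for table 1.1:
attach_pop("1.1")
# baseline and outcome now refer to the copies filtered to whichever
# population the meta-table associates with table 1.1
```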
A default set of local folders, where the outputs will be stored, can be set up using cctu_initialise() with its default arguments.
The code itself is assumed to be modularised into a sequence of scripts that call other scripts using source().
Main.R sits at the top, with a minimal amount of initialisation: the working directory, PATHs, library(cctu), run_batch(); then a sequence of source() calls, somewhere within which create_popn_envir happens; a final call to create_word_xml(); and maybe further calls to render for other outputs. It is important to call library(cctu) before the first use of source(). The package modifies the base function base::source so as to record the name of the file that called source and the name of the file sourced. This provides an audit trail, allowing the sequence of code used to create a table/figure to be easily found. Like meta_table, this audit trail lives in a private environment and is not easily directly accessible.
The step of transferring tables and figures to Word is facilitated by XML. The tables are stored as copies on a local drive (in /Output/Core by default) in a semi-readable format with tags similar to HTML tables.
During the analysis, a block of code that produces a table or figure is put between an attach_pop() and a write_table() or write_ggplot(). The argument to write_table is a data frame; write_ggplot defaults to using the last plot object, or can take a ggplot object as an argument. Both then look up the number of the table called by attach_pop (temporarily stored in a private environment); the output is stored on the local drive as either an XML file or a graphics file, named with a suffix "_X.Y.Z" to identify the output; the name of the code file that created the output is written to the meta-table; and by default a final call to clean_up removes all objects in the global environment (apart from those named in the vector .reserved) and detaches the population environment.
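Putting the pieces together, a typical block looks something like this sketch; the table numbers, data.frame, and variables are illustrative assumptions:

```r
attach_pop("1.2")
# baseline is one of the data.frames filtered by create_popn_envir
demog <- as.data.frame(table(baseline$arm))
write_table(demog)  # saves an XML file under Output/Core, records the code
                    # file in the meta-table, and by default calls clean_up()

attach_pop("1.3")
fig <- ggplot2::ggplot(baseline, ggplot2::aes(arm, age)) +
  ggplot2::geom_boxplot()
write_ggplot(fig)   # or omit the argument to use the last plot object
```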
After the analysis, the function create_word_xml glues these files together, along with the meta-data, into one large XML file. The tables are directly copied, and the figures have links to their file paths. Then XSLT technology is used to convert the document into the schema used by MS Word, so it can be opened directly. Note though that the links to the graphics files need to be turned into hard copies within Word if you want to move the document. If you want to tweak the style of the Word document then you may edit the XSLT file, totally at your own risk!
Other functions may be worth looking up in their help files; hopefully their names give a hint:
- read_data
- data_table_summary
- clean_names
- remove_blank_rows_cols
- var_lab
- apply_macro_dict
- extract_form
The set of external data files to be read in, along with the names of the corresponding data.frames to be created, should be recorded in one initial data.frame. This can then be summarised in a table in the report using data_table_summary, and also used to read in all the data files succinctly with read_data.
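A minimal sketch of this pattern; the column names of the initial data.frame are an assumption here, so check ?read_data for what is actually expected:

```r
# One row per external file: the data.frame name to create and the file to read
# (column names "name" and "file" are assumptions for illustration)
data_table <- data.frame(
  name = c("popn", "baseline"),
  file = c("popn.csv", "baseline.csv")
)
data_table_summary(data_table)  # a summary suitable for a table in the report
read_data(data_table)           # reads each file, creating popn and baseline
```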
- sumby() is a work-horse to produce standard summary tables, with additional figures produced as side-effects.
- cttab is a recent addition which greatly expands the flexibility and scope of sumby.
- propercase
- rbind_space: to glue together repeated outputs from sumby.
- clean_up is called by default within each call of write_*. These functions have arguments to skip clean_up if the code needs to re-use the same R objects for multiple outputs.
- .reserved is a hidden variable in the global environment. clean_up ignores any objects named in .reserved, so you can edit this as an alternative to using the arguments within write_*, but ideally it should be defined once at the end of data manipulation.
- rm_envir may be helpful during code development and interactive use of attach_pop.

With the use of cttab, summaries of missing data are created as a side-effect. See these functions:

- dump_missing_report
- get_missing_report
- reset_missing_report
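For example, a sketch of inspecting and resetting the missing-data report; the output path is an assumption, and the using-cttab vignette and help files describe the exact behaviour:

```r
# After cttab() calls have accumulated missing-data summaries:
miss <- get_missing_report()  # extract a copy of the report
write.csv(miss, file.path("Output", "missing_report.csv"), row.names = FALSE)
reset_missing_report()        # start afresh, e.g. when re-running interactively
```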
As described already, source() captures a trail of which code sources other code. There is a demonstration below of how to convert this into a graphical tree representation using a combination of Rmarkdown and LaTeX. Alternatively, you can get a copy via get_code_tree() (a copy of cctu:::cctu_env$code_tree) and write it to a csv file, say.
In a similar fashion we have some examples of Rmarkdown that can be used to read in the outputs from write_* and produce an HTML, pdf, or other version of the main report, depending on the format the readers prefer.
At the end of your code it is good practice to run Sys.info(), date(), and sessionInfo() to document the fine details of your session, package versions, etc. Also, the final definitive version of the report may need to be run on a different server with a validated instance of R, using R CMD BATCH, which has the nice side-effect of creating a .Rout file with a complete log of all the code and outputs. A wrapper function that avoids having to open a command-line window for your operating system is run_batch().
This document illustrates how to set up a standard set of analyses using the library cctu. It assumes that you have copied across a template blank folder structure (cctu_initialise), created a library within the main folder with all the extra packages you may need, and set up a git versioning instance and RStudio project. A future project will be to document this step.
At the top level there is a file called "main.R":
```r
rm(list = ls())
# set to the library folder
.libPaths()
library(cctu)
options(verbose = TRUE)
# run_batch("main.R")
DATA <- "PATH_TO_DATA"
cctu_initialise()
rm_output()
```
If you run just these initial lines, the last command will invoke R CMD BATCH to run the entire set of code and produce a log file "main.Rout"; this should be the final step, using the validated server. The run_batch line is commented out here because the vignette will not work with it.
This vignette now differs from standard use, in that the Main.R file would normally be a sequence of source() calls. Here we run the source files and then quote the R code they contain. A copy of all files and outputs from a standard use, starting from Main.R, is provided in the /doc directory.
It is recommended to set up a config.R file. To run and quote the sourced files within this vignette we use a small helper:
```r
sourceKnit <- function(path, label) {
  cmd <- knitr::knit_expand(file.path("Progs", "templ.Rmd"))
  knitr::knit_child(text = unlist(cmd))
}
```
```r
sourceKnit(file.path("Progs", "config.R"), "config")
```
The next step is to import the meta-table and study data, apply manipulations, and finally create the population environments. Here we grab ready-prepared 'dirty' raw data, typical of a MACRO database, and apply some of these concepts.
```r
sourceKnit("Progs/data_import.R", "data_import")
```
From here one could refer to the vignette using-cttab, and apply further functions: apply_macro_dict and/or var_lab.
```r
sourceKnit(file.path("Progs", "analysis.R"), "analysis")
```
We need to create labels for the populations, including the number of subjects. It is good to name the report with the suffix ".doc".
```r
pop_size <- sapply(popn[, names(popn) != "subjid"], sum)
pop_name <- unique(get_meta_table()$population)
index <- match(pop_name, names(pop_size))
popn_labels <- paste0(propercase(pop_name), " (n = ", pop_size[index], ")")
write.csv(get_meta_table(), file = file.path("Output", "meta_table.csv"),
          row.names = FALSE)
write.csv(get_code_tree(), file = file.path("Output", "codetree.csv"),
          row.names = FALSE)
create_word_xml(report_title = "Vignette Report",
                author = "Simon Bond",
                filename = file.path("Output", "Reports", "Vignette_Report.doc"),
                popn_labels = popn_labels)
Sys.info()
sessionInfo()
date()
```
```r
if (!dir.exists("../inst/doc")) dir.create("../inst/doc")
unlink("../inst/doc/Output", recursive = TRUE)
file.copy("Output", "../inst/doc", recursive = TRUE)
```
The output is Output/Reports/Vignette_Report.doc. To save it permanently, first go to File > Edit Links to Files, highlight all the figures (shift + scroll), and click "Break Link". Then File > Save As, and ensure the Save As Type is "Word Document (*.docx)".
We can use rmarkdown to create other versions of the main report or a graphical representation of the code architecture. You may need to use
```r
Sys.setenv(RSTUDIO_PANDOC = "C:/Program Files/RStudio/bin/pandoc")
```
before calling render, though why this is needed is not currently understood by the author.
Numerous other outputs can be obtained from rmarkdown: a slide show of the figures, reformatting of outputs as desired, and so on.
See the vignette Code Tree Document for an example that produces a separate pdf document.
Your readers may prefer to get the report as an HTML document to view on-screen. This provides easier navigation with a floating toolbar. See the vignette Vignette Report HTML.