knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

What this Document Does

The cctu package is designed to help run the analysis of clinical trials. Such analyses typically follow a Statistical Analysis Plan, produce a large Word document with one page per table or figure, and require substantial audit trails. The package automates the trivialities of getting tables from R into Word, and captures the metadata and audit trails.

This document gives an overview of the tools provided in this package, explains the thinking behind them, and shows how to use them together. We start by explaining the rationale and finish with a worked example. A complementary set of R scripts and outputs that cannot easily be captured directly within a vignette is also provided in the /doc directory.

Further functions and tools have been added to streamline the import of MACRO data, to improve the basic tables, and to automate the reporting of missing data. These are explained in more detail in the separate using-cttab vignette.

Key Concepts

Meta Table

As part of writing the Statistical Analysis Plan (SAP), a structured meta-table with one row per table or figure needs to be produced. This provides titles, subtitles, numbering, populations, etc. See the help page ?meta_table for more details. The package is very reliant on this being provided by the user. Whilst producing the tables/figures the package will:

When producing the final output it will use the meta-table to index all the outputs, and pull out the meta-data needed.

These interactions run somewhat contrary to the R convention that functions just return output rather than modifying the user's environment. The meta_table is therefore stored within an environment that is private to the package: use set_meta_table() to set it up and get_meta_table() to extract a copy.
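The private-environment idea can be sketched in base R as follows. This is a minimal illustration only, not the cctu source: the names .private_env, set_meta_sketch and get_meta_sketch are hypothetical, and the toy meta-table has just three illustrative columns.

```r
# Hypothetical stand-ins, not the cctu source: a private environment plus a
# setter and a getter, mimicking the set_meta_table()/get_meta_table() pattern
.private_env <- new.env(parent = emptyenv())

set_meta_sketch <- function(meta_table) {
  stopifnot(is.data.frame(meta_table))
  assign("meta_table", meta_table, envir = .private_env)
  invisible(meta_table)
}

get_meta_sketch <- function() {
  # Returns a copy: modifying the result does not alter the stored table
  get("meta_table", envir = .private_env)
}

# A toy meta-table; the column names here are illustrative only
meta <- data.frame(
  number     = c("1.1", "1.2"),
  title      = c("Baseline characteristics", "Adverse events"),
  population = c("efficacy", "safety"),
  stringsAsFactors = FALSE
)

set_meta_sketch(meta)
copy <- get_meta_sketch()
copy$title[1] <- "Edited locally"

# The stored version is unchanged by edits to the copy
get_meta_sketch()$title[1]
```

Because the getter hands back a copy, user code cannot accidentally change the stored table; any update has to go through a function that writes to the private environment.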

Populations

It is assumed that there is a data.frame that has one row per subject, and a logical column for each population defined in the SAP, indicating whether the subject is included in that population. The definition of the populations, their names, and how inclusion is to be determined, are all important steps but outside of this package!

We provide a function create_popn_envir that takes the data.frame described above, and a list of other data.frames. It then filters each of the listed data.frames, looping over each of the populations, and saves them within a set of environments, one per population.
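To make the filtering step concrete, here is a base R sketch of the same idea. It is not the implementation of create_popn_envir: the helper make_popn_envs, the toy data, and the choice of subjid as the linking column are all assumptions made for illustration.

```r
# One row per subject, one logical column per population (toy data)
popn <- data.frame(
  subjid   = 1:4,
  efficacy = c(TRUE, TRUE, TRUE, TRUE),
  safety   = c(TRUE, FALSE, TRUE, TRUE)
)

# Other data frames to be filtered, linked by a subject identifier
data_list <- list(
  ae   = data.frame(subjid = c(1, 2, 2, 4),
                    event  = c("Headache", "Nausea", "Rash", "Cough")),
  labs = data.frame(subjid = 1:4, hb = c(13.1, 12.4, 14.0, 11.8))
)

# Hypothetical helper: build one environment per population, each holding
# the filtered copy of every data frame in data_list
make_popn_envs <- function(popn, data_list, id = "subjid") {
  pop_names <- setdiff(names(popn), id)
  envs <- lapply(pop_names, function(p) {
    keep_ids <- popn[[id]][popn[[p]]]
    env <- new.env()
    for (nm in names(data_list)) {
      df <- data_list[[nm]]
      assign(nm, df[df[[id]] %in% keep_ids, , drop = FALSE], envir = env)
    }
    env
  })
  names(envs) <- pop_names
  envs
}

envs <- make_popn_envs(popn, data_list)
nrow(envs$efficacy$ae)  # every adverse event row is kept
nrow(envs$safety$ae)    # the rows for subject 2 are dropped
```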

During the analysis code, the function attach_pop() is repeatedly called with the number of a table/figure. The corresponding row from meta-table is used to identify which population is associated with the table/figure, and the environment is attached. So the list of dataframes that were filtered in create_popn_envir can now be accessed, safe in the knowledge that the right population is used.
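The look-up-then-attach step might be sketched like this in base R. The helper attach_pop_sketch, the search-path name "current_popn", and the toy objects are hypothetical; the real attach_pop() reads the meta-table and environments held privately by the package rather than taking them as arguments.

```r
# Toy meta-table and population environments (illustrative names throughout)
meta <- data.frame(
  number     = c("1.1", "1.2"),
  population = c("efficacy", "safety"),
  stringsAsFactors = FALSE
)
efficacy_env <- new.env()
assign("ae", data.frame(subjid = c(1, 2, 2, 4)), envir = efficacy_env)
safety_env <- new.env()
assign("ae", data.frame(subjid = c(1, 4)), envir = safety_env)
envs <- list(efficacy = efficacy_env, safety = safety_env)

# Hypothetical helper: find the population for an output number, then
# attach the matching environment so that its data frames become visible
attach_pop_sketch <- function(number, meta, envs) {
  pop <- meta$population[meta$number == number]
  stopifnot(length(pop) == 1)
  attach(envs[[pop]], name = "current_popn")
  invisible(pop)
}

attach_pop_sketch("1.2", meta, envs)
n_safety <- nrow(ae)  # 'ae' now resolves to the safety version
detach("current_popn", character.only = TRUE)
```

The analysis code refers only to ae, never to a population by name, so the choice of population stays in one place: the meta-table.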

Code Architecture

A set of default local folders, in which the outputs will be stored, can be set up by calling cctu_initialise with its default arguments.

The code itself is assumed to be modularised into a sequence of code that calls other code using source.

It is important to call library(cctu) before the first use of source. The package modifies the base function base::source, so as to record the name of the file that called source and the name of the file sourced. This is to provide an audit trail allowing the sequence of code used to create a table/figure to be easily found. Like meta_table this audit trail lives in a private environment and is not easily directly accessible.
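The kind of bookkeeping involved can be sketched with a wrapper around base::source. This is only an illustration under stated assumptions: the package replaces source() itself, whereas the hypothetical traced_source below is a separate function, and the .trail environment with its parent and child columns is invented for the example.

```r
# Private store for the audit trail (hypothetical names, not the cctu internals)
.trail <- new.env(parent = emptyenv())
.trail$stack <- "ROOT"
.trail$tree  <- data.frame(parent = character(0), child = character(0),
                           stringsAsFactors = FALSE)

# Wrapper that records (calling file, sourced file) and then delegates to
# base::source; the stack tracks which file is currently being run
traced_source <- function(file, ...) {
  parent <- .trail$stack[length(.trail$stack)]
  .trail$tree <- rbind(.trail$tree,
                       data.frame(parent = parent, child = basename(file),
                                  stringsAsFactors = FALSE))
  .trail$stack <- c(.trail$stack, basename(file))
  on.exit(.trail$stack <- .trail$stack[-length(.trail$stack)])
  base::source(file, ...)
}

# Two temporary scripts: main.R sources analysis.R
demo_dir <- tempfile("trail_demo")
dir.create(demo_dir)
child_path <- file.path(demo_dir, "analysis.R")
main_path  <- file.path(demo_dir, "main.R")
writeLines("x <- 1", child_path)
writeLines(sprintf("traced_source(%s)", deparse(child_path)), main_path)

traced_source(main_path)
.trail$tree  # one row per call: which file sourced which
```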

XML

The step of transferring tables and figures to Word is facilitated by XML. The tables are stored as a copy on a local drive (in /Output/Core by default) in a semi-readable format with tags similar to HTML tables.

During the analysis, the block of code that produces a table or figure is placed between a call to attach_pop() and a call to write_table() or write_ggplot(). write_table takes a data frame as an argument; write_ggplot defaults to using the last plot object, or can take a ggplot object as an argument. Both functions then work through the same steps. They look up the number of the table or figure set by attach_pop (temporarily stored in a private environment). They store the output on the local drive as either an XML file or a graphics file, named with a suffix "_X.Y.Z" to identify the output. They write the name of the code file that created the output to the meta-table. Finally, by default, they call clean_up, which removes all objects in the global environment (apart from those named in the vector .reserved) and detaches the population environment.
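The final clean-up step can be illustrated in base R. This is a sketch only: clean_up_sketch is a hypothetical helper, a scratch environment stands in for the global environment, and the real clean_up additionally detaches the population environment.

```r
# A scratch environment stands in for the global environment in this sketch
work <- new.env()
assign("data",      data.frame(x = 1:3), envir = work)
assign("tmp_model", "placeholder",       envir = work)
assign("tmp_table", "placeholder",       envir = work)

# Names to protect, playing the role of the .reserved vector
reserved <- c("data")

# Hypothetical helper: remove everything except the reserved names
clean_up_sketch <- function(envir, reserved) {
  to_remove <- setdiff(ls(envir), reserved)
  rm(list = to_remove, envir = envir)
  invisible(to_remove)
}

clean_up_sketch(work, reserved)
ls(work)  # only the reserved object remains
```

Clearing temporary objects after each output means that the code for one table cannot silently depend on leftovers from the code for a previous one.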

After the analysis, the function create_word_xml glues these files all together along with the meta data into a large xml file. The tables are directly copied, and the figures have links to their file paths. Then xslt technology is used to convert the document into the schema used by MS Word, so it can be opened directly. Note though that the links to the graphics files would need to be turned into hard copies within Word if you want to move the document. If you want to tweak the style of the word document then you would edit the xslt file totally at your own risk!

Pointers

Other functions may be worth looking up in their help files; hopefully their names give a hint!

Data import & Manipulation

The set of external data files to be read in, along with the names of the corresponding data.frames created should be recorded in one initial data.frame. This can then be summarised in a table in the report using data_table_summary and also used to succinctly read in all the data files with read_data.
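A base R sketch of this pattern is shown below. The column names name and file, the toy CSV files, and the use of lapply with read.csv are assumptions for illustration; see the help pages for data_table_summary and read_data for the layout the package actually expects.

```r
# Two small CSV files written to a temporary folder for illustration
data_dir <- tempfile("raw_data")
dir.create(data_dir)
write.csv(data.frame(subjid = 1:3, age = c(34, 51, 47)),
          file.path(data_dir, "demog.csv"), row.names = FALSE)
write.csv(data.frame(subjid = c(1, 3), event = c("Headache", "Rash")),
          file.path(data_dir, "ae.csv"), row.names = FALSE)

# The initial data frame: which file feeds which data frame
# (column names are illustrative, not necessarily those cctu expects)
data_table <- data.frame(
  name = c("dm", "ae"),
  file = c("demog.csv", "ae.csv"),
  stringsAsFactors = FALSE
)

# Read every listed file into a named list of data frames
datasets <- lapply(file.path(data_dir, data_table$file), read.csv)
names(datasets) <- data_table$name

sapply(datasets, nrow)
```

Keeping the file list in one data frame means the same object both drives the import and documents, in the report, where each data frame came from.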

Analysis

Missing Data

Using cttab creates summaries of missing data as a side-effect. See these functions

Audit trails

As described already, source() captures a trail of which code sources other code. There is a demonstration below of how to convert this into a graphical tree representation using a combination of R Markdown and LaTeX. Alternatively, you can just get a copy of cctu:::cctu_env$code_tree and write it to a local csv file, say.

In a similar fashion, we have some examples of R Markdown that can be used to read in the outputs from write_* and produce an HTML, PDF, or other version of the main report, depending on the format that the readers of the report prefer.

At the end of your code it is good practice to run Sys.info(), date() and sessionInfo() to document the fine details of your session, package versions, etc. The final definitive version of the report may also need to be run on a different server with a validated instance of R, using R CMD BATCH, which has the nice side-effect of creating a .Rout file with a complete log of all the code and outputs. The wrapper function run_batch() avoids having to open a command-line window for your operating system.
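If a standalone record of the session is wanted in addition to the .Rout log, one possible approach is to write the same details to a text file. This is a sketch, not something the package does for you; the temporary file location is an assumption and a real project would choose its own path.

```r
# Write session details to a log file (a temporary file is used here;
# a real project would choose its own location)
log_file <- tempfile("session_log", fileext = ".txt")

session_lines <- c(
  paste("Run date:", date()),
  "",
  "System information:",
  paste(names(Sys.info()), Sys.info(), sep = ": "),
  "",
  "Session information:",
  capture.output(print(sessionInfo()))
)
writeLines(session_lines, log_file)
```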

Worked Example

This document illustrates how to set up a standard set of analyses using the cctu package. It assumes that you have copied across a template blank folder structure (cctu_initialise), created a library within the main folder with all the extra packages you may need, and set up a git versioning instance and an RStudio project. Documenting this step is a future project.

At the top level there is a file called "main.R".

Initial lines

rm(list=ls())
#set to the library folder
.libPaths()
library(cctu)
options(verbose=TRUE)
#run_batch("main.R")
DATA <- "PATH_TO_DATA"
cctu_initialise()
rm_output()

If you run just these initial lines with the run_batch line uncommented, that command will invoke R CMD BATCH to run the entire set of code and produce a log file "main.Rout". This should be the final step, using the validated server. The run_batch line is commented out here because the vignette will not work with it.

This vignette now differs from standard use, in which the Main.R file would continue as a sequence of source() calls. Here we do run the source files, and then quote the R code they contain. There is a copy of all the files and outputs from a standard use, starting from Main.R.

Configuration

It is recommended to set up a config.R file that

sourceKnit <- function(path, label){
  cmd <- knitr::knit_expand(file.path("Progs","templ.Rmd"))
  knitr::knit_child(text=unlist(cmd))
}

r sourceKnit(file.path("Progs","config.R"), "config")

Data Import

The next step is to import the meta-table and the study data, apply manipulations, and finally create the population environments.

Here we grab a ready-prepared 'dirty' raw data set, typical of a MACRO DB, and apply some of these concepts.

r sourceKnit("Progs/data_import.R", "data_import")

From here one could refer to the using-cttab vignette and apply further functions: apply_macro_dict and/or var_lab.

Analysis

r sourceKnit(file.path("Progs","analysis.R"),"analysis")

Creating the Report

We need to create labels for the populations that include the number of subjects. It is a good idea to give the report a file name ending with the suffix ".doc".

pop_size <- sapply( popn[,names(popn)!="subjid"], sum)
pop_name <- unique(get_meta_table()$population)
index <- match(pop_name, names(pop_size))
popn_labels <- paste0(propercase(pop_name), " (n = ", pop_size[index],")")
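To see what the four lines above produce, the same logic can be run on toy data. Everything here is illustrative: the popn values and population names are invented, and capitalise is a hypothetical stand-in for propercase(), whose exact behaviour may differ.

```r
# Toy population data: one row per subject, one logical column per population
popn <- data.frame(
  subjid   = 1:5,
  safety   = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  efficacy = c(TRUE, TRUE, TRUE, FALSE, FALSE)
)
# Populations in the order they first appear in a toy meta-table
meta_population <- c("efficacy", "efficacy", "safety")

# Hypothetical stand-in for propercase(): capitalise the first letter only
capitalise <- function(x) paste0(toupper(substring(x, 1, 1)), substring(x, 2))

pop_size <- sapply(popn[, names(popn) != "subjid"], sum)
pop_name <- unique(meta_population)
index    <- match(pop_name, names(pop_size))
labels   <- paste0(capitalise(pop_name), " (n = ", pop_size[index], ")")
labels
```

The match() call matters because the populations may appear in the meta-table in a different order from the columns of popn; indexing by position alone would pair names with the wrong counts.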

write.csv(get_meta_table(), file=file.path("Output","meta_table.csv"), row.names = FALSE)
write.csv(get_code_tree(), file=file.path("Output","codetree.csv"), row.names = FALSE)
create_word_xml(report_title="Vignette Report",
                author="Simon Bond",
                filename=file.path("Output","Reports","Vignette_Report.doc"),
                popn_labels=popn_labels
            )
Sys.info()
sessionInfo()
date()
if(!dir.exists("../inst/doc")){dir.create("../inst/doc")}
unlink("../inst/doc/Output", recursive=TRUE)
file.copy("Output", "../inst/doc", recursive=TRUE)

The output is Output/Reports/Vignette_Report.doc. To save it permanently, first go to File > Edit Links to Files, highlight all the figures (shift + scroll), and click "Break Link". Then go to File > Save As, and ensure that Save As Type is set to "Word Document (*.docx)".

Other Outputs

We can use rmarkdown to create other versions of the main report or a graphical representation of the code architecture. You may need to use Sys.setenv(RSTUDIO_PANDOC="C:/Program Files/RStudio/bin/pandoc") before calling render, although the author does not currently understand why this is needed.

Numerous other outputs can be obtained from rmarkdown: slide show of the figures, reformatting of output as desired...

Code Tree

See the vignette Code Tree Document for an example that produces a separate pdf document.

HTML version of the report

Your readers may prefer to get the report as an HTML document to view on-screen. This provides easier navigation with a floating toolbar. See the vignette Vignette Report HTML.



shug0131/cctu documentation built on Nov. 10, 2023, 12:03 p.m.