knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(cctu)
The package cctu
has been updated to utilize the DLU file and CLU file to facilitate the analyzing process. The newly added functions will not affect previous process. But here, in this document we are going to learn some new functions provided in the package to facilitate your analysis. Before you start, you should get yourself familiarized with the cctu
package by looking through the Analysis Template
vignette. Remember, the sumby
function will not benefit from any of the data attribute related new functions mentioned in this document. Namely, the variable label and value labels will not have an effect on the output. You don't need to go through everything in this document if you are going to stick to the sumby
function.
If you are familiar with SAS
, Stata
or SPSS
, you should already know there's a variable label and value label (variable format in some). The variable label gives you the description of the variable, and the value label is to explain what the values stand for. The variable will stay as a numeric but has categories attached to it. Which means, you can subset or manipulate it like a numerical variable but report the data as a categorical one. It is important to understand this, let's use mtcars
dataset and demonstrate how it works.
data(mtcars) # Assign variable label var_lab(mtcars$am) <- "Transmission" # Assign value label with named vector val_lab(mtcars$am) <- c("Automatic" = 0, "Manual"=1) str(mtcars$am)
As you can above, label
and labels
attributes were added. But the variable is numeric. You can still summarize it as numeric variable as below:
summary(mtcars$am)
You can convert the value label to factor just before your final analysis. The to_factor
function replace the value with labels and convert the variable to factor.
# Extract variable label var_lab(mtcars$am) # Convert variable to factor with labels attached to it table(to_factor(mtcars$am))
But be cautious, some R process might drop the variable or value label attributes. Normally it won't, but you should check it before report if you are not sure. For example, as.numeric
, as.character
and as.logiac
will drop the variable and value label. There are to_numeric
, to_character
and to_logical
can be used to convert the data type. You can also use the copy_lab
function to copy the variable and value labels from the other variable.
You can use var_lab
to extract the variable label or assign one. And use has.label
to check if the variable has any variable label with it. You may also want to use drop_lab
to drop the variable label.
For value label, you can use val_lab
to extract or assign value label. And use has.labels
to check if the variable has a value label. There are also unval
function to drop the value label and lab2val
to replace the data.frame
value to its corresponding value labels.
There are new functions have been added to utilize the DLU and CLU files for the data analysis. The apply_macro_dict
function uses DLU and CLU file to assign variable and value labels to the dataset. This function will convert the variable name in the data and DLU to lower case by default. This will also convert the dataset based on the variable type as in the DLU file. If you don't want to convert the variable name to lower cases, you should set clean_names = FALSE
.
The extract_form
can extract MACRO data by form and visits. The data will be converted to data.table
class, which is an extension of the data.frame
and works exactly the same. It is a great package with lots of data manipulation capability, you should seek the website for more details. But one thing to remember is that all the names in the variable selection will be considered as a variable of the data.
vars <- c("mpg", "am") # You can do this in the normal data.frame mtcars[,vars] # But you can't do this for the data.table dat <- data.table::data.table(mtcars) mtcars[,vars] # You need to add with=FALSE to do that mtcars[,vars, with=FALSE]
You might already knew how to use the sumby
function, but now a new function called cttab
has been added. The difference between these two functions are the latter can handle variable label and value labels. You can feed the labelled data to this function and it will populate a summary table. Report data by treatment group, stratify tables by visit. Also, you can report variable based on some conditions and group the variable in the report. It generates a missing report internally and you can dump the missing report at the end. No extra step is needed. It will also produce the summary plots of the variables. The produced plots will be arrange to 3 by 3 and new plot will be producd if the variables exceeds 9. The remaining of this document will show you with a working case.
The cctab
function has a good flexibility, which means it has lots of parameters you can use. You should check out the manual of the cctab
function. But, you can use options
to set the default value of the parameters to be used by cctab
. Below is setting some of the options, for more details please check out the cctab
function manuals.
options(cctu_digits = 3) # keep 3 digits for numerical value options(cctu_digits_pct = 0) # keep 0 digits for percentage options(cctu_subjid_string = "subjid") # Set subject ID string options(cctu_print_plot = FALSE) # Don't produce plots
In this section, we will show how to populate tables.
You should read the data as before, but you can and should read the data by setting all the columns to character. This will be handled later. Same process can be found in the Analysis Template
vignette.
# Read example data dt <- read.csv(system.file("extdata", "pilotdata.csv", package="cctu"), colClasses = "character") # Read DLU and CLU dlu <- read.csv(system.file("extdata", "pilotdata_dlu.csv", package="cctu")) clu <- read.csv(system.file("extdata", "pilotdata_clu.csv", package="cctu"))
If you have multiple datasets, you should combine them at this stage. If multiple datasets are given and they have the same variable name, you can use rbind
function to combine data together. If you have different datasets with different variable names except for some key variables, you can use merge_data
function to combine them together.
# Read example data dt_a <- read.csv(system.file("extdata", "test_A.csv", package="cctu"), colClasses = "character") dt_b <- read.csv(system.file("extdata", "test_B.csv", package="cctu"), colClasses = "character") # Read DLU and CLU dlu_a <- read.csv(system.file("extdata", "test_A_DLU.csv", package="cctu")) dlu_b <- read.csv(system.file("extdata", "test_B_DLU.csv", package="cctu")) clu_a <- read.csv(system.file("extdata", "test_A_CLU.csv", package="cctu")) clu_b <- read.csv(system.file("extdata", "test_B_CLU.csv", package="cctu")) # Merge dataset with merge_data function res <- merge_data(datalist = list(dt_a, dt_b), dlulist = list(dlu_a, dlu_b), clulist = list(clu_a, clu_b)) dt <- res$data # Extract combined data dlu <- res$dlu # Extract combined DLU data clu <- res$clu # Extract combined CLU data
Next, we will apply the DLU and CLU files to the dataset. Do not clean variable name in the dataset and DLU/CLU files before applying apply_macro_dict
. The function can handle variable name cleaning by setting clean_names = TRUE
, this is default behaviour. The apply_macro_dict
function will clean the variable name in the DLU and CLU, then apply these macro meta data to the target dataset. The cleaned DLU data will be sotred internally and is further used by cttab
to report missingness. You can use get_dlu
to extract the internal DLU data after applying apply_macro_dict
to the dataset for further useage or as a reference. Although you can use set_dlu
to change the internal DLU data stored by apply_macro_dict
, this is not recommended unless you know what you are doing. This will have an impact on the missing report.
# Create subjid dt$subjid <- substr(dt$USUBJID, 8, 11) # Apply CLU and DLU files dt <- apply_macro_dict(dt, dlu = dlu, clu = clu, clean_names = FALSE) # Give new variable a label var_lab(dt$subjid) <- "Subject ID"
After this, you should follow the Analysis Template
vignette and setup the population etc.
# You should read set_meta_table(cctu::meta_table_example) #Create the population table popn <- dt[,"subjid", drop=FALSE] popn$safety <- TRUE create_popn_envir("dt", popn) .reserved <- ls()
Next we will do some data analysis.
After you have attached the population, next thing you may want to do is extract a particular form from the data. For the table, assume we are reporting age, sex and BMI from the patient registration form by treatment arm. For demonstrating the filtering, we only report non-white patients' BMI.
# Attach population attach_pop("1.1") # Extract patient patient registration form and keep subjid variable df <- extract_form(dt, "PatientReg", vars_keep = c("subjid")) # Now report Age, Sex and BMI. For BMI, report not white only X <- cttab(x = c("AGE", "SEX", "BMIBL"), # Variable to report group = "ARM", # Group variable data = df, # Data select = c("BMIBL" = "RACEN != 1")) # Filter for variable BMI # Write table X
One can use formula in the cctab
just like lm
, but you will not be able to group variables (described in the later section). The left hand side of the formula is the variables to be summarised. Right hand side of the formula is the grouping and/or row splitting variables. The by visit variable should be separated by |
with grouping variable, use 1
if there is no grouping variables. No group
or row_split
parameters to be used in the formula interface. All the other parameters are the same.
# No grouping X <- cttab(AGE + SEX + BMIBL ~ 1, data = df) # Group summarise by ARM X <- cttab(AGE + SEX + BMIBL ~ ARM, data = df) # Group summarise by ARM and split by visit cycle X <- cttab(AST + BILI ~ ARM | AVISIT, data = df) # Split by visit cycle X <- cttab(AST + BILI ~ 1 | AVISIT, data = df)
The cttab
function will report the missing internally. You can use the following to get the missing report.
# This will save the missing report under Output folder # Or you can set the output folder and name dump_missing_report() # Pull out the missing report if you want miss_rep <- get_missing_report() # Reset missing report reset_missing_report()
After this, you can finish the remaining as in the Analysis Template
vignette.
As you have seen previously, the cttab
function can easily populate simple tables.
Table only some variables, no treatment arm or variable selection.
X <- cttab(x = c("AGE", "SEX", "BMIBL"), # Variable to report data = df) # Data X
This is what we have seen before
X <- cttab(x = c("AGE", "SEX", "BMIBL"), # Variable to report group = "ARM", # Group variable data = df, # Data select = c("BMIBL" = "RACEN != 1")) # Filter for variable BMI X
You can define row_split
parameter to the name of visit or repeat variable.
attach_pop("1.1") df <- extract_form(dt, "Lab", vars_keep = c("subjid", "ARM")) X <- cttab(x = c("AST", "BILI", "ALT"), group = "ARM", data = df, row_split = "AVISIT", # Visit variable select = c("ALT" = "PERF == 1")) X
In this example, we will report demographic variable, lab results and lab abnormality. Variables will be grouped, no group name will be given to demographic variables, "Blood" to lab results and "Pts with Abnormal" to lab abnormality. Here, we count the number of patients with abnormal lab results. The cttab
will report the count and percentage of TRUE
. This is useful if you want to report patient numbers for different condition that belong to one category. Below is how to do it:
# Prepare data as before attach_pop("1.1") df <- extract_form(dt, "PatientReg", vars_keep = c("subjid")) base_lab <- extract_form(dt, "Lab", visit = "SCREENING", vars_keep = c("subjid")) # Define abnormal base_lab$ABNORMALT <- base_lab$ALT > 22.5 var_lab(base_lab$ABNORMALT) <- "ALT abnormal" base_lab$ABNORMAST <- base_lab$AST > 25.5 var_lab(base_lab$ABNORMAST) <- "AST abnormal" df <- merge(df, base_lab, by = "subjid") # Table X <- cttab(x = list(c("AGE", "SEX", "BMIBL"), # Group lab variable "Blood" = c("ALT", "AST"), # Group abnormal variable "Pts with Abnormal" = c("ABNORMAST", "ABNORMALT")), group = "ARM", data = df, # Add some filtering select = c("BMIBL" = "RACEN != 1", "ALT" = "PERF == 1")) X
The default behaviour of this function is to keep one digits for the percentage and 3 significant value for the numerical values. The default rounding function is signif_pad
, you can also use round
or round_pad
to keep digits in the summary. The format_percent
function is used to format the percentage values. There is format_pval
might be useful to you.
X <- cttab(x = c("AGE", "SEX", "BMIBL"), # Variable to report group = "ARM", # Group variable data = df, # Data digits = 2, # Keep 2 digits for numerical digits_pct = 1, # Keep 1 digits for percentage rounding_fn = round) # Use function round for rounding X
Tables can be cross-referenced in the word. This is very convenient if you are writing narratives and want to reference tables or figures. You can find how to cross reference tables or figures here. If you can changed the order of the tables by cut and paste or by dragging the headings in the navigation pane of the word the reference will still be there. You can press Ctrl+A
select all, and press F9
. This will change numbering of the table and figure, this change will also be applied to the references you made in the narratives.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.