In ddsjoberg/gtsummary-v0.1: Presentation-Ready Data Summary Tables

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

This vignette will walk a reader through the fmt_table1() function, and the various functions available to modify and make additions to an existing Table 1.

To start, a quick note on the magrittr package's pipe function, %>%. By default the pipe operator puts whatever is on the left hand side of %>% into the first argument of the function on the right hand side. The pipe function can be used to make the code relating to fmt_table1() easier to use, but it is not required. Here are a few examples of how %>% translates into typical R notation.

x %>% f() is equivalent to f(x)
x %>% f(y) is equivalent to f(x, y)
y %>% f(x, .) is equivalent to f(x, y)
z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z)

Here's how this translates into the use of fmt_table1().

mtcars %>% fmt_table1() is equivalent to fmt_table1(mtcars)
mtcars %>% fmt_table1(by = "am") is equivalent to fmt_table1(mtcars, by = "am")
fmt_table1(mtcars, by = "am") %>% add_comparison() is equivalent to
    t = fmt_table1(mtcars, by = "am")
    add_comparison(t)

Basic Usage

We'll be using the trial data set throughout this example. This set contains data from 200 patients randomized to a new adjuvant therapy or placebo. The outcome is a binary tumor response. Each variable in the data frame has been assigned an attribute label (i.e. attr(trial$trt, "label") = "Treatment Randomization"). These labels are displayed in the output table by default. A data frame without labels will print variable names.

trt      Treatment Randomization
age      Age, yrs
marker   Marker Level, ng/mL
stage    T Stage
grade    Grade
response Tumor Response

library(dplyr)
library(knitr)
library(kableExtra)
library(gtsummary)

# printing trial data
head(trial) %>% kable()

The default output from fmt_table1() is meant to be publication ready. Let's start by creating a descriptive statistics table from the trial data set built into the gtsummary package. The fmt_table1() can take, minimally, a data set as the only input, and return descriptive statistics for each column in the data frame.

For brevity, keeping a subset of the variables in the trial data set.

trial2 =
  trial %>%
  select(trt, marker, stage)

fmt_table1(trial2)

If your output does not appear in a formatted table, it is likely due to a known issue in the knitr::kable() function. One way around the issue to to add styling from the kableExtra package.
fmt_table1(trial2) %>% as_tibble() %>% knitr::kable() %>% kableExtra::kable_styling()

This is a great table, but for trial data the summary statistics should be split by randomization group. While reporting p-values for a randomized trial isn't recommended, we'll do it here as an illustration. To compare two or more groups, include add_comparison() to the function call.

fmt_table1(trial2, by = "trt") %>% add_comparison()

Customize Table 1 Output

It's also possible to add information to fmt_table1() output. The code below calculates the standard table with summary statistics split by treatment randomization with the following modifications

Report 'mean (SD)' and 'n / N (\%)'
Use t-test instead of Wilcoxon rank-sum
Do not add row for number of missing observations
Round large p-values to two decimal place
Add column of q-values (p-values adjusted using FDR)
Add column reporting summary statistics for the cohort overall
Add column reporting N not missing for each variable
Add column with statistic labels
Modify header to include percentages in each group
Bold variable labels
Italicize variable levels

trial2 %>%
  # build base table 1
  fmt_table1(
    by = "trt",
    # change variable labels
    label = list(
      marker = "Pretreatment Marker Level, ng/mL",
      stage = "Clinical T Stage"
      ),
    # change statistics printed in table
    statistic = list(
      continuous = "{mean} ({sd})",
      categorical = "{n} / {N} ({p}%)"
    ),
    missing = "no"
  ) %>%
  # add p-values to table, perform t-test for the marker,
  # and round large pvalues to two decimal place
  add_comparison(
    test = list(marker = "t.test"),
    pvalue_fun = function(x) fmt_pvalue(x, digits = 2)
  ) %>%
  # add q-values (p-values adjusted for multiple testing)
  add_q(pvalue_fun = function(x) fmt_pvalue(x, digits = 2)) %>%
  # add overall column
  add_overall() %>%
  # add column with N
  add_n() %>%
  # add statistic labels
  add_stat_label() %>%
  # bold variable labels, italicize levels
  bold_labels() %>%
  italicize_levels() %>%
  # bold p-values under a given threshold (default 0.05)
  bold_p(t = 0.2) %>%
  # include percent in headers
  modify_header(
    stat_by = c("{level}", "N = {n} ({p}%)"),
    stat_overall = c("All Patients", "N = {N} (100%)")
  )

Each of the modification functions have additional options outlined in their respective help files.

Report Results Inline

Having a well formatted and reproducible table is a great! But we often need to report the results from a table in the text of an Rmarkdown report. Inline reporting has been made simple with inline_text().

Let's first create a basic Table 1.

tab1 = fmt_table1(trial2, by = "trt")
tab1

To report the median (IQR) of the marker levels in each group, use the following commands inline.

The median (IQR) marker level in the drug and placebo groups are `r inline_text(tab1, cell = "marker:Drug")` and `r inline_text(tab1, cell = "marker:Placebo")`, respectively.

Here's how the line will appear in your report.

The median (IQR) marker level in the drug and placebo groups are r inline_text(tab1, cell = "marker:Drug") and r inline_text(tab1, cell = "marker:Placebo"), respectively.

The cell argument indicates to inline_text() which statistic to display. Information regarding which statistic to display are separated by ":". The first term indicates the variable name and the last indicates the level of the by variable e.g. marker:Placebo would display the summary statistics for the variable marker among patients in the Placebo group. If you display a statistic from a categorical variable, include the desired level after the variable name, e.g. stage:T1:Drug.

`r inline_text(tab1, "stage:T1:Drug")` resolves to "r inline_text(tab1, "stage:T1:Drug")"

gtsummary + kableExtra

Need a data frame for any reason (e.g. if you want to get extra fancy with kableExtra)? Use generic function as_tibble to extract an easy-to-use data frame from any fmt_table1 object.

#get data frame from fmt_table1 object
tab1_df <- as_tibble(tab1)

If you want to customize anything with knitr::kable or kableExtra, you can use the above as_tibble along with the function indent_key which extracts the row numbers you want indented when knitting your table to HTML. (NOTE: Only load library(kableExtra) and use the below if knitting to HTML, this will not work with Word or PDF.) For more on customizing your tables with kableExtra check out the package's vignette on HTML output.

# knit pretty table
tab1 %>%
  bold_labels() %>% # bold labels in here if you want
  as_tibble() %>%
  kable(
    row.names = FALSE,
    caption = "Table 1: Summary of Patient and Clinical Variables"
  ) %>%
  # Below, using kableExtra functions to do things like change table style, add 
  # grouped column header, footnote, and indent variable categories
  kable_styling(
    bootstrap_options = c("striped", "condensed", "hover"), #popular bootstrap styles
    font_size = 16,
    full_width = FALSE
  ) %>%
  add_header_above(c(" " = 1, "Treatment assignment" = 2)) %>%
  footnote(
    general = "Isn't this footnote so nice?",
    number = c("You can also add numbered or lettered footnotes", "Which is great.")
  ) %>%
  add_indent(indent_key(tab1))

Under the Hood

When you print the output from the fmt_table1() function into the R console or into an Rmarkdown, there are default printing functions that are called in the background: print.fmt_table1() and knit_print.fmt_table1(). The true output from fmt_table1() is a named list, but when you print into the R console the interesting portions are displayed from the .$table1 data frame.

t = fmt_table1(trial2, by = "trt") %>% add_comparison()
ls(t)

There is additional information stored in the fmt_table1() output list.

table1 data frame with summary statistics
meta_data data frame that is one row per variable, and contains information about each variable in the object
by the by = variable name from the function call
call the fmt_table1 function call
call_list named list of each function called for the fmt_table1 object. the above example would have two elements in the list: fmt_table1 and add_comparison.
inputs Inputs from the function call. Not only is the call stored, but the values of the inputs as well. For example, you can access the data frame passed to fmt_table1().