suggested_dependent_pkgs <- c("dplyr") knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = all(vapply( suggested_dependent_pkgs, requireNamespace, logical(1), quietly = TRUE )) )
knitr::opts_chunk$set(comment = "#")
```{css, echo=FALSE} .reveal .r code { white-space: pre; }
## Introduction In this vignette we create a * demographic table * adverse event table * response table * time-to-event analysis table using the `rtables` layout facility. That is, we demonstrate how the layout based tabulation framework can specify the structure and relations that are commonly found when analyzing clinical trials data. Note that all the data is created using random number generators. All `ex_*` data which is currently attached to the `rtables` package is provided by the [`formatters`](https://insightsengineering.github.io/formatters/) package and was created using the publicly available [`random.cdisc.data`](https://insightsengineering.github.io/random.cdisc.data/) R package. The packages used in this vignette are: ```r library(rtables) library(tibble) library(dplyr)
Demographic tables summarize the variables content for different population subsets (encoded in the columns).
One feature of analyze()
that we have not introduced in the previous
vignette is that the analysis function afun
can specify multiple
rows with the in_rows()
function:
ADSL <- ex_adsl # Example ADSL dataset lyt <- basic_table() %>% split_cols_by("ARM") %>% analyze(vars = "AGE", afun = function(x) { in_rows( "Mean (sd)" = rcell(c(mean(x), sd(x)), format = "xx.xx (xx.xx)"), "Range" = rcell(range(x), format = "xx.xx - xx.xx") ) }) tbl <- build_table(lyt, ADSL) tbl
Multiple variables can be analyzed in one analyze()
call:
lyt2 <- basic_table() %>% split_cols_by("ARM") %>% analyze(vars = c("AGE", "BMRKR1"), afun = function(x) { in_rows( "Mean (sd)" = rcell(c(mean(x), sd(x)), format = "xx.xx (xx.xx)"), "Range" = rcell(range(x), format = "xx.xx - xx.xx") ) }) tbl2 <- build_table(lyt2, ADSL) tbl2
Hence, if afun
can process different data vector types
(i.e. variables selected from the data) then we are fairly close to a
standard demographic table. Here is a function that either creates a
count table or some number summary if the argument x
is a factor or
numeric, respectively:
s_summary <- function(x) { if (is.numeric(x)) { in_rows( "n" = rcell(sum(!is.na(x)), format = "xx"), "Mean (sd)" = rcell(c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)), format = "xx.xx (xx.xx)"), "IQR" = rcell(IQR(x, na.rm = TRUE), format = "xx.xx"), "min - max" = rcell(range(x, na.rm = TRUE), format = "xx.xx - xx.xx") ) } else if (is.factor(x)) { vs <- as.list(table(x)) do.call(in_rows, lapply(vs, rcell, format = "xx")) } else { stop("type not supported") } }
Note we use rcell
to wrap the results in order to add formatting
instructions for rtables
. We can use s_summary
outside the context
of tabulation:
s_summary(ADSL$AGE)
and
s_summary(ADSL$SEX)
We can now create a commonly used variant of the demographic table:
summary_lyt <- basic_table() %>% split_cols_by(var = "ARM") %>% analyze(c("AGE", "SEX"), afun = s_summary) summary_tbl <- build_table(summary_lyt, ADSL) summary_tbl
Note that analyze()
can also be called multiple times in sequence:
summary_lyt2 <- basic_table() %>% split_cols_by(var = "ARM") %>% analyze("AGE", s_summary) %>% analyze("SEX", s_summary) summary_tbl2 <- build_table(summary_lyt2, ADSL) summary_tbl2
which leads to the table identical to summary_tbl
:
identical(summary_tbl, summary_tbl2)
stopifnot(identical(summary_tbl, summary_tbl2))
In clinical trials analyses the number of patients per column is often
referred to as N
(rather than the overall population which outside
of clinical trials is commonly referred to as N
). Column N
s are
added by setting the show_colcounts
argument in basic_table()
to
TRUE
:
summary_lyt3 <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARMCD") %>% analyze(c("AGE", "SEX"), s_summary) summary_tbl3 <- build_table(summary_lyt3, ADSL) summary_tbl3
We will now show a couple of variations of the demographic table that
we developed above. These variations are in structure and not in
analysis, hence they don't require a modification to the s_summary
function.
We will start with a standard table analyzing the variables AGE
and
BMRKR2
variables:
lyt <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARM") %>% analyze(c("AGE", "BMRKR2"), s_summary) tbl <- build_table(lyt, ADSL) tbl
Assume we would like to have this analysis carried out per gender encoded in the row space:
lyt <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARM") %>% split_rows_by("SEX") %>% analyze(c("AGE", "BMRKR2"), s_summary) tbl <- build_table(lyt, ADSL) tbl
We will now subset ADSL
to include only males and females in the
analysis in order to reduce the number of rows in the table:
ADSL_M_F <- filter(ADSL, SEX %in% c("M", "F")) lyt2 <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARM") %>% split_rows_by("SEX") %>% analyze(c("AGE", "BMRKR2"), s_summary) tbl2 <- build_table(lyt2, ADSL_M_F) tbl2
Note that the UNDIFFERENTIATED
and U
levels still show up in the
table. This is because tabulation respects the factor levels and level
order, exactly as the split
and table
function do. If empty levels
should be dropped then rtables
needs to know that at splitting time
via the split_fun
argument in split_rows_by()
. There are a number
of predefined functions. For this example drop_split_levels()
is
required to drop the empty levels at splitting time. Splitting is a
big topic and will be eventually addressed in a specific package
vignette.
lyt3 <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARM") %>% split_rows_by("SEX", split_fun = drop_split_levels, child_labels = "visible") %>% analyze(c("AGE", "BMRKR2"), s_summary) tbl3 <- build_table(lyt3, ADSL_M_F) tbl3
In the table above the labels M
and F
are not very
descriptive. You can add the full labels as follows:
ADSL_M_F_l <- ADSL_M_F %>% mutate(lbl_sex = case_when( SEX == "M" ~ "Male", SEX == "F" ~ "Female", SEX == "U" ~ "Unknown", SEX == "UNDIFFERENTIATED" ~ "Undifferentiated" )) lyt4 <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARM") %>% split_rows_by("SEX", labels_var = "lbl_sex", split_fun = drop_split_levels, child_labels = "visible") %>% analyze(c("AGE", "BMRKR2"), s_summary) tbl4 <- build_table(lyt4, ADSL_M_F_l) tbl4
For the next table variation we only stratify by gender for the AGE
analysis. To do this the nested
argument has to be set to FALSE
in
analyze()
call:
lyt5 <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARM") %>% split_rows_by("SEX", labels_var = "lbl_sex", split_fun = drop_split_levels, child_labels = "visible") %>% analyze("AGE", s_summary, show_labels = "visible") %>% analyze("BMRKR2", s_summary, nested = FALSE, show_labels = "visible") tbl5 <- build_table(lyt5, ADSL_M_F_l) tbl5
Once we split the rows into groups (Male
and Female
here) one
might want to summarize groups: usually by showing count and column
percentages. This is especially important if we have missing data. For
example, if we create the above table but add missing data to the
AGE
variable:
insert_NAs <- function(x) { x[sample(c(TRUE, FALSE), length(x), TRUE, prob = c(0.2, 0.8))] <- NA x } set.seed(1) ADSL_NA <- ADSL_M_F_l %>% mutate(AGE = insert_NAs(AGE)) lyt6 <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARM") %>% split_rows_by( "SEX", labels_var = "lbl_sex", split_fun = drop_split_levels, child_labels = "visible" ) %>% analyze("AGE", s_summary) %>% analyze("BMRKR2", s_summary, nested = FALSE, show_labels = "visible") tbl6 <- build_table(lyt6, filter(ADSL_NA, SEX %in% c("M", "F"))) tbl6
Here it is not easy to see how many females and males there are in
each arm as n
represents the number of non-missing data elements in
the variables. Groups within rows that are defined by splitting can be
summarized with summarize_row_groups()
, for example:
lyt7 <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARM") %>% split_rows_by("SEX", labels_var = "lbl_sex", split_fun = drop_split_levels) %>% summarize_row_groups() %>% analyze("AGE", s_summary) %>% analyze("BMRKR2", afun = s_summary, nested = FALSE, show_labels = "visible") tbl7 <- build_table(lyt7, filter(ADSL_NA, SEX %in% c("M", "F"))) tbl7
There are a couple of things to note here:
summarize_row_groups()
).We can recreate this default behavior (count percentage) by defining a
cfun
for illustrative purposes here as it results in the same table
as above:
lyt8 <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARM") %>% split_rows_by("SEX", labels_var = "lbl_sex", split_fun = drop_split_levels) %>% summarize_row_groups(cfun = function(df, labelstr, .N_col, ...) { in_rows( rcell(nrow(df) * c(1, 1 / .N_col), format = "xx (xx.xx%)"), .labels = labelstr ) }) %>% analyze("AGE", s_summary) %>% analyze("BEP01FL", afun = s_summary, nested = FALSE, show_labels = "visible") tbl8 <- build_table(lyt8, filter(ADSL_NA, SEX %in% c("M", "F"))) tbl8
Note that cfun
, like afun
(which is used in analyze()
), can
operate on either variables, passed via the x
argument, or
data.frame
s or tibble
s, which are passed via the df
argument
(afun
can optionally request df
too). Unlike afun
, cfun
must
accept labelstr
as the second argument which gives the default group
label (factor level from splitting) and hence it could be modified:
lyt9 <- basic_table() %>% split_cols_by(var = "ARM") %>% split_rows_by("SEX", labels_var = "lbl_sex", split_fun = drop_split_levels, child_labels = "hidden") %>% summarize_row_groups(cfun = function(df, labelstr, .N_col, ...) { in_rows( rcell(nrow(df) * c(1, 1 / .N_col), format = "xx (xx.xx%)"), .labels = paste0(labelstr, ": count (perc.)") ) }) %>% analyze("AGE", s_summary) %>% analyze("BEP01FL", s_summary, nested = FALSE, show_labels = "visible") tbl9 <- build_table(lyt9, filter(ADSL_NA, SEX %in% c("M", "F"))) tbl9
Layouts have a couple of advantages over tabulating the tables directly:
Here is an example that demonstrates the reusability of layouts:
adsl_lyt <- basic_table(show_colcounts = TRUE) %>% split_cols_by("ARM") %>% analyze(c("AGE", "SEX"), afun = s_summary) adsl_lyt
We can now build a table for ADSL
adsl_tbl <- build_table(adsl_lyt, ADSL) adsl_tbl
or for all patients that are older than 18:
adsl_f_tbl <- build_table(lyt, ADSL %>% filter(AGE > 18)) adsl_f_tbl
There are a number of different adverse event tables. We will now present two tables that show adverse events by ID and then by grade and by ID.
This time we won't use the ADAE
dataset from
random.cdisc.data
but rather generate a dataset on the fly (see Adrian's 2016 Phuse
paper):
set.seed(1) lookup <- tribble( ~AEDECOD, ~AEBODSYS, ~AETOXGR, "HEADACHE", "NERVOUS SYSTEM DISORDERS", "5", "BACK PAIN", "MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS", "2", "GINGIVAL BLEEDING", "GASTROINTESTINAL DISORDERS", "1", "HYPOTENSION", "VASCULAR DISORDERS", "3", "FAECES SOFT", "GASTROINTESTINAL DISORDERS", "2", "ABDOMINAL DISCOMFORT", "GASTROINTESTINAL DISORDERS", "1", "DIARRHEA", "GASTROINTESTINAL DISORDERS", "1", "ABDOMINAL FULLNESS DUE TO GAS", "GASTROINTESTINAL DISORDERS", "1", "NAUSEA (INTERMITTENT)", "GASTROINTESTINAL DISORDERS", "2", "WEAKNESS", "MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS", "3", "ORTHOSTATIC HYPOTENSION", "VASCULAR DISORDERS", "4" ) normalize <- function(x) x / sum(x) weightsA <- normalize(c(0.1, dlnorm(seq(0, 5, length.out = 25), meanlog = 3))) weightsB <- normalize(c(0.2, dlnorm(seq(0, 5, length.out = 25)))) N_pop <- 300 ADSL2 <- data.frame( USUBJID = seq(1, N_pop, by = 1), ARM = sample(c("ARM A", "ARM B"), N_pop, TRUE), SEX = sample(c("F", "M"), N_pop, TRUE), AGE = 20 + rbinom(N_pop, size = 40, prob = 0.7) ) l.adae <- mapply( ADSL2$USUBJID, ADSL2$ARM, ADSL2$SEX, ADSL2$AGE, FUN = function(id, arm, sex, age) { n_ae <- sample(0:25, 1, prob = if (arm == "ARM A") weightsA else weightsB) i <- sample(seq_len(nrow(lookup)), size = n_ae, replace = TRUE, prob = c(6, rep(1, 10)) / 16) lookup[i, ] %>% mutate( AESEQ = seq_len(n()), USUBJID = id, ARM = arm, SEX = sex, AGE = age ) }, SIMPLIFY = FALSE ) ADAE2 <- do.call(rbind, l.adae) ADAE2 <- ADAE2 %>% mutate( ARM = factor(ARM, levels = c("ARM A", "ARM B")), AEDECOD = as.factor(AEDECOD), AEBODSYS = as.factor(AEBODSYS), AETOXGR = factor(AETOXGR, levels = as.character(1:5)) ) %>% select(USUBJID, ARM, AGE, SEX, AESEQ, AEDECOD, AEBODSYS, AETOXGR) ADAE2
We start by defining an events summary function:
s_events_patients <- function(x, labelstr, .N_col) { in_rows( "Total number of patients with at least one event" = rcell(length(unique(x)) * c(1, 1 / .N_col), format = "xx (xx.xx%)"), "Total number of events" = rcell(length(x), format = "xx") ) }
So, for a population of 5
patients where
AE
sAE
AE
swe would get the following summary:
s_events_patients(x = c("id 1", "id 1", "id 2"), .N_col = 5)
The .N_col
argument is a special keyword argument by which
build_table()
passes the population size for each respective
column. For a list of keyword arguments for the functions passed to
afun
in analyze()
, refer to the documentation with ?analyze
.
We now use the s_events_patients
summary function in a tabulation:
adae_lyt <- basic_table(show_colcounts = TRUE) %>% split_cols_by("ARM") %>% analyze("USUBJID", s_events_patients) adae_tbl <- build_table(adae_lyt, ADAE2) adae_tbl
Note that the column N
s are wrong as by default they are set to the
number of rows per group (i.e. number of AE
s per arm here). This also
affects the percentages. For this table we are interested in the
number of patients per column/arm which is usually taken from ADSL
(var ADSL2
here).
rtables
handles this by allowing us to override how the column
counts are computed. We can specify an alt_counts_df
in
build_table()
. When we do this, rtables
calculates the column counts
by applying the same column faceting to alt_counts_df
as it does to
the primary data during tabulation:
adae_adsl_tbl <- build_table(adae_lyt, ADAE2, alt_counts_df = ADSL2) adae_adsl_tbl
Alternatively, if the desired column counts are already calculated,
they can be specified directly via the col_counts
argument to
build_table()
, though specifying an alt_counts_df
is the preferred
mechanism (the number of rows will be used, but no duplicate checking!!!).
We next calculate this information per system organ class:
adae_soc_lyt <- basic_table(show_colcounts = TRUE) %>% split_cols_by("ARM") %>% analyze("USUBJID", s_events_patients) %>% split_rows_by("AEBODSYS", child_labels = "visible", nested = FALSE) %>% summarize_row_groups("USUBJID", cfun = s_events_patients) adae_soc_tbl <- build_table(adae_soc_lyt, ADAE2, alt_counts_df = ADSL2) adae_soc_tbl
We now have to add a count table of AEDECOD
for each AEBODSYS
. The
default analyze()
behavior for a factor is to create the count table
per level (using rtab_inner
):
adae_soc_lyt2 <- basic_table(show_colcounts = TRUE) %>% split_cols_by("ARM") %>% split_rows_by("AEBODSYS", child_labels = "visible", indent_mod = 1) %>% summarize_row_groups("USUBJID", cfun = s_events_patients) %>% analyze("AEDECOD", indent_mod = -1) adae_soc_tbl2 <- build_table(adae_soc_lyt2, ADAE2, alt_counts_df = ADSL2) adae_soc_tbl2
The indent_mod
argument enables relative indenting changes if the
tree structure of the table does not result in the desired indentation
by default.
This table so far is however not the usual adverse event table as it counts the total number of events and not the number of subjects for one or more events for a particular term. To get the correct table we need to write a custom analysis function:
table_count_once_per_id <- function(df, termvar = "AEDECOD", idvar = "USUBJID") { x <- df[[termvar]] id <- df[[idvar]] counts <- table(x[!duplicated(id)]) in_rows( .list = as.vector(counts), .labels = names(counts) ) } table_count_once_per_id(ADAE2)
So the desired AE
table is:
adae_soc_lyt3 <- basic_table(show_colcounts = TRUE) %>% split_cols_by("ARM") %>% split_rows_by("AEBODSYS", child_labels = "visible", indent_mod = 1) %>% summarize_row_groups("USUBJID", cfun = s_events_patients) %>% analyze("AEDECOD", afun = table_count_once_per_id, show_labels = "hidden", indent_mod = -1) adae_soc_tbl3 <- build_table(adae_soc_lyt3, ADAE2, alt_counts_df = ADSL2) adae_soc_tbl3
Note that we are missing the overall summary in the first two
rows. This can be added with an initial analyze()
call.
adae_soc_lyt4 <- basic_table(show_colcounts = TRUE) %>% split_cols_by("ARM") %>% analyze("USUBJID", afun = s_events_patients) %>% split_rows_by("AEBODSYS", child_labels = "visible", indent_mod = 1, section_div = "") %>% summarize_row_groups("USUBJID", cfun = s_events_patients) %>% analyze("AEDECOD", table_count_once_per_id, show_labels = "hidden", indent_mod = -1) adae_soc_tbl4 <- build_table(adae_soc_lyt4, ADAE2, alt_counts_df = ADSL2) adae_soc_tbl4
Finally, if we wanted to prune the 0 count rows we can do that with the trim_rows()
function:
trim_rows(adae_soc_tbl4)
Pruning is a larger topic with a separate rtables
package
vignette.
The adverse events table by ID and by grade shows how many patients had at least one adverse event per grade for different subsets of the data (e.g. defined by system organ class).
For this table we do not show the zero count grades. Note that we add the "overall" groups with a custom split function.
table_count_grade_once_per_id <- function(df, labelstr = "", gradevar = "AETOXGR", idvar = "USUBJID", grade_levels = NULL) { id <- df[[idvar]] grade <- df[[gradevar]] if (!is.null(grade_levels)) { stopifnot(all(grade %in% grade_levels)) grade <- factor(grade, levels = grade_levels) } id_sel <- !duplicated(id) in_rows( "--Any Grade--" = sum(id_sel), .list = as.list(table(grade[id_sel])) ) } table_count_grade_once_per_id(ex_adae, grade_levels = 1:5)
All of the layouting concepts needed to create this table have already been introduced so far:
adae_grade_lyt <- basic_table(show_colcounts = TRUE) %>% split_cols_by("ARM") %>% analyze( "AETOXGR", afun = table_count_grade_once_per_id, extra_args = list(grade_levels = 1:5), var_labels = "- Any adverse events -", show_labels = "visible" ) %>% split_rows_by("AEBODSYS", child_labels = "visible", indent_mod = 1) %>% summarize_row_groups(cfun = table_count_grade_once_per_id, format = "xx", indent_mod = 1) %>% split_rows_by("AEDECOD", child_labels = "visible", indent_mod = -2) %>% analyze( "AETOXGR", afun = table_count_grade_once_per_id, extra_args = list(grade_levels = 1:5), show_labels = "hidden" ) adae_grade_tbl <- build_table(adae_grade_lyt, ADAE2, alt_counts_df = ADSL2) adae_grade_tbl
The response table that we will create here is composed of 3 parts:
Let's start with the first part which is fairly simple to derive:
ADRS_BESRSPI <- ex_adrs %>% filter(PARAMCD == "BESRSPI") %>% mutate( rsp = factor(AVALC %in% c("CR", "PR"), levels = c(TRUE, FALSE), labels = c("Responders", "Non-Responders")), is_rsp = (rsp == "Responders") ) s_proportion <- function(x, .N_col) { in_rows( .list = lapply( as.list(table(x)), function(xi) rcell(xi * c(1, 1 / .N_col), format = "xx.xx (xx.xx%)") ) ) } rsp_lyt <- basic_table(show_colcounts = TRUE) %>% split_cols_by("ARMCD", ref_group = "ARM A") %>% analyze("rsp", s_proportion, show_labels = "hidden") rsp_tbl <- build_table(rsp_lyt, ADRS_BESRSPI) rsp_tbl
Note that we did set the ref_group
argument in split_cols_by()
which for the current table had no effect as we only use the cell data
for the responder and non-responder counts. The ref_group
argument
is needed for the part 2 and 3 of the table.
We will now look the implementation of part 2: unstratified analysis comparison vs. control group. Let's start with the analysis function:
s_unstrat_resp <- function(x, .ref_group, .in_ref_col) { if (.in_ref_col) { return(in_rows( "Difference in Response Rates (%)" = rcell(numeric(0)), "95% CI (Wald, with correction)" = rcell(numeric(0)), "p-value (Chi-Squared Test)" = rcell(numeric(0)), "Odds Ratio (95% CI)" = rcell(numeric(0)) )) } fit <- stats::prop.test( x = c(sum(x), sum(.ref_group)), n = c(length(x), length(.ref_group)), correct = FALSE ) fit_glm <- stats::glm( formula = rsp ~ group, data = data.frame( rsp = c(.ref_group, x), group = factor(rep(c("ref", "x"), times = c(length(.ref_group), length(x))), levels = c("ref", "x")) ), family = binomial(link = "logit") ) in_rows( "Difference in Response Rates (%)" = non_ref_rcell( (mean(x) - mean(.ref_group)) * 100, .in_ref_col, format = "xx.xx" ), "95% CI (Wald, with correction)" = non_ref_rcell( fit$conf.int * 100, .in_ref_col, format = "(xx.xx, xx.xx)" ), "p-value (Chi-Squared Test)" = non_ref_rcell( fit$p.value, .in_ref_col, format = "x.xxxx | (<0.0001)" ), "Odds Ratio (95% CI)" = non_ref_rcell( c( exp(stats::coef(fit_glm)[-1]), exp(stats::confint.default(fit_glm, level = .95)[-1, , drop = FALSE]) ), .in_ref_col, format = "xx.xx (xx.xx - xx.xx)" ) ) } s_unstrat_resp( x = ADRS_BESRSPI %>% filter(ARM == "A: Drug X") %>% pull(is_rsp), .ref_group = ADRS_BESRSPI %>% filter(ARM == "B: Placebo") %>% pull(is_rsp), .in_ref_col = FALSE )
Hence we can now add the next vignette to the table:
rsp_lyt2 <- basic_table(show_colcounts = TRUE) %>% split_cols_by("ARMCD", ref_group = "ARM A") %>% analyze("rsp", s_proportion, show_labels = "hidden") %>% analyze( "is_rsp", s_unstrat_resp, show_labels = "visible", var_labels = "Unstratified Response Analysis" ) rsp_tbl2 <- build_table(rsp_lyt2, ADRS_BESRSPI) rsp_tbl2
Next we will add part 3: the multinomial response table. To do so, we are adding a row-split by response level, and then doing the same thing as we did for the binary response table above.
s_prop <- function(df, .N_col) { in_rows( "95% CI (Wald, with correction)" = rcell(binom.test(nrow(df), .N_col)$conf.int * 100, format = "(xx.xx, xx.xx)") ) } s_prop( df = ADRS_BESRSPI %>% filter(ARM == "A: Drug X", AVALC == "CR"), .N_col = sum(ADRS_BESRSPI$ARM == "A: Drug X") )
We can now create the final response table with all three parts:
rsp_lyt3 <- basic_table(show_colcounts = TRUE) %>% split_cols_by("ARMCD", ref_group = "ARM A") %>% analyze("rsp", s_proportion, show_labels = "hidden") %>% analyze( "is_rsp", s_unstrat_resp, show_labels = "visible", var_labels = "Unstratified Response Analysis" ) %>% split_rows_by( var = "AVALC", split_fun = reorder_split_levels(neworder = c("CR", "PR", "SD", "PD", "NE"), drlevels = TRUE), nested = FALSE ) %>% summarize_row_groups() %>% analyze("AVALC", afun = s_prop) rsp_tbl3 <- build_table(rsp_lyt3, ADRS_BESRSPI) rsp_tbl3
In the case that we wanted to rename the levels of AVALC
and remove
the CI for NE
we could do that as follows:
rsp_label <- function(x) { rsp_full_label <- c( CR = "Complete Response (CR)", PR = "Partial Response (PR)", SD = "Stable Disease (SD)", `NON CR/PD` = "Non-CR or Non-PD (NON CR/PD)", PD = "Progressive Disease (PD)", NE = "Not Evaluable (NE)", Missing = "Missing", `NE/Missing` = "Missing or unevaluable" ) stopifnot(all(x %in% names(rsp_full_label))) rsp_full_label[x] } rsp_lyt4 <- basic_table(show_colcounts = TRUE) %>% split_cols_by("ARMCD", ref_group = "ARM A") %>% analyze("rsp", s_proportion, show_labels = "hidden") %>% analyze( "is_rsp", s_unstrat_resp, show_labels = "visible", var_labels = "Unstratified Response Analysis" ) %>% split_rows_by( var = "AVALC", split_fun = keep_split_levels(c("CR", "PR", "SD", "PD"), reorder = TRUE), nested = FALSE ) %>% summarize_row_groups(cfun = function(df, labelstr, .N_col) { in_rows(nrow(df) * c(1, 1 / .N_col), .formats = "xx (xx.xx%)", .labels = rsp_label(labelstr)) }) %>% analyze("AVALC", afun = s_prop) %>% analyze("AVALC", afun = function(x, .N_col) { in_rows(rcell(sum(x == "NE") * c(1, 1 / .N_col), format = "xx.xx (xx.xx%)"), .labels = rsp_label("NE")) }, nested = FALSE) rsp_tbl4 <- build_table(rsp_lyt4, ADRS_BESRSPI) rsp_tbl4
Note that the table is missing the rows gaps to make it more
readable. The row spacing feature is on the rtables
roadmap and will
be implemented in future.
The time to event analysis table that will be constructed consists of four parts:
The table is constructed by sequential use of the analyze()
function, with four custom analysis functions corresponding to each of
the four parts listed above. In addition the table includes
referential footnotes relevant to the table contents. The table will
be faceted column-wise by arm.
First we will start by loading the necessary packages and preparing the data to be used in the construction of this table.
library(survival) adtte <- ex_adaette %>% dplyr::filter(PARAMCD == "AETTE2", SAFFL == "Y") # Add censoring to data for example adtte[adtte$AVAL > 1.0, ] <- adtte[adtte$AVAL > 1.0, ] %>% mutate(AVAL = 1.0, CNSR = 1) adtte2 <- adtte %>% mutate(CNSDTDSC = ifelse(CNSDTDSC == "", "__none__", CNSDTDSC))
The adtte
dataset will be used in preparing the models while the
adtte2
dataset handles missing values in the "Censor Date
Description" column and will be used to produce the final table. We
add censoring into the data for example purposes.
Next we create a basic analysis function, a_count_subjs
which prints
the overall unique subject counts and percentages within the data.
a_count_subjs <- function(x, .N_col) { in_rows( "Subjects with Adverse Events n (%)" = rcell(length(unique(x)) * c(1, 1 / .N_col), format = "xx (xx.xx%)") ) }
Then an analysis function is created to generate the counts of
censored subjects for each level of a factor variable in the
dataset. In this case the cnsr_counter
function will be applied with
the CNSDTDSC
variable which contains a censor date description for
each censored subject.
cnsr_counter <- function(df, .var, .N_col) { x <- df[!duplicated(df$USUBJID), .var] x <- x[x != "__none__"] lapply(table(x), function(xi) rcell(xi * c(1, 1 / .N_col), format = "xx (xx.xx%)")) }
This function generates counts and fractions of unique subjects corresponding to each factor level, excluding missing values (uncensored patients).
A Cox proportional-hazards (Cox P-H) analysis is generated next with a
third custom analysis function, a_cph
. Prior to creating the
analysis function, the Cox P-H model is fit to our data using the
coxph()
and Surv()
functions from the survival
package. Then
this model is used as input to the a_cph
analysis function which
returns hazard ratios, 95% confidence intervals, and p-values
comparing against the reference group - in this case the leftmost
column.
cph <- coxph(Surv(AVAL, CNSR == 0) ~ ACTARM + STRATA1, ties = "exact", data = adtte) a_cph <- function(df, .var, .in_ref_col, .ref_full, full_cox_fit) { if (.in_ref_col) { ret <- replicate(3, list(rcell(NULL))) } else { curtrt <- df[[.var]][1] coefs <- coef(full_cox_fit) sel_pos <- grep(curtrt, names(coefs), fixed = TRUE) hrval <- exp(coefs[sel_pos]) sdf <- survdiff(Surv(AVAL, CNSR == 0) ~ ACTARM + STRATA1, data = rbind(df, .ref_full)) pval <- (1 - pchisq(sdf$chisq, length(sdf$n) - 1)) / 2 ci_val <- exp(unlist(confint(full_cox_fit)[sel_pos, ])) ret <- list( rcell(hrval, format = "xx.x"), rcell(ci_val, format = "(xx.x, xx.x)"), rcell(pval, format = "x.xxxx | (<0.0001)") ) } in_rows( .list = ret, .names = c("Hazard ratio", "95% confidence interval", "p-value (one-sided stratified log rank)") ) }
The fourth and final analysis function, a_tte
, generates a time to
first adverse event table with three rows corresponding to Median, 95%
Confidence Interval, and Min Max respectively. First a survival table
is constructed from the summary table of a survival model using the
survfit()
and Surv()
functions from the survival
package. This
table is then given as input to a_tte
which produces the table of
time to first adverse event consisting of the previously mentioned
summary statistics.
surv_tbl <- as.data.frame( summary(survfit(Surv(AVAL, CNSR == 0) ~ ACTARM, data = adtte, conf.type = "log-log"))$table ) %>% dplyr::mutate( ACTARM = factor(gsub("ACTARM=", "", row.names(.)), levels = levels(adtte$ACTARM)), ind = FALSE ) a_tte <- function(df, .var, kp_table) { ind <- grep(df[[.var]][1], row.names(kp_table), fixed = TRUE) minmax <- range(df[["AVAL"]]) mm_val_str <- format_value(minmax, format = "xx.x, xx.x") rowfn <- list() if (all(df$CNSR[df$AVAL == minmax[2]])) { mm_val_str <- paste0(mm_val_str, "*") rowfn <- "* indicates censoring" } in_rows( Median = kp_table[ind, "median", drop = TRUE], "95% confidence interval" = unlist(kp_table[ind, c("0.95LCL", "0.95UCL")]), "Min Max" = mm_val_str, .formats = c("xx.xx", "xx.xx - xx.xx", "xx"), .row_footnotes = list(NULL, NULL, rowfn) ) }
Additionally, the a_tte
function creates a referential footnote
within the table to indicate where censoring occurred in the data.
Now we are able to use these four analysis functions to build our time to event analysis table.
lyt <- basic_table(show_colcounts = TRUE) %>% ## Column faceting split_cols_by("ARM", ref_group = "A: Drug X") %>% ## Overall count analyze("USUBJID", a_count_subjs, show_labels = "hidden") %>% ## Censored subjects summary analyze("CNSDTDSC", cnsr_counter, var_labels = "Censored Subjects", show_labels = "visible") %>% ## Cox P-H analysis analyze("ARM", a_cph, extra_args = list(full_cox_fit = cph), show_labels = "hidden") %>% ## Time-to-event analysis analyze( "ARM", a_tte, var_labels = "Time to first adverse event", show_labels = "visible", extra_args = list(kp_table = surv_tbl), table_names = "kapmeier" ) tbl_tte <- build_table(lyt, adtte2)
We set the show_colcounts
argument of basic_table()
to TRUE
to
first print the total subject counts for each column. Next we use
split_cols_by()
to split the table into three columns corresponding
to the three different levels of ARM
, and specify that the first
arm, "A: Drug X"
should act as the reference group to be compared
against - this reference group is used for the Cox P-H analysis. Then
we call analyze()
sequentially using each of the four custom
analysis functions as argument afun
and specifying additional
arguments where necessary. Then we use build_table()
to construct
our rtable
using the adtte2
dataset.
Finally, we annotate the table using the fnotes_at_path()
function
to specify that product-limit estimates are used to calculate the
statistics listed under the "Time to first adverse event" heading
within the table. The referential footnote created earlier in the
time-to-event analysis function (a_tte
) is also displayed.
fnotes_at_path( tbl_tte, c("ma_USUBJID_CNSDTDSC_ARM_kapmeier", "kapmeier") ) <- "Product-limit (Kaplan-Meier) estimates." tbl_tte
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.