suggested_dependent_pkgs <- c("dplyr")
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = all(vapply(
    suggested_dependent_pkgs,
    requireNamespace,
    logical(1),
    quietly = TRUE
  ))
)
suppressPackageStartupMessages(library(rtables))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tibble))

## XXX put this somewhere else so everyone can share it
fixed_shell <- function(tt) {
  mystr <- table_shell_str(tt)
  regex_hits <- gregexpr("[(]N=[[:digit:]]+[)]", mystr)[[1]]
  hit_lens <- attr(regex_hits, "match.length")
  if (regex_hits[1] > 0) {
    for (i in seq_along(regex_hits)) {
      start <- regex_hits[i]
      len <- hit_lens[i]
      substr(mystr, start, start + len - 1) <- padstr("(N=xx)", len, just = "center")
    }
  }
  cat(mystr)
}

knitr::opts_chunk$set(comment = "")
rtables supports generalized faceting when declaring row and
column structure. In particular, it allows faceting behavior to
deviate from that seen in, e.g., ggplot2 faceting support in four
crucial ways often required for tables:
While this flexibility is a cornerstone of rtables' power,
alongside the flexibility of analysis functions discussed in the
previous chapter, it also means we must actively think about faceting
when creating table layouts in a way simply not required of users of
facet_grid in ggplot2.
In this chapter we will cover identifying which aspects of a shell or desired table should be achieved by specifying the correct split function(s) in the layout. As with the previous chapter's handling of analysis behavior, we will leave implementation of fully custom split functions for the advanced portion of this guide and focus solely on the identification of required behavior to prepare users to choose between a selection of pre-existing non-default split functions available to them.
Faceting serves three purposes within the rtables layouting
framework. It declares
In particular, (3) means that the data passed to analysis functions is the intersection of the data associated with the row- and column-facets that define the location of the cell(s) whose contents are being calculated.
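The intersection described above can be checked with a small sketch. The analysis function `count_afun` below is a hypothetical helper, not part of rtables; it simply reports how many records reach each cell, which should agree with a plain cross-tabulation of the two faceting variables.

```r
library(rtables)

# Hypothetical analysis function: report the number of records it receives.
count_afun <- function(df) {
  in_rows("n records" = rcell(nrow(df), format = "xx"))
}

lyt_count <- basic_table() |>
  split_cols_by("ARM") |>
  split_rows_by("SEX") |>
  analyze("AGE", afun = count_afun)

tbl_count <- build_table(lyt_count, ex_adsl)
tbl_count

# Reference: the same counts computed directly from the data.
table(ex_adsl$SEX, ex_adsl$ARM)
```

Each cell of the table should correspond to the matching entry of the cross-tabulation, since the data frame handed to `count_afun` is the row-facet subset intersected with the column-facet subset.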
rtables is designed such that data should not need to be duplicated,
nor, e.g., levels of a factor restricted, in the dataset prior to
calling build_table. Things like adding combination levels and
restricting or reordering factor levels are all declared via faceting
in the layout and then performed automatically by the internal
rtables machinery during table creation.
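As a minimal sketch of declaring such changes in the layout rather than in the data, the example below reorders the arms and restricts the sexes shown using the `reorder_split_levels` and `keep_split_levels` split function factories. The particular level order and the choice of variables are assumptions made for illustration.

```r
library(rtables)

lyt_decl <- basic_table() |>
  split_cols_by(
    "ARM",
    # Show placebo first; the data itself is not modified.
    split_fun = reorder_split_levels(c("B: Placebo", "A: Drug X", "C: Combination"))
  ) |>
  # Only facet on two of the levels of SEX.
  split_rows_by("SEX", split_fun = keep_split_levels(c("F", "M"))) |>
  analyze("AGE")

tbl_decl <- build_table(lyt_decl, ex_adsl)
tbl_decl

# The input dataset still carries its original levels.
levels(ex_adsl$ARM)
levels(ex_adsl$SEX)
```

The same `ex_adsl` object can therefore be reused unchanged across layouts that need different level sets or orderings.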
We will leave a detailed technical discussion of how split functions work for when we implement our own custom split functions in the advanced portion of this guide. For our purposes here, it suffices to consider a split function to be a mapping from an incoming dataset (the data associated with the parent facet) to a set of one or more facets, each of which is associated with a (sub)set of that incoming data.
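This mental model can be sketched in base R. The code below is only a conceptual illustration and makes no claim about how rtables implements splitting internally: a default split resembles `split()` on the faceting variable, while a non-default split may return facets that overlap.

```r
library(rtables) # only needed here for the ex_adsl example data

# Default-like behavior: one facet per level, mutually exclusive subsets.
facets <- split(ex_adsl, ex_adsl$ARM)
vapply(facets, nrow, integer(1))

# A non-default mapping may add a facet that overlaps the others.
# "All Drug X" is an illustrative name, not a level present in the data.
facets[["All Drug X"]] <- ex_adsl[ex_adsl$ARM %in% c("A: Drug X", "C: Combination"), ]
vapply(facets, nrow, integer(1))
```

In both cases the result is a named collection of data subsets, which is all we need to keep in mind when choosing among the existing split functions.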
By default, faceting instructions:
The above behaviors combine to mean that sequential faceting
instructions (i.e., repeated calls to split_cols_by or
split_rows_by) result in full factorial faceting, where each
combination of levels from the variables faceted on is represented.
This is true with column faceting:
lyt <- basic_table() |>
  split_cols_by("ARM") |>
  split_cols_by("SEX")

build_table(lyt, ex_adsl)
as well as with row faceting, with the caveat that row faceting does not generate individual rows, and thus an analyze call is required:
lyt2 <- basic_table() |>
  split_rows_by("STRATA1") |>
  split_rows_by("BMRKR2") |>
  analyze("AGE")

build_table(lyt2, ex_adsl)
Any time we need faceting that does not represent a full factorial combination of one or more variables (i.e., the full set of combinations of levels from those variables), we will need to use split functions to declare our desired structure.
The key, then, is to carefully consider how our desired faceting structure deviates from the full factorial structure that default faceting would generate. This will tell us what behaviors we need from our split functions.
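One quick way to see the full factorial baseline is to count facets. The sketch below assumes, as in the example data, that ARM and SEX are both factors; under default faceting the number of columns should equal the product of their numbers of levels.

```r
library(rtables)

lyt_ff <- basic_table() |>
  split_cols_by("ARM") |>
  split_cols_by("SEX")

tbl_ff <- build_table(lyt_ff, ex_adsl)

# Columns produced by default nested faceting.
ncol(tbl_ff)

# Full factorial expectation: every ARM level crossed with every SEX level.
nlevels(ex_adsl$ARM) * nlevels(ex_adsl$SEX)
```

Whenever the shell calls for a different number or arrangement of facets than this baseline, that difference is what the split function needs to supply.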
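The simplest deviation from full-factorial faceting is to omit some levels when faceting based on a single categorical variable. This can come in two flavors: omitting levels prescriptively (a fixed set of levels chosen up front) or empirically (a set of levels determined by the data being tabulated).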
Prescriptively omitting levels(/facets) is fairly straightforward: you
have a set of levels that, for whatever reason, you do not want facets
for in the resulting table. rtables provides the
remove_split_levels function factory to create split functions which achieve this.
Empirically omitting levels(/facets) is more open ended, as
technically the logic determining what should be omitted can be
completely arbitrary. The most common version, however, is to omit
unobserved levels (which would result in facets whose associated data
subset is empty); the drop_split_levels split function does this.
We will use a slightly modified version of our synthetic data to illustrate the difference:
adsl <- subset(ex_adsl, as.character(SEX) %in% c("F", "M", "U"))
qtable(adsl, col_vars = "SEX")
First we declare faceting that omits the (rare but observed) "U"
level using remove_split_levels.
lyt_pre <- basic_table() |>
  split_cols_by("SEX", split_fun = remove_split_levels("U")) |>
  analyze("STRATA1")

build_table(lyt_pre, adsl)
Next we will use drop_split_levels:
lyt_emp <- basic_table() |>
  split_cols_by("SEX", split_fun = drop_split_levels) |>
  analyze("STRATA1")

build_table(lyt_emp, adsl)
Here we get exactly -- and only -- facets for the levels of SEX
observed in the data.
It is important to note that drop_split_levels omits facets for
levels not observed in the incoming data, which is the data for
the parent facet. This coincides with the full data being
tabulated only in the case of top-level faceting (not nested within anything)
and other special cases.
We can see this if we nest faceting using the empirical
drop_split_levels within another faceting instruction:
lyt_bad_emp <- basic_table() |>
  split_cols_by("ARM") |>
  split_rows_by("RACE", split_fun = drop_split_levels) |>
  split_rows_by("SEX", split_fun = drop_split_levels) |>
  analyze("AGE")

build_table(lyt_bad_emp, adsl)
Here we see that different sets of SEX facets are generated within
different RACE facets, with the "MULTIPLE" and "NATIVE HAWAIIAN
OR OTHER PACIFIC ISLANDER" races each having only a (different)
single facet. This is sometimes the desired behavior, but often it is
not, so care should be used with drop_split_levels in non-trivial
faceting structures.
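Before nesting drop_split_levels, it can help to look at which inner levels are observed within each outer level. The following base R sketch uses the same subset of the example data as above and does not involve rtables itself beyond supplying `ex_adsl`.

```r
library(rtables) # only needed here for the ex_adsl example data

adsl <- subset(ex_adsl, as.character(SEX) %in% c("F", "M", "U"))

# Cross-tabulate the outer (RACE) and inner (SEX) faceting variables.
obs <- table(RACE = adsl$RACE, SEX = droplevels(adsl$SEX))
obs

# Number of SEX levels actually observed within each RACE.
rowSums(obs > 0)
```

Rows where this count is smaller than the overall number of observed SEX levels are exactly the parent facets in which a nested drop_split_levels would generate fewer child facets.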
Some shells call for levels to be combined into new virtual
levels. For example, we might need an "All Drug X" category in our
table which represents both arms A ("A: Drug X") and C ("C:
Combination") as a single group of patients, either in addition to or
instead of those individual arms.
As with omitting defined factor levels, this is a deviation from the default full factorial behavior. In this case we want a facet for a level not present in the data and (assuming the individual arms are left in alongside our combination arm) our desired facets are not mutually exclusive.
rtables provides the add_combo_levels split function factory to directly
invoke this behavior. It takes a "combination data.frame" that
declares the combination levels to add.
combodf <- tribble(
  ~valname, ~label, ~levelcombo, ~exargs,
  "A_C", "Arms A+C", c("A: Drug X", "C: Combination"), list()
)

lyt_combo1 <- basic_table() |>
  split_cols_by("ARM", split_fun = add_combo_levels(combodf), show_colcounts = TRUE)

build_table(lyt_combo1, ex_adsl)
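When the combination should appear instead of, rather than in addition to, the individual arms, the `keep_levels` argument of add_combo_levels can restrict which facets are retained. The sketch below is one way this might look; the "All Drug X" label and the choice to keep placebo alongside the combination are assumptions made for illustration.

```r
library(rtables)
library(tibble)

combodf_only <- tribble(
  ~valname, ~label, ~levelcombo, ~exargs,
  "A_C", "All Drug X", c("A: Drug X", "C: Combination"), list()
)

lyt_combo_only <- basic_table() |>
  split_cols_by(
    "ARM",
    # Keep only the placebo arm and the new combination facet.
    split_fun = add_combo_levels(combodf_only, keep_levels = c("B: Placebo", "A_C")),
    show_colcounts = TRUE
  )

tbl_combo_only <- build_table(lyt_combo_only, ex_adsl)
tbl_combo_only
```

Note that `keep_levels` refers to the combination by its `valname` ("A_C"), not by its display label.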
Often when performing nested faceting, the inner variable represents the same information as the outer variable in more detail. Another way to view this is that the information represented by the outer variable is implicitly included (or embedded) within the information for the inner variable. When this occurs, most combinations of levels from the pair of variables are not logically consistent, can never occur in practice, and most importantly, should not be represented in our resulting table. Whenever this is the case, we cannot rely on the default splitting behavior.
A ubiquitous example of this in clinical trials is the pair of System Organ
Class (AESOC) and Preferred Term (AEDECOD) variables used when
describing adverse events. AESOC represents the broad category an
adverse event falls within (e.g., "SKELETOMUSCULAR" or
"GASTROINTESTINAL") while AEDECOD represents the specific type of
adverse event ("BACK PAIN", "VOMITING"). In this example, the
combination of AESOC being "SKELETOMUSCULAR" while AEDECOD is
"VOMITING" is not logically consistent. In our alternate framing, we would say that the AEDECOD
value "VOMITING" implies that AESOC must be "GASTROINTESTINAL".
Note that our synthetic data does not contain realistic values for
AESOC and AEDECOD, but rather values of the form "cl X" (with X
a capital letter) and "dcd X.m.n.o.p" with m-p individual digits,
respectively. Note this makes the information embedding even more
explicit, as the X is the same between values of AESOC and the
values of AEDECOD they apply to.
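The embedding can be inspected directly in the example data with a short base R sketch. It lists the distinct (AESOC, AEDECOD) pairs and counts how many AESOC values each AEDECOD value occurs with; if the embedding holds as described, we would expect every term to occur with exactly one class.

```r
library(rtables) # only needed here for the ex_adae example data

# Distinct observed pairs of class and term.
pairs <- unique(data.frame(
  AESOC = as.character(ex_adae$AESOC),
  AEDECOD = as.character(ex_adae$AEDECOD),
  stringsAsFactors = FALSE
))
pairs[order(pairs$AESOC, pairs$AEDECOD), ]

# For each AEDECOD, the number of distinct AESOC values it appears with.
soc_per_term <- table(pairs$AEDECOD)
table(soc_per_term)
```

Comparing `nrow(pairs)` with the product of the numbers of distinct AESOC and AEDECOD values also gives a sense of how many of the full factorial combinations are never meaningful.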
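As with omitting facets within a single faceting instruction, there are broadly two ways to approach this type of nested faceting: empirically, keeping only the pairs of levels observed in the data, or prescriptively, declaring up front which pairs should appear.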
In both cases, we can think about this in terms of pairs of levels we want to represent in our table. The goal here is to preemptively omit pairs which are not logically consistent (and thus which we can assume have no observations in the data).
The empirical approach assumes that either:
To this end, rtables provides the trim_levels_in_group split
function factory. For each observed level of the variable being
split, the levels of a declared inner_var are restricted to those
observed in combination with that level of the split variable. When we
then split on or analyze the inner variable, we get a table that contains only
the observed pairs:
lyt_tig <- basic_table() |>
  split_rows_by("AESOC", split_fun = trim_levels_in_group("AEDECOD")) |>
  analyze("AEDECOD")

build_table(lyt_tig, ex_adae)
trim_levels_in_group can be used in chains to further restrict the
displayed combinations of more than two variables, if desired:
lyt_tig2 <- basic_table(title = "Observed Toxicity Grades") |>
  split_rows_by("AESOC", split_fun = trim_levels_in_group("AEDECOD")) |>
  split_rows_by("AEDECOD", split_fun = trim_levels_in_group("AETOXGR")) |>
  analyze("AETOXGR")

build_table(lyt_tig2, ex_adae)
Sometimes the above is the desired behavior; many times, however, there are certain counts or values which are important to display even when they are not observed. In such cases, we still want to omit pairs of levels that are impossible/logically inconsistent, but cannot rely on which combinations are observed in the data.
In such cases, we must prescriptively declare which combinations we
want to appear in our table. rtables provides the
trim_levels_to_map split function factory for this, which accepts a
pre-defined map of all combinations which should be included (in the
form of a data.frame). Any combinations which do not appear in the map
will be omitted even if they are observed in the data.
map <- tribble(
  ~AESOC, ~AEDECOD,
  "cl A", "dcd A.1.1.1.2",
  "cl B", "dcd B.1.1.1.1",
  "cl B", "dcd B.2.2.3.1",
  "cl D", "dcd D.1.1.1.1"
)

lyt_ttm <- basic_table() |>
  split_rows_by("AESOC", split_fun = trim_levels_to_map(map)) |>
  analyze("AEDECOD")

build_table(lyt_ttm, ex_adae)
Note that because there were no pairs in the map with an AESOC of
"cl C", that entire facet is omitted. This will be true in the case
of nested faceting as well:
lyt_ttm2 <- basic_table() |>
  split_rows_by("AESOC", split_fun = trim_levels_to_map(map)) |>
  split_rows_by("AEDECOD", split_fun = trim_levels_in_group("AETOXGR")) |>
  analyze("AETOXGR")

build_table(lyt_ttm2, ex_adae)
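Because a prescriptive map is written independently of the data, it can be useful to compare it against the observed combinations before building the table. The base R sketch below rebuilds the same illustrative map as a plain data.frame and reports the differences in both directions; the `" | "` key separator is an arbitrary choice for this check.

```r
library(rtables) # only needed here for the ex_adae example data

# Same illustrative pairs as the map used with trim_levels_to_map above.
map_chk <- data.frame(
  AESOC = c("cl A", "cl B", "cl B", "cl D"),
  AEDECOD = c("dcd A.1.1.1.2", "dcd B.1.1.1.1", "dcd B.2.2.3.1", "dcd D.1.1.1.1"),
  stringsAsFactors = FALSE
)

# One key per pair, for the map and for the observed data.
map_key <- paste(map_chk$AESOC, map_chk$AEDECOD, sep = " | ")
obs_key <- unique(paste(ex_adae$AESOC, ex_adae$AEDECOD, sep = " | "))

# Pairs declared in the map but never observed in the data.
map_chk[!map_key %in% obs_key, ]

# Pairs observed in the data but absent from the map (omitted from the table).
setdiff(obs_key, map_key)
```

An unexpectedly long second list is often a sign that the map was written against a different dictionary version or contains a typo in a level name.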
In our examples so far, faceting has translated to mapping the
incoming data to a set of distinct (if not necessarily mutually
exclusive or exhaustive) subsets of the data. This is the most common
form of faceting, but it is not the only one rtables supports.
In some cases, we want facets to be semantically distinct from each other; in other words, instead of representing different subsets of the data, we want them to represent different aspects of the same data. This is most commonly useful in column space, where individual columns are defined via faceting, unlike individual rows.
A toy example of this would be:
library(tibble)

# Analysis function whose output depends on the column being populated
tpose_afun <- function(x, .var, .spl_context) {
  # value of the innermost column split for the current cell
  mycol <- tail(tail(.spl_context$cur_col_split_val, 1)[[1]], 1)
  cell <- switch(mycol,
    n = rcell(length(x), format = "xx"),
    mean = rcell(mean(x, na.rm = TRUE), format = "xx.x"),
    sd = rcell(sd(x, na.rm = TRUE), format = "xx.xx")
  )
  in_rows(.list = setNames(list(cell), .var))
}

combo_df <- tribble(
  ~valname, ~label, ~levelcombo, ~exargs,
  "n", "n", select_all_levels, list(),
  "mean", "mean", select_all_levels, list(),
  "sd", "sd", select_all_levels, list()
)

lyt_sem_cols <- basic_table() |>
  split_cols_by("ARM") |>
  split_cols_by("STUDYID", split_fun = add_combo_levels(combo_df, keep_levels = combo_df$valname)) |>
  split_rows_by("SEX", split_fun = keep_split_levels(c("F", "M"))) |>
  analyze(c("AGE", "BMRKR1"), afun = tpose_afun, show_labels = "hidden")

fixed_shell(build_table(lyt_sem_cols, ex_adsl))
Here we have individual columns for different statistics calculated
using the same data (n, mean and sd), within a faceting
structure that splits on arm in column space and gender in row space,
and calculated for two different continuous numeric variables (age and
"biomarker 1" value).
To achieve this, we need faceting that creates three columns all of
whose "subsets" of the incoming (arm) data are identical: all of
it. We can achieve this with the add_combo_levels split function
factory we used above; the key is to use the select_all_levels
sentinel value provided by rtables to indicate that all levels in the
data should be combined when creating each of our new combination
levels.
We will turn on column counts at all levels to show that it is doing what we want, despite it being redundant and not suitable for any actual table output.
my_combo_df <- tribble(
  ~valname, ~label, ~levelcombo, ~exargs,
  "n", "n", select_all_levels, list(),
  "mean", "mean", select_all_levels, list(),
  "sd", "sd", select_all_levels, list()
)

lyt_tpose_cols_only <- basic_table() |>
  split_cols_by("ARM", show_colcounts = TRUE) |>
  split_cols_by("STUDYID",
    split_fun = add_combo_levels(my_combo_df, keep_levels = my_combo_df$valname),
    show_colcounts = TRUE
  )

build_table(lyt_tpose_cols_only, ex_adsl)
We split on study id in the above code largely for convenience. Given
that we are defining combination levels using select_all_levels, we
could split on anything and have each of the facets represent the
entirety of the incoming data. This approach, however, is a
generalization of splitting on study id in order to create a single
facet representing all the incoming data, a trick worth having in our
back pocket.
Thus we've achieved the column structure we wanted. Now we need an analysis function with the correct column-conditional behavior (see the previous chapter) and we will have our output.
Without discussing how we construct it (as that will be covered in the
advanced portion of this guide), assuming we have a tpose_afun which
meets our requirements, we can then fully create our table:
lyt_tpose_full <- basic_table() |>
  split_cols_by("ARM", show_colcounts = TRUE) |>
  split_cols_by("STUDYID",
    split_fun = add_combo_levels(my_combo_df, keep_levels = my_combo_df$valname),
    show_colcounts = TRUE
  ) |>
  split_rows_by("SEX", split_fun = keep_split_levels(c("F", "M"))) |>
  analyze(c("AGE", "BMRKR1"), afun = tpose_afun, show_labels = "hidden")

build_table(lyt_tpose_full, ex_adsl)
For some table shells, we need to combine the types of needs we
explored above; we might need trim_levels_to_map type behavior, but
also need to include a virtual combination treatment/arm. The split
functions/function factories we discussed here generally cannot achieve
this, though our reasoning for how to think about the faceting we
need still applies. In such cases, we will construct fully custom
split functions which exactly meet our needs, which will be the topic
of an entire chapter in the advanced portion of this guide.