## Formats Precedence

Users of the `rtables` package can specify the format in which the numbers in the reporting tables are printed. Formatting functionality is provided by the [`formatters`](https://insightsengineering.github.io/formatters/) R package. See `formatters::list_valid_format_labels()` for a list of all available formats. The format can be specified by the user in a few different places. It may happen that, for a single table layout, the format is specified in more than one place. In such a case, the final format that will be applied depends on format precedence rules defined by `rtables`. In this vignette, we describe the basic rules of `rtables` format precedence.

The examples shown in this vignette utilize the example `ADSL` dataset, a demographic table that summarizes the variables content for different population subsets (encoded in the columns).

```r
library(rtables)
ADSL <- ex_adsl
```
Note that all `ex_*` data which is currently attached to the `rtables`
package is provided by the `formatters` package and was created using
the publicly available `random.cdisc.data` R package.
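
The list of available formats mentioned above can also be checked programmatically before a format label is used in a layout. The following is a minimal sketch, assuming a version of `formatters` that exports `list_valid_format_labels()` and `is_valid_format()`; the variable name `fmts` is ours.

```r
library(formatters)

# All format labels known to `formatters`, grouped by the number of values
# a cell must hold for the format to apply.
fmts <- list_valid_format_labels()
names(fmts)

# Check a single label before using it in a layout.
is_valid_format("xx.xx")
"xx.xx" %in% unlist(fmts)
```

A label that fails this check is not a built-in format, so it is worth verifying labels this way when they are generated dynamically.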
The format in which numbers are printed can be specified by the user
in a few different places. In the context of precedence, what matters
is the level of the split hierarchy at which a format is specified.
In general, there are two such levels: the cell level and the
so-called parent table level. The concept of the cell and the parent
table results from the way in which the `rtables` package stores
resulting tables. It models the resulting tables as hierarchical,
tree-like objects with the cells (as leaves) containing multiple
values. Particularly noteworthy in this context is the fact that the
actual table splitting occurs in a row-dominant way (even if column
splitting is present in the layout). `rtables` provides the
user-facing function `table_structure()`, which prints the structure
of a given table object.
For a simple illustration, consider the following example:
```r
lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX") %>%
  analyze(vars = "AGE", afun = mean)

adsl_analyzed <- build_table(lyt, ADSL)
adsl_analyzed
table_structure(adsl_analyzed)
```
In this table, there are 4 sub-tables under the `SEX` table. These
are: `F`, `M`, `U`, and `UNDIFFERENTIATED`. Each of these sub-tables
has one sub-table `AGE`. For example, for the first `AGE` sub-table,
its parent table is `F`.
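
The parent chain described above can also be inspected as row paths. The sketch below rebuilds the same layout under the hypothetical name `tbl` and uses `row_paths()` and `row_paths_summary()` from `rtables`; it is meant only as an illustration of how each row sits under its parent tables.

```r
library(rtables)

lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX") %>%
  analyze(vars = "AGE", afun = mean)

tbl <- build_table(lyt, ex_adsl)

# One path per printed row: each path lists the nodes from the outermost
# parent table down to the row itself.
paths <- row_paths(tbl)
paths[1:2]

# A printed overview of the same information.
row_paths_summary(tbl)
```

Reading a path from left to right follows the tree from the least specific node to the most specific one, which is the order in which formats are inherited.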
The concept of hierarchical, tree-like representations of resulting tables translates directly to format precedence and inheritance rules. As a general principle, the format finally applied to a cell is the most specific one, that is, the one closest to the cell along its path in the tree. Hence, the precedence-inheritance chain looks like the following:
```
parent_table -> parent_table -> ... -> parent_table -> cell
```
In such a chain, the outermost `parent_table` is the least specific
place to specify the format, while the `cell` is the most specific
one. In cases where the format is specified by the user in more than
one place, the one which is most specific will be applied in the cell.
If no specific format has been selected by the user for the split, then
the default format will be applied. The default format is `"xx"` and it
yields the same formatting as the `as.character()` function. In the
following sections of this vignette, we will illustrate the format
precedence rules with a few examples.
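
To see what the default format does to a single value, one can call the formatting function from `formatters` directly. This is a small sketch outside of any table layout, assuming `formatters::format_value()` accepts a format label through its `format` argument; the value `x` is an arbitrary illustration.

```r
library(formatters)

x <- 34.56789

# The default "xx" format leaves the value as as.character() would print it.
format_value(x, format = "xx")
as.character(x)

# An explicit format rounds to the requested number of decimals.
format_value(x, format = "xx.x")
format_value(x, format = "xx.xx")
```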
Below is a simple layout that does not explicitly set a format for the output of the analysis function. In such a case, the default format is applied.
```r
lyt0 <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze(vars = "AGE", afun = mean)

build_table(lyt0, ADSL)
```
The format of a cell can be explicitly specified via the `rcell()` or
`in_rows()` functions. The former is essentially a collection of data
objects while the latter is a collection of `rcell()` objects. As
previously mentioned, this is the most specific place where the format
can be specified by the user.
```r
lyt1 <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze(vars = "AGE", afun = function(x) {
    rcell(mean(x), format = "xx.xx", label = "Mean")
  })

build_table(lyt1, ADSL)

lyt1a <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze(vars = "AGE", afun = function(x) {
    in_rows(
      "Mean" = rcell(mean(x)),
      .formats = "xx.xx"
    )
  })

build_table(lyt1a, ADSL)
```
If the format is specified in both of these places at the same time,
the one specified via `in_rows()` takes highest precedence.
Technically, in this case, the format defined in `rcell()` will simply
be overwritten by the one defined in `in_rows()`. This is because the
format specified in `in_rows()` is applied to the cells, not the rows
(overriding the previously specified cell-specific values), which
indicates that the precedence rules described above are still in
place.
```r
lyt2 <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze(vars = "AGE", afun = function(x) {
    in_rows(
      "Mean" = rcell(mean(x), format = "xx.xxx"),
      .formats = "xx.xx"
    )
  })

build_table(lyt2, ADSL)
```
In addition to the cell level, the format can be specified at the parent table level. If no format has been set by the user for a cell, the most specific format for that cell is the one defined at its innermost parent table split (if any).
```r
lyt3 <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze(vars = "AGE", mean, format = "xx.x")

build_table(lyt3, ADSL)
```
If the cell format is also specified for a cell, then the parent table format is ignored for this cell since the cell format is more specific and therefore takes precedence.
```r
lyt4 <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze(
    vars = "AGE",
    afun = function(x) {
      rcell(mean(x), format = "xx.xx", label = "Mean")
    },
    format = "xx.x"
  )

build_table(lyt4, ADSL)

lyt4a <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze(
    vars = "AGE",
    afun = function(x) {
      in_rows(
        "Mean" = rcell(mean(x)),
        "SD" = rcell(sd(x)),
        .formats = "xx.xx"
      )
    },
    format = "xx.x"
  )

build_table(lyt4a, ADSL)
```
In the following, slightly more complicated example, we can observe
partial inheritance. That is, only `SD` cells inherit the parent
table's format while the `Mean` cells do not.
```r
lyt5 <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze(
    vars = "AGE",
    afun = function(x) {
      in_rows(
        "Mean" = rcell(mean(x), format = "xx.xx"),
        "SD" = rcell(sd(x))
      )
    },
    format = "xx.x"
  )

build_table(lyt5, ADSL)
```
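
Partial inheritance can also be checked without reading the printed table. The sketch below rebuilds the same layout under the hypothetical names `lyt_check` and `tbl_check` and inspects the formatted strings through `matrix_form()`. It assumes that the `strings` element of the returned object holds the cell strings, with the column header in the first row and the row labels in the first column.

```r
library(rtables)

lyt_check <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze(
    vars = "AGE",
    afun = function(x) {
      in_rows(
        "Mean" = rcell(mean(x), format = "xx.xx"), # cell-level format
        "SD" = rcell(sd(x))                        # no cell-level format
      )
    },
    format = "xx.x" # parent table format
  )

tbl_check <- build_table(lyt_check, ex_adsl)

# Formatted strings of the table (assumed layout: header row first,
# row labels in the first column).
strs <- matrix_form(tbl_check)$strings
mean_cells <- trimws(strs[trimws(strs[, 1]) == "Mean", -1])
sd_cells <- trimws(strs[trimws(strs[, 1]) == "SD", -1])

# Two decimals where the cell format applies, one where the parent's does.
all(grepl("^[0-9]+\\.[0-9]{2}$", mean_cells))
all(grepl("^[0-9]+\\.[0-9]$", sd_cells))
```

Both checks should hold if the `Mean` cells keep their own format and the `SD` cells fall back to the parent table format, as described above.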
## `NA` Handling

Consider the following layout and the resulting table created:
```r
lyt6 <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX") %>%
  analyze(vars = "AGE", afun = mean, format = "xx.xx")

build_table(lyt6, ADSL)
```
In the output, the cell corresponding to the `UNDIFFERENTIATED` level
of `SEX` and the `B: Placebo` level of `ARM` is displayed as `NA`.
This occurs because there were no non-`NA` values under this facet
that could be used to compute the mean. `rtables` allows the user to
specify a string to display when cell values are `NA`. Similar to
formats for numbers, the user can specify a string to replace `NA`
with the parameter `format_na_str` or `.format_na_strs`. This can be
specified at the cell or parent table level. `NA` string precedence
and inheritance rules are the same as those for number format
precedence, described in the previous section of this vignette. We
will illustrate this with a few examples.
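
Before turning to layouts, the effect of a replacement string on a single value can be sketched with `formatters::format_value()` and its `na_str` argument. This only illustrates the string substitution itself, outside of any table, and says nothing about how `rtables` resolves precedence.

```r
library(formatters)

# Without a replacement string, a missing value is printed as "NA".
format_value(NA_real_, format = "xx.xx")

# With a replacement string, that string is printed instead.
format_value(NA_real_, format = "xx.xx", na_str = "<missing>")
```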
## `NA` Values at the Cell Level

At the cell level, it is possible to replace `NA` values with a custom
string by means of the `format_na_str` parameter in `rcell()` or the
`.format_na_strs` parameter in `in_rows()`.
```r
lyt7 <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX") %>%
  analyze(vars = "AGE", afun = function(x) {
    rcell(mean(x), format = "xx.xx", label = "Mean", format_na_str = "<missing>")
  })

build_table(lyt7, ADSL)

lyt7a <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX") %>%
  analyze(vars = "AGE", afun = function(x) {
    in_rows(
      "Mean" = rcell(mean(x), format = "xx.xx"),
      .format_na_strs = "<MISSING>"
    )
  })

build_table(lyt7a, ADSL)
```
If the `NA` string is specified in both of these places at the same
time, the one specified with `in_rows()` takes precedence.
Technically, in this case the `NA` replacement string defined in
`rcell()` will simply be overwritten by the one defined in
`in_rows()`. This is because the `NA` string specified in `in_rows()`
is applied to the cells, not the rows (overriding the previously
specified cell-specific values), which means that the precedence rules
described above are still in place.
```r
lyt8 <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX") %>%
  analyze(vars = "AGE", afun = function(x) {
    in_rows(
      "Mean" = rcell(mean(x), format = "xx.xx", format_na_str = "<missing>"),
      .format_na_strs = "<MISSING>"
    )
  })

build_table(lyt8, ADSL)
```
## `NA` Values and Inheritance Principles

In addition to the cell level, the string replacement for `NA` values
can be specified at the parent table level. If no replacement string
has been specified by the user for a cell, the most specific `NA`
string for that cell is the one defined at its innermost parent table
split (if any).
```r
lyt9 <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX") %>%
  analyze(vars = "AGE", mean, format = "xx.xx", na_str = "not available")

build_table(lyt9, ADSL)
```
If an `NA` value replacement string was also specified at the cell
level, then the one set at the parent table level is ignored for this
cell, as the cell-level `NA` string is more specific and therefore
takes precedence.
```r
lyt10 <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX") %>%
  analyze(
    vars = "AGE",
    afun = function(x) {
      rcell(mean(x), format = "xx.xx", label = "Mean", format_na_str = "<missing>")
    },
    na_str = "not available"
  )

build_table(lyt10, ADSL)

lyt10a <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX") %>%
  analyze(
    vars = "AGE",
    afun = function(x) {
      in_rows(
        "Mean" = rcell(mean(x)),
        "SD" = rcell(sd(x)),
        .formats = "xx.xx",
        .format_na_strs = "<missing>"
      )
    },
    na_str = "not available"
  )

build_table(lyt10a, ADSL)
```
In the following, slightly more complicated example, we can observe
partial inheritance of `NA` strings. That is, only `SD` cells inherit
the parent table's `NA` string, while the `Mean` cells do not.
```r
lyt11 <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX") %>%
  analyze(
    vars = "AGE",
    afun = function(x) {
      in_rows(
        "Mean" = rcell(mean(x), format_na_str = "<missing>"),
        "SD" = rcell(sd(x))
      )
    },
    format = "xx.xx",
    na_str = "not available"
  )

build_table(lyt11, ADSL)
```