knitr::opts_chunk$set(comment = "#")
```{css, echo=FALSE} .reveal .r code { white-space: pre; }
## Introduction The `rtables` R package provides a framework to create, tabulate and output tables in `R`. Most of the design requirements for `rtables` have their origin in studying tables that are commonly used to report analyses from clinical trials; however, we were careful to keep `rtables` a general purpose toolkit. There are a number of other table frameworks available in `R` such as [gt](https://gt.rstudio.com/) from `RStudio`, [xtable](https://CRAN.R-project.org/package=xtable), [tableone](https://CRAN.R-project.org/package=tableone), and [tables](https://CRAN.R-project.org/package=tables) to name a few. There is a number of reasons to implement `rtables` (yet another tables R package): * output tables in ASCII to text files * table rendering (ASCII, HTML, etc.) is separate from the data model. Hence, one always has access to the non-rounded/non-formatted numbers. * pagination in both horizontal and vertical directions to meet the health authority submission requirements * cell, row, column, table reference system * titles, footers, and referential footnotes * path based access to cell content which will be useful for automated content generation In the remainder of this vignette, we give a short introduction into `rtables` and tabulating a table. The content is based on the [useR 2020 presentation from Gabriel Becker](https://www.youtube.com/watch?v=CBQzZ8ZhXLA). The packages used for this vignette are `rtables` and `dplyr`: ```r library(rtables) library(dplyr)
The data used in this vignette is a made up using random number generators. The data content is relatively simple: one row per imaginary person and one column per measurement: study arm, the country of origin, gender, handedness, age, and weight.
n <- 400 set.seed(1) df <- tibble( arm = factor(sample(c("Arm A", "Arm B"), n, replace = TRUE), levels = c("Arm A", "Arm B")), country = factor(sample(c("CAN", "USA"), n, replace = TRUE, prob = c(.55, .45)), levels = c("CAN", "USA")), gender = factor(sample(c("Female", "Male"), n, replace = TRUE), levels = c("Female", "Male")), handed = factor(sample(c("Left", "Right"), n, prob = c(.6, .4), replace = TRUE), levels = c("Left", "Right")), age = rchisq(n, 30) + 10 ) %>% mutate( weight = 35 * rnorm(n, sd = .5) + ifelse(gender == "Female", 140, 180) ) head(df)
Note that we use factor variables so that the level order is
represented in the row or column order when we tabulate the
information of df
below.
The aim of this vignette is to build the following table step by step:
lyt <- basic_table(show_colcounts = TRUE) %>% split_cols_by("arm") %>% split_cols_by("gender") %>% split_rows_by("country") %>% summarize_row_groups() %>% split_rows_by("handed") %>% summarize_row_groups() %>% analyze("age", afun = mean, format = "xx.x") tbl <- build_table(lyt, df) tbl
In rtables
a basic table is defined to have 0 rows and one column
representing all data. Analyzing a variable is one way of adding a
row:
lyt <- basic_table() %>% analyze("age", mean, format = "xx.x") tbl <- build_table(lyt, df) tbl
In the code above we first described the table and assigned that
description to a variable lyt
. We then built the table using the
actual data with build_table()
. The description of a table is called
a table layout. basic_table()
is the start of every table layout and
contains the information that we have in one column representing all
data. The analyze()
instruction adds to the layout that the age
variable should be analyzed with the mean()
analysis function and
the result should be rounded to 1
decimal place.
Hence, a layout is "pre-data", that is, it's a description of how to build a table once we get data. We can look at the layout isolated:
lyt
The general layouting instructions are summarized below:
basic_table()
is a layout representing a table with zero rows and
one columnsplit_rows_by()
, split_rows_by_multivar()
,
split_rows_by_cuts()
, split_rows_by_cutfun()
,
split_rows_by_quartiles()
split_cols_by()
, split_cols_by_multivar()
,
split_cols_by_cuts()
, split_cols_by_cutfun()
,
split_cols_by_quartiles()
summarize_row_groups()
analyze()
, analyze_colvars()
Using those functions, it is possible to create a wide variety of tables as we will show in this document.
We will now add more structure to the columns by adding a column split
based on the factor variable arm
:
lyt <- basic_table() %>% split_cols_by("arm") %>% analyze("age", afun = mean, format = "xx.x") tbl <- build_table(lyt, df) tbl
The resulting table has one column per factor level of arm
. So the
data represented by the first column is df[df$arm == "ARM A",
]
. Hence, the split_cols_by()
partitions the data among the columns
by default.
Column splitting can be done in a recursive/nested manner by adding
sequential split_cols_by()
layout instruction. It's also possible to
add a non-nested split. Here we splitting each arm further by the
gender:
lyt <- basic_table() %>% split_cols_by("arm") %>% split_cols_by("gender") %>% analyze("age", afun = mean, format = "xx.x") tbl <- build_table(lyt, df) tbl
The first column represents the data in df
where df$arm == "A" &
df$gender == "Female"
and the second column the data in df
where
df$arm == "A" & df$gender == "Male"
, and so on.
So far, we have created layouts with analysis and column splitting
instructions, i.e. analyze()
and split_cols_by()
,
respectively. This resulted with a table with multiple columns and one
data row. We will add more row structure by stratifying the mean
analysis by country (i.e. adding a split in the row space):
lyt <- basic_table() %>% split_cols_by("arm") %>% split_cols_by("gender") %>% split_rows_by("country") %>% analyze("age", afun = mean, format = "xx.x") tbl <- build_table(lyt, df) tbl
In this table the data used to derive the first data cell (average of
age of female Canadians in Arm A) is where df$country == "CAN" &
df$arm == "Arm A" & df$gender == "Female"
. This cell value can also
be calculated manually:
mean(df$age[df$country == "CAN" & df$arm == "Arm A" & df$gender == "Female"])
Row structure can also be used to group the table into titled groups
of pages during rendering. We do this via 'page by splits', which are
declared via page_by = TRUE
within a call to split_rows_by
:
lyt <- basic_table() %>% split_cols_by("arm") %>% split_cols_by("gender") %>% split_rows_by("country", page_by = TRUE) %>% split_rows_by("handed") %>% analyze("age", afun = mean, format = "xx.x") tbl <- build_table(lyt, df) cat(export_as_txt(tbl, page_type = "letter", page_break = "\n\n~~~~~~ Page Break ~~~~~~\n\n"))
We go into more detail on page-by splits and how to control the page-group specific titles in the Title and footer vignette.
Note that if you print or render a table without pagination, the page_by splits are currently rendered as normal row splits. This may change in future releases.
When adding row splits, we get by default label rows for each split
level, for example CAN
and USA
in the table above. Besides the
column space subsetting, we have now further subsetted the data for
each cell. It is often useful when defining a row splitting to display
information about each row group. In rtables
this is referred to as
content information, i.e. mean()
on row 2 is a descendant of CAN
(visible via the indenting, though the table has an underlying tree
structure that is not of importance for this vignette). In order to
add content information and turn the CAN
label row into a content
row, the summarize_row_groups()
function is required. By default,
the count (nrows()
) and percentage of data relative to the column
associated data is calculated:
lyt <- basic_table() %>% split_cols_by("arm") %>% split_cols_by("gender") %>% split_rows_by("country") %>% summarize_row_groups() %>% analyze("age", afun = mean, format = "xx.x") tbl <- build_table(lyt, df) tbl
The relative percentage for average age of female Canadians is calculated as follows:
df_cell <- subset(df, df$country == "CAN" & df$arm == "Arm A" & df$gender == "Female") df_col_1 <- subset(df, df$arm == "Arm A" & df$gender == "Female") c(count = nrow(df_cell), percentage = nrow(df_cell) / nrow(df_col_1))
so the group percentages per row split sum up to 1 for each column.
We can further split the row space by dividing each country by handedness:
lyt <- basic_table() %>% split_cols_by("arm") %>% split_cols_by("gender") %>% split_rows_by("country") %>% summarize_row_groups() %>% split_rows_by("handed") %>% analyze("age", afun = mean, format = "xx.x") tbl <- build_table(lyt, df) tbl
Next, we further add a count and percentage summary for handedness within each country:
lyt <- basic_table() %>% split_cols_by("arm") %>% split_cols_by("gender") %>% split_rows_by("country") %>% summarize_row_groups() %>% split_rows_by("handed") %>% summarize_row_groups() %>% analyze("age", afun = mean, format = "xx.x") tbl <- build_table(lyt, df) tbl
rtables
Table ObjectsOnce we have created a table, we can inspect its structure using a number of functions.
The table_structure()
function prints a summary of a table's row
structure at one of two levels of detail. By default, it summarizes
the structure at the subtable level.
table_structure(tbl)
When the detail
argument is set to "row"
, however, it provides a
more detailed row-level summary, which acts as a useful alternative to
how we might normally use the str()
function to interrogate compound
nested lists.
table_structure(tbl, detail = "row")
The make_row_df()
and make_col_df()
functions create a data.frame
which has a variety of information about the table's structure. Most
useful for introspection purposes are the label
, name
,
abs_rownumber
, path
and node_class
columns (the remainder of
information in the returned data.frame is used for pagination)
make_row_df(tbl)[,c("label", "name", "abs_rownumber", "path", "node_class")]
By default make_row_df()
summarizes only visible rows, but setting
visible_only
to FALSE
gives us a structural summary of the table,
including the full hierarchy of subtables, including those that aren't
represented directly by any visible rows:
make_row_df(tbl, visible_only = FALSE)[,c("label", "name", "abs_rownumber", "path", "node_class")]
make_col_df()
similarly accepts visible_only
, though here the
meaning is slightly different, indicating whether only leaf columns
should be summarized (TRUE
, the default) or whether higher level
groups of columns, analogous to subtables in row space, should be
summarized as well.
make_col_df(tbl)
make_col_df(tbl, visible_only = FALSE)
The row_paths_summary()
and col_paths_summary()
functions wrap the
respective make_*_df
functions, printing the name
, node_class
and path
information (in the row case), or the label
and path
information (in the column case), indented to illustrate table
structure:
row_paths_summary(tbl)
col_paths_summary(tbl)
In this vignette you have learned:
The other vignettes in the rtables
package will provide more
detailed information about the rtables
package. We recommend that
you continue with the
tabulation_dplyr
vignette which compares the information derived by the table in this
vignette using dplyr
.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.