View source: R/tbl_custom_summary.R
tbl_custom_summary | R Documentation |
experimental
The tbl_custom_summary() function calculates descriptive statistics for
continuous, categorical, and dichotomous variables.
This function is similar to tbl_summary() but allows you to provide
a custom function in charge of computing the statistics (see Details).
tbl_custom_summary(
  data,
  by = NULL,
  label = NULL,
  stat_fns,
  statistic,
  digits = NULL,
  type = NULL,
  value = NULL,
  missing = NULL,
  missing_text = NULL,
  include = everything(),
  overall_row = FALSE,
  overall_row_last = FALSE,
  overall_row_label = NULL
)
data | A data frame.
by | A column name (quoted or unquoted) in data.
label | List of formulas specifying variable labels.
stat_fns | Formula or list of formulas specifying the function to be used to compute the statistics (see below for details and examples). You can also use dedicated helpers such as continuous_summary(), proportion_summary(), or ratio_summary().
statistic | List of formulas specifying the statistics to display for each variable (see below for details and examples).
digits | List of formulas specifying the number of decimal places to round summary statistics.
type | List of formulas specifying variable types. Accepted values are "continuous", "continuous2", "categorical", and "dichotomous".
value | List of formulas specifying the value to display for dichotomous variables.
missing | Indicates whether to include counts of NA values in the table.
missing_text | String to display for count of missing observations.
include | Variables to include in the summary table. Default is everything().
overall_row | Logical indicator to display an overall row. Default is FALSE.
overall_row_last | Logical indicator to display overall row last in table. Default is FALSE.
overall_row_label | String indicating the overall row label.
A tbl_custom_summary and tbl_summary object
tbl_summary()
Please refer to the help file of tbl_summary() regarding the use of select
helpers, and arguments include, by, type, value, digits, missing and
missing_text.
stat_fns argument
The stat_fns argument specifies the custom function(s) to be used for computing
the summary statistics. For example, stat_fns = everything() ~ foo.
Each function may take the following arguments:
foo(data, full_data, variable, by, type, ...)
data= is the input data frame passed to tbl_custom_summary(), subset according to the level of by or variable if any, excluding NA values of the current variable
full_data= is the full input data frame passed to tbl_custom_summary()
variable= is a string indicating the variable to perform the calculation on
by= is a string indicating the by variable from tbl_custom_summary(), if present
type= is a string indicating the type of variable (continuous, categorical, ...)
stat_display= is a string indicating the statistic to display (for the statistic argument, for that variable)
The user-defined function does not need to use each of these inputs. It is
encouraged that the user-defined function accept ..., because every argument
is passed to the function even when the function does not use it,
e.g. foo(data, ...) (see examples).
The user-defined function should return a one-row dplyr::tibble() with
one column per summary statistic (see examples).
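As a minimal sketch of the contract described above, the hypothetical function median_iqr() below (not part of gtsummary) declares only the inputs it needs and absorbs the rest through `...`. Calling it by hand with every argument listed above shows that the unused inputs are ignored and that the result is a one-row tibble. The toy data frame and the column name `score` are assumptions made for illustration only.

```r
# Hypothetical user-defined function: median and quartiles of the current variable.
# Only `data` and `variable` are used; all other inputs fall into `...`.
median_iqr <- function(data, variable, ...) {
  x <- data[[variable]]
  dplyr::tibble(
    median = stats::median(x),
    q1 = unname(stats::quantile(x, 0.25)),
    q3 = unname(stats::quantile(x, 0.75))
  )
}

# Toy data (an assumption for this sketch, not a gtsummary dataset)
toy <- data.frame(score = c(1, 2, 3, 4, 5))

# Call the function directly with the full set of documented arguments
res <- median_iqr(
  data = toy,
  full_data = toy,
  variable = "score",
  by = NULL,
  type = "continuous",
  stat_display = "{median} ({q1}, {q3})"
)
res
```

The column names of the returned tibble (median, q1, q3) are the names that can then be referenced between curly brackets in the statistic argument.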
The statistic argument specifies the statistics presented in the table. The
input is a list of formulas that specify the statistics to report. For example,
statistic = list(age ~ "{mean} ({sd})").
A statistic name that appears between curly brackets
will be replaced with the numeric statistic (see glue::glue()).
All the statistics indicated in the statistic argument should be returned
by the functions defined in the stat_fns argument.
When the summary type is "continuous2", pass a vector of statistics. Each element
of the vector will result in a separate row in the summary table.
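The "continuous2" case can be sketched as follows, assuming the trial dataset shipped with gtsummary and a hypothetical helper mean_sd() written for this illustration. Each element of the statistic vector is expected to produce its own row under the variable label.

```r
library(gtsummary)

# Hypothetical helper: returns the mean and SD of the current variable.
# `data` is described as already excluding NA values of that variable.
mean_sd <- function(data, variable, ...) {
  x <- data[[variable]]
  dplyr::tibble(mean = mean(x), sd = stats::sd(x))
}

tbl_continuous2 <-
  tbl_custom_summary(
    trial,
    include = c("age", "marker"),
    by = "trt",
    type = everything() ~ "continuous2",
    stat_fns = everything() ~ mean_sd,
    # one table row per element of the vector
    statistic = everything() ~ c("{mean}", "{sd}")
  )
```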
For both categorical and continuous variables, statistics on the number of missing and non-missing observations and their proportions are also available to display.
{N_obs} total number of observations
{N_miss} number of missing observations
{N_nonmiss} number of non-missing observations
{p_miss} percentage of observations missing
{p_nonmiss} percentage of observations not missing
Note that for categorical variables, {N_obs}, {N_miss} and {N_nonmiss} refer
to the total number, number missing and number non-missing observations
in the denominator, not at each level of the categorical variable.
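A short sketch of how these placeholders might be combined with a custom statistic, assuming the trial dataset from gtsummary and a hypothetical helper mean_only(). The custom function returns only the mean; the missing-data counts are taken from the placeholders listed above.

```r
library(gtsummary)

# Hypothetical helper: returns only the mean of the current variable
mean_only <- function(data, variable, ...) {
  dplyr::tibble(mean = mean(data[[variable]]))
}

tbl_with_counts <-
  tbl_custom_summary(
    trial,
    include = c("age", "marker"),
    by = "trt",
    stat_fns = everything() ~ mean_only,
    # {mean} comes from mean_only(); {N_nonmiss} and {N_obs} are the
    # missing-data statistics described above
    statistic = everything() ~ "{mean} (n = {N_nonmiss} of {N_obs})"
  )
```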
It is recommended to use modify_footnote() to properly describe the
displayed statistics (see examples).
The returned table is compatible with all gtsummary features applicable
to a tbl_summary object, like add_overall(), modify_footnote() or
bold_labels().
However, some of them may be inappropriate for a custom summary. In particular,
add_p() does not take into account the type of displayed statistics and
always returns the p-value of a comparison test of the current variable
according to the by groups, which may be incorrect if the displayed
statistics refer to a third variable.
Joseph Larmarange
Review list, formula, and selector syntax used throughout gtsummary
Other tbl_summary tools: add_n.tbl_summary(), add_overall(), add_p.tbl_summary(), add_q(), add_stat_label(), bold_italicize_labels_levels, inline_text.tbl_summary(), inline_text.tbl_survfit(), modify, separate_p_footnotes(), tbl_merge(), tbl_split(), tbl_stack(), tbl_strata(), tbl_summary()
Other tbl_custom_summary tools: add_overall(), continuous_summary(), proportion_summary(), ratio_summary()
library(gtsummary)

# Example 1 ----------------------------------
# The function ignores the current variable and always summarizes
# `marker` and `age` within the subset of rows it receives
my_stats <- function(data, ...) {
  marker_sum <- sum(data$marker, na.rm = TRUE)
  mean_age <- mean(data$age, na.rm = TRUE)
  dplyr::tibble(
    marker_sum = marker_sum,
    mean_age = mean_age
  )
}

my_stats(trial)

tbl_custom_summary_ex1 <-
  trial %>%
  tbl_custom_summary(
    include = c("stage", "grade"),
    by = "trt",
    stat_fns = everything() ~ my_stats,
    statistic = everything() ~ "A: {mean_age} - S: {marker_sum}",
    digits = everything() ~ c(1, 0),
    overall_row = TRUE,
    overall_row_label = "All stages & grades"
  ) %>%
  add_overall(last = TRUE) %>%
  modify_footnote(
    update = all_stat_cols() ~ "A: mean age - S: sum of marker"
  ) %>%
  bold_labels()

# Example 2 ----------------------------------
# Use `data[[variable]]` to access the current variable
mean_ci <- function(data, variable, ...) {
  test <- t.test(data[[variable]])
  dplyr::tibble(
    mean = test$estimate,
    conf.low = test$conf.int[1],
    conf.high = test$conf.int[2]
  )
}

tbl_custom_summary_ex2 <-
  trial %>%
  tbl_custom_summary(
    include = c("marker", "ttdeath"),
    by = "trt",
    stat_fns = ~mean_ci,
    statistic = ~"{mean} [{conf.low}; {conf.high}]"
  ) %>%
  add_overall(last = TRUE) %>%
  modify_footnote(
    update = all_stat_cols() ~ "mean [95% CI]"
  )

# Example 3 ----------------------------------
# Use `full_data` to access the full dataset
# A returned statistic can also be a character string
diff_to_great_mean <- function(data, full_data, ...) {
  mean <- mean(data$marker, na.rm = TRUE)
  great_mean <- mean(full_data$marker, na.rm = TRUE)
  diff <- mean - great_mean
  dplyr::tibble(
    mean = mean,
    great_mean = great_mean,
    diff = diff,
    level = ifelse(diff > 0, "high", "low")
  )
}

tbl_custom_summary_ex3 <-
  trial %>%
  tbl_custom_summary(
    include = c("grade", "stage"),
    by = "trt",
    stat_fns = ~diff_to_great_mean,
    statistic = ~"{mean} ({level}, diff: {diff})",
    overall_row = TRUE
  ) %>%
  bold_labels()