analyze_variables | R Documentation |
The analyze function analyze_vars() creates a layout element to summarize one or more variables,
using the S3 generic function s_summary() to calculate a list of summary statistics. A list of all
available statistics for numeric variables can be viewed by running
get_stats("analyze_vars_numeric") and for non-numeric variables by running
get_stats("analyze_vars_counts"). Use the .stats parameter to specify the statistics to include in
your output summary table.
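As a brief illustration of the lookup described above, the following sketch lists the statistics available for each variable type and checks a candidate .stats selection against them. It assumes the tern package is installed; the object names (numeric_stats, count_stats, wanted) are illustrative only.

```r
library(tern)

# Statistics available for numeric and for non-numeric variables.
numeric_stats <- get_stats("analyze_vars_numeric")
count_stats <- get_stats("analyze_vars_counts")

# Candidate selection for `.stats`; any name reported by setdiff()
# is not available for numeric variables.
wanted <- c("n", "mean_sd", "median")
setdiff(wanted, numeric_stats)
```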
analyze_vars(
lyt,
vars,
var_labels = vars,
na_str = default_na_str(),
nested = TRUE,
...,
na.rm = TRUE,
show_labels = "default",
table_names = vars,
section_div = NA_character_,
.stats = c("n", "mean_sd", "median", "range", "count_fraction"),
.formats = NULL,
.labels = NULL,
.indent_mods = NULL
)
s_summary(x, na.rm = TRUE, denom, .N_row, .N_col, .var, ...)
## S3 method for class 'numeric'
s_summary(
x,
na.rm = TRUE,
denom,
.N_row,
.N_col,
.var,
control = control_analyze_vars(),
...
)
## S3 method for class 'factor'
s_summary(
x,
na.rm = TRUE,
denom = c("n", "N_row", "N_col"),
.N_row,
.N_col,
...
)
## S3 method for class 'character'
s_summary(
x,
na.rm = TRUE,
denom = c("n", "N_row", "N_col"),
.N_row,
.N_col,
.var,
verbose = TRUE,
...
)
## S3 method for class 'logical'
s_summary(
x,
na.rm = TRUE,
denom = c("n", "N_row", "N_col"),
.N_row,
.N_col,
...
)
a_summary(
x,
.N_col,
.N_row,
.var = NULL,
.df_row = NULL,
.ref_group = NULL,
.in_ref_col = FALSE,
compare = FALSE,
.stats = NULL,
.formats = NULL,
.labels = NULL,
.indent_mods = NULL,
na.rm = TRUE,
na_str = default_na_str(),
...
)
lyt |
( |
vars |
( |
var_labels |
( |
na_str |
( |
nested |
( |
... |
arguments passed to |
na.rm |
( |
show_labels |
( |
table_names |
( |
section_div |
( |
.stats |
( |
.formats |
(named |
.labels |
(named |
.indent_mods |
(named |
x |
( |
denom |
(
|
.N_row |
( |
.N_col |
( |
.var |
( |
control |
(
|
verbose |
( |
.df_row |
( |
.ref_group |
( |
.in_ref_col |
( |
compare |
( |
Automatic digit formatting: The number of digits to display can be automatically determined from
the analyzed variable(s) (vars) for certain statistics by setting the statistic format to "auto" in
.formats. This utilizes the format_auto() formatting function. Note that only data for the current
row & variable (for all columns) will be considered (.df_row[[.var]], see
rtables::additional_fun_params) and not the whole dataset.
analyze_vars() returns a layout object suitable for passing to further layouting functions, or to
rtables::build_table(). Adding this function to an rtable layout will add formatted rows containing
the statistics from s_summary() to the table layout.
s_summary() returns different statistics depending on the class of x.
If x is of class numeric, returns a list with the following named numeric items:
n: The length() of x.
sum: The sum() of x.
mean: The mean() of x.
sd: The stats::sd() of x.
se: The standard error of the mean of x, i.e.: (sd(x) / sqrt(length(x))).
mean_sd: The mean() and stats::sd() of x.
mean_se: The mean() of x and its standard error (see above).
mean_ci: The CI for the mean of x (from stat_mean_ci()).
mean_sei: The SE interval for the mean of x, i.e.: (mean() -/+ stats::sd() / sqrt()).
mean_sdi: The SD interval for the mean of x, i.e.: (mean() -/+ stats::sd()).
mean_pval: The two-sided p-value of the mean of x (from stat_mean_pval()).
median: The stats::median() of x.
mad: The median absolute deviation of x, i.e.: (stats::median() of xc, where xc = x - stats::median()).
median_ci: The CI for the median of x (from stat_median_ci()).
quantiles: Two sample quantiles of x (from stats::quantile()).
iqr: The stats::IQR() of x.
range: The range_noinf() of x.
min: The min() of x.
max: The max() of x.
median_range: The median() and range_noinf() of x.
cv: The coefficient of variation of x, i.e.: (stats::sd() / mean() * 100).
geom_mean: The geometric mean of x, i.e.: (exp(mean(log(x)))).
geom_cv: The geometric coefficient of variation of x, i.e.: (sqrt(exp(sd(log(x)) ^ 2) - 1) * 100).
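The closed-form items in the list above can be reproduced with base R. The sketch below is only a sanity check of the stated formulas on a made-up vector; it does not call s_summary(), and the object names are illustrative.

```r
# Made-up, strictly positive data so that log(x) is defined.
x <- c(2, 4, 4, 5, 7, 9)

# Standard error of the mean: sd(x) / sqrt(length(x)).
se <- sd(x) / sqrt(length(x))

# SE and SD intervals around the mean (lower, upper).
mean_sei <- mean(x) + c(-1, 1) * se
mean_sdi <- mean(x) + c(-1, 1) * sd(x)

# Coefficient of variation, in percent.
cv <- sd(x) / mean(x) * 100

# Geometric mean and geometric coefficient of variation (percent).
geom_mean <- exp(mean(log(x)))
geom_cv <- sqrt(exp(sd(log(x))^2) - 1) * 100
```

Note that geom_mean and geom_cv are only meaningful for strictly positive values, since both rely on log(x).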
If x is of class factor or converted from character, returns a list with named numeric items:
n: The length() of x.
count: A list with the number of cases for each level of the factor x.
count_fraction: Similar to count but also includes the proportion of cases for each level of the
factor x relative to the denominator, or NA if the denominator is zero.
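To make the count_fraction description concrete, here is a minimal base R sketch of per-level counts and proportions relative to a chosen denominator. The helper count_fraction_sketch() and its denom_n argument are hypothetical and are not part of tern; they only mirror the behavior described above, including NA when the denominator is zero.

```r
# Hypothetical helper (not part of tern): per-level count and fraction.
count_fraction_sketch <- function(x, denom_n = sum(!is.na(x))) {
  stopifnot(is.factor(x))
  # Count the cases in each level, ignoring NA values.
  counts <- vapply(
    levels(x),
    function(lvl) sum(x == lvl, na.rm = TRUE),
    integer(1)
  )
  # Pair each count with its fraction, or NA if the denominator is zero.
  lapply(counts, function(k) {
    c(count = k, fraction = if (denom_n == 0) NA_real_ else k / denom_n)
  })
}

x <- factor(c("a", "a", "b", "c", "a"))
res_n <- count_fraction_sketch(x)                   # denominator: non-missing n
res_row <- count_fraction_sketch(x, denom_n = 10L)  # denominator: e.g. a row count
empty <- count_fraction_sketch(factor(levels = c("a", "b")))
```

The empty factor still yields one element per level, with a count of zero and an NA fraction.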
If x is of class logical, returns a list with named numeric items:
n: The length() of x (possibly after removing NAs).
count: Count of TRUE in x.
count_fraction: Count and proportion of TRUE in x relative to the denominator, or NA if the
denominator is zero. Note that NAs in x are never counted and never lead to NA here.
a_summary() returns the corresponding list with formatted rtables::CellValue().
analyze_vars(): Layout-creating function which can take statistics function arguments and
additional format arguments. This function is a wrapper for rtables::analyze().
s_summary(): S3 generic function to produce a variable summary.
s_summary(numeric): Method for numeric class.
s_summary(factor): Method for factor class.
s_summary(character): Method for character class. This makes an automatic conversion to factor
(with a warning) and then forwards to the method for factors.
s_summary(logical): Method for logical class.
a_summary(): Formatted analysis function which is used as afun in analyze_vars() and
compare_vars() and as cfun in summarize_colvars().
If x is an empty vector, NA is returned. This is intended, so that rcell content can still be
returned in rtables when the intersection of a column and a row delimits an empty data selection.
When the mean function is applied to an empty vector, NA will be returned instead of NaN, the
latter being standard behavior in R.
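The difference between NaN and NA for empty input can be seen directly in base R. In the sketch below, mean_or_na() is a hypothetical helper, not part of tern, that only illustrates the kind of substitution described above.

```r
# Base R: the mean of an empty numeric vector is NaN.
empty <- numeric(0)
base_mean <- mean(empty)

# Hypothetical helper (not part of tern): return NA instead of NaN
# when there is nothing to average.
mean_or_na <- function(x) {
  if (length(x) == 0) NA_real_ else mean(x)
}

safe_mean <- mean_or_na(empty)
is.nan(base_mean)  # NaN from base R
is.nan(safe_mean)  # NA is not NaN
```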
If x is an empty factor, a list is still returned for counts with one element per factor level.
If there are no levels in x, the function fails.
If factor variables contain NA, these NA values are excluded by default. To include NA values set
na.rm = FALSE and missing values will be displayed as an NA level. Alternatively, an explicit
factor level can be defined for NA values during pre-processing via df_explicit_na() - the default
na_level ("<Missing>") will also be excluded when na.rm is set to TRUE.
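The idea of an explicit missing level can be sketched in base R without tern. The helper add_missing_level() below is hypothetical and is not the implementation of df_explicit_na(); it only shows what turning NA values into a "<Missing>" level looks like.

```r
x <- factor(c(NA, "Female", "Male", NA))

# By default, table() drops NA values; useNA = "ifany" shows them.
table(x)
table(x, useNA = "ifany")

# Hypothetical helper (not part of tern): recode NA as an explicit level.
add_missing_level <- function(f, na_level = "<Missing>") {
  f <- factor(f, levels = c(levels(f), na_level))
  f[is.na(f)] <- na_level
  f
}

x_explicit <- add_missing_level(x)
table(x_explicit)
```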
Automatic conversion of character to factor does not guarantee that the table can be generated
correctly. In particular, this is very likely to fail for sparse tables. It is therefore better to
always pre-process the dataset such that factors are manually created from character variables
before passing the dataset to rtables::build_table().
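One possible way to do this pre-processing is sketched below in base R: every character column of a made-up data frame is converted to a factor before any table is built. The data frame and its column names are illustrative only.

```r
# Made-up data with two character columns and one numeric column.
df <- data.frame(
  ARM = c("A", "B", "A"),
  SEX = c("F", "M", "F"),
  AVAL = c(1.2, 3.4, 5.6),
  stringsAsFactors = FALSE
)

# Convert only the character columns; other columns are left untouched.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], factor)

str(df)
```

For sparse data it may be safer to pass the full set of expected levels explicitly, e.g. factor(col, levels = ...), so that levels absent from a data subset are still represented.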
To use for comparison (with additional p-value statistic), parameter compare must be set to TRUE.
Ensure that either all NA values are converted to an explicit NA level or all NA values are left as is.
# Load tern, which provides the functions used in these examples.
library(tern)

## Fabricated dataset.
dta_test <- data.frame(
USUBJID = rep(1:6, each = 3),
PARAMCD = rep("lab", 6 * 3),
AVISIT = rep(paste0("V", 1:3), 6),
ARM = rep(LETTERS[1:3], rep(6, 3)),
AVAL = c(9:1, rep(NA, 9))
)
# `analyze_vars()` in `rtables` pipelines
## Default output within a `rtables` pipeline.
l <- basic_table() %>%
split_cols_by(var = "ARM") %>%
split_rows_by(var = "AVISIT") %>%
analyze_vars(vars = "AVAL")
build_table(l, df = dta_test)
## Select and format statistics output.
l <- basic_table() %>%
split_cols_by(var = "ARM") %>%
split_rows_by(var = "AVISIT") %>%
analyze_vars(
vars = "AVAL",
.stats = c("n", "mean_sd", "quantiles"),
.formats = c("mean_sd" = "xx.x, xx.x"),
.labels = c(n = "n", mean_sd = "Mean, SD", quantiles = c("Q1 - Q3"))
)
build_table(l, df = dta_test)
## Use arguments interpreted by `s_summary`.
l <- basic_table() %>%
split_cols_by(var = "ARM") %>%
split_rows_by(var = "AVISIT") %>%
analyze_vars(vars = "AVAL", na.rm = FALSE)
build_table(l, df = dta_test)
## Handle `NA` levels first when summarizing factors.
dta_test$AVISIT <- NA_character_
dta_test <- df_explicit_na(dta_test)
l <- basic_table() %>%
split_cols_by(var = "ARM") %>%
analyze_vars(vars = "AVISIT", na.rm = FALSE)
build_table(l, df = dta_test)
# auto format
dt <- data.frame("VAR" = c(0.001, 0.2, 0.0011000, 3, 4))
basic_table() %>%
analyze_vars(
vars = "VAR",
.stats = c("n", "mean", "mean_sd", "range"),
.formats = c("mean_sd" = "auto", "range" = "auto")
) %>%
build_table(dt)
# `s_summary.numeric`
## Basic usage: empty numeric returns NA-filled items.
s_summary(numeric())
## Management of NA values.
x <- c(NA_real_, 1)
s_summary(x, na.rm = TRUE)
s_summary(x, na.rm = FALSE)
x <- c(NA_real_, 1, 2)
s_summary(x, stats = NULL)
## Benefits in `rtables` constructions:
dta_test <- data.frame(
Group = rep(LETTERS[1:3], each = 2),
sub_group = rep(letters[1:2], each = 3),
x = 1:6
)
## The summary obtained with `rtables`:
basic_table() %>%
split_cols_by(var = "Group") %>%
split_rows_by(var = "sub_group") %>%
analyze(vars = "x", afun = s_summary) %>%
build_table(df = dta_test)
## By comparison with `lapply`:
X <- split(dta_test, f = with(dta_test, interaction(Group, sub_group)))
lapply(X, function(x) s_summary(x$x))
# `s_summary.factor`
## Basic usage:
s_summary(factor(c("a", "a", "b", "c", "a")))
# Empty factor returns zero-filled items.
s_summary(factor(levels = c("a", "b", "c")))
## Management of NA values.
x <- factor(c(NA, "Female"))
x <- explicit_na(x)
s_summary(x, na.rm = TRUE)
s_summary(x, na.rm = FALSE)
## Different denominators.
x <- factor(c("a", "a", "b", "c", "a"))
s_summary(x, denom = "N_row", .N_row = 10L)
s_summary(x, denom = "N_col", .N_col = 20L)
# `s_summary.character`
## Basic usage:
s_summary(c("a", "a", "b", "c", "a"), .var = "x", verbose = FALSE)
s_summary(c("a", "a", "b", "c", "a", ""), .var = "x", na.rm = FALSE, verbose = FALSE)
# `s_summary.logical`
## Basic usage:
s_summary(c(TRUE, FALSE, TRUE, TRUE))
# Empty logical vector returns zero-filled items.
s_summary(as.logical(c()))
## Management of NA values.
x <- c(NA, TRUE, FALSE)
s_summary(x, na.rm = TRUE)
s_summary(x, na.rm = FALSE)
## Different denominators.
x <- c(TRUE, FALSE, TRUE, TRUE)
s_summary(x, denom = "N_row", .N_row = 10L)
s_summary(x, denom = "N_col", .N_col = 20L)
a_summary(factor(c("a", "a", "b", "c", "a")), .N_row = 10, .N_col = 10)
a_summary(
factor(c("a", "a", "b", "c", "a")),
.ref_group = factor(c("a", "a", "b", "c")), compare = TRUE
)
a_summary(c("A", "B", "A", "C"), .var = "x", .N_col = 10, .N_row = 10, verbose = FALSE)
a_summary(
c("A", "B", "A", "C"),
.ref_group = c("B", "A", "C"), .var = "x", compare = TRUE, verbose = FALSE
)
a_summary(c(TRUE, FALSE, FALSE, TRUE, TRUE), .N_row = 10, .N_col = 10)
a_summary(
c(TRUE, FALSE, FALSE, TRUE, TRUE),
.ref_group = c(TRUE, FALSE), .in_ref_col = TRUE, compare = TRUE
)
a_summary(rnorm(10), .N_col = 10, .N_row = 20, .var = "bla")
a_summary(rnorm(10, 5, 1), .ref_group = rnorm(20, -5, 1), .var = "bla", compare = TRUE)