knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  dpi = 150
)

# Use ragg for better font rendering if available
if (requireNamespace("ragg", quietly = TRUE)) {
  knitr::opts_chunk$set(dev = "ragg_png")
}

old_opts <- options(width = 180)
Descriptive statistics provide the foundation for any quantitative analysis. Before estimating relationships between variables, it is essential to first characterize the distribution of each variable and assess balance across comparison groups. A well-constructed descriptive table—often designated "Table 1" in published research—accomplishes three objectives: it summarizes the central tendency and dispersion of continuous variables, tabulates the frequency distribution of categorical variables, and tests for systematic differences between groups.
The desctable() function generates publication-ready descriptive tables with automatic detection of variable types, appropriate summary statistics, and optional hypothesis testing. It adheres to the standard summata calling convention:
desctable(data, by, variables, ...)
where data is the dataset, by specifies the grouping variable (optional), and variables lists the variables to summarize. This vignette demonstrates the function’s capabilities using the included sample dataset.
The examples in this vignette use the clintrial dataset included with summata:
library(summata)
data(clintrial)
data(clintrial_labels)
The clintrial dataset contains `r nrow(clintrial)` observations with continuous, categorical, and time-to-event variables suitable for demonstrating descriptive statistics. The clintrial_labels vector provides human-readable labels for display.
The desctable() function automatically selects appropriate summary statistics and hypothesis tests based on variable type:
| Variable Type | Summary Statistic | Two Groups | Three+ Groups |
|:--------------|:------------------|:-----------|:--------------|
| Continuous (parametric) | Mean ± SD | t-test | ANOVA |
| Continuous (nonparametric) | Median [IQR] | Wilcoxon | Kruskal–Wallis |
| Categorical | n (%) | χ² or Fisher | χ² or Fisher |
| Time-to-event | Median (95% CI) | Log-rank | Log-rank |
For categorical variables, the Fisher exact test is used when any expected cell count falls below 5; otherwise the χ² test is used. For continuous variables, the test selection follows the displayed statistic: parametric tests are used with mean-based statistics, nonparametric tests with median-based statistics.
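The expected-count rule can be illustrated with a minimal base R sketch. The helper `choose_categorical_test()` and its `threshold` argument are hypothetical names used only for illustration; this is not summata's internal code, only one way the rule described above could be expressed.

```r
# Illustration only: a hypothetical helper, not part of summata.
# Picks a test from the expected cell counts of a contingency table.
choose_categorical_test <- function(tab, threshold = 5) {
  # Expected count under independence: row total * column total / grand total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < threshold)) "fisher" else "chisq"
}

sparse <- matrix(c(3, 1, 2, 4), nrow = 2)      # small counts
dense  <- matrix(c(30, 25, 20, 35), nrow = 2)  # larger counts

choose_categorical_test(sparse)
choose_categorical_test(dense)
```

In the sparse table every expected count is below 5, so the sketch selects the Fisher exact test; in the dense table all expected counts are 25 or more, so it selects the χ² test.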
The most common use case for descriptive tables is comparing characteristics across groups:

desc_vars <- c(
  "age", "sex", "race", "bmi", "stage", "ecog",
  "Surv(os_months, os_status)"
)

example1 <- desctable(
  data = clintrial,
  by = "treatment",
  variables = desc_vars,
  labels = clintrial_labels
)

example1
The output includes a "Total" column by default, showing overall statistics alongside group-specific values.
Omitting the by argument produces overall summary statistics without group comparisons:
example2 <- desctable(
  data = clintrial,
  variables = c("age", "bmi", "sex", "stage"),
  labels = clintrial_labels
)

example2
The default summary statistics can be customized for both continuous and categorical variables.
The stats_continuous parameter controls how continuous variables are summarized:
| Value | Output Format |
|:------|:--------------|
| "mean_sd" | Mean ± SD |
| "median_iqr" | Median [Q1–Q3] (default) |
| "median_range" | Median (min–max) |
example3 <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c("age", "bmi", "los_days"),
  stats_continuous = c("mean_sd", "median_iqr", "median_range"),
  labels = clintrial_labels
)

example3
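To make the three output formats concrete, here is a small base R sketch that builds each string by hand. The function `summarise_continuous()` is a hypothetical helper written for this illustration, and the quartiles use R's default `quantile()` method; how desctable() computes quartiles is not stated in this vignette.

```r
# Illustration only: a hypothetical helper, not part of summata.
summarise_continuous <- function(x, stat = "median_iqr", digits = 1) {
  x <- x[!is.na(x)]
  f <- function(v) formatC(v, format = "f", digits = digits)
  pm <- "\u00b1"    # plus-minus sign
  dash <- "\u2013"  # en dash
  switch(stat,
    mean_sd = paste0(f(mean(x)), " ", pm, " ", f(sd(x))),
    median_iqr = {
      q <- quantile(x, c(0.25, 0.75), names = FALSE)
      paste0(f(median(x)), " [", f(q[1]), dash, f(q[2]), "]")
    },
    median_range = paste0(f(median(x)), " (", f(min(x)), dash, f(max(x)), ")"),
    stop("Unknown stat: ", stat)
  )
}

x <- c(2, 4, 4, 5, 7, 9, 30)  # right-skewed toy data

summarise_continuous(x, "mean_sd")
summarise_continuous(x, "median_iqr")
summarise_continuous(x, "median_range")
```

With the outlier at 30, the mean is pulled well above the median of 5, which is the usual argument for a median-based summary on skewed data.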
The stats_categorical parameter controls categorical variable display:
| Value | Output Format |
|:------|:--------------|
| "n_percent" | n (%) (default) |
| "n" | n only |
| "percent" | % only |
example4 <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c("sex", "stage", "ecog"),
  stats_categorical = "percent",
  labels = clintrial_labels
)

example4
Control decimal places with digits (for statistics) and p_digits (for p-values):
example5 <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c("age", "bmi", "sex"),
  digits = 2,
  p_digits = 4,
  test = TRUE,
  labels = clintrial_labels
)

example5
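As a rough picture of what fixed-decimal formatting involves, the base R sketch below formats a statistic and a p-value to a chosen number of places. Both helpers (`fmt_stat()` and `fmt_p()`) are hypothetical, and the "< cutoff" display for very small p-values is a common reporting convention assumed here, not documented behavior of desctable().

```r
# Illustration only: hypothetical helpers, not part of summata.
fmt_stat <- function(x, digits = 1) {
  formatC(x, format = "f", digits = digits)
}

fmt_p <- function(p, p_digits = 3) {
  cutoff <- 10^(-p_digits)
  if (p < cutoff) {
    # Assumed convention: report values below the displayable minimum as "< cutoff"
    paste0("< ", formatC(cutoff, format = "f", digits = p_digits))
  } else {
    formatC(p, format = "f", digits = p_digits)
  }
}

fmt_stat(61.2345, digits = 2)
fmt_p(0.04567, p_digits = 4)
fmt_p(0.00002, p_digits = 4)
```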
When comparing groups, hypothesis tests assess whether observed differences are statistically significant.
By default (test = TRUE), hypothesis tests are selected automatically based on the summary statistic and their results are displayed. Setting test = FALSE disables testing:

example6 <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c("age", "bmi", "sex", "stage"),
  test = FALSE,
  labels = clintrial_labels
)

example6
Override automatic selection with test_continuous and test_categorical. Available test specifications include:
Continuous (test_continuous):
- "auto": Automatic selection (default)
- "t": Student t-test
- "wrs": Wilcoxon rank-sum test
- "aov": One-way ANOVA
- "kwt": Kruskal–Wallis test

Categorical (test_categorical):

- "auto": Automatic selection (default)
- "chisq": Pearson χ² test
- "fisher": Fisher exact test

The following example forces parametric tests for continuous variables:
example7a <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c("age", "bmi", "los_days"),
  test = TRUE,
  test_continuous = "aov",  # ANOVA
  labels = clintrial_labels
)

example7a
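For readers who want to see the parametric and nonparametric alternatives side by side outside a table, the sketch below runs a one-way ANOVA and a Kruskal–Wallis test in base R on the built-in PlantGrowth dataset (three groups), not on clintrial. That "aov" and "kwt" correspond to stats::aov() and stats::kruskal.test() is an assumption made for illustration.

```r
# Illustration only: base R on a built-in dataset, not summata internals.
data(PlantGrowth)

# Parametric: one-way ANOVA comparing mean weight across the three groups
anova_fit <- aov(weight ~ group, data = PlantGrowth)
anova_p <- summary(anova_fit)[[1]][["Pr(>F)"]][1]

# Nonparametric: Kruskal-Wallis rank-based test on the same comparison
kw_p <- kruskal.test(weight ~ group, data = PlantGrowth)$p.value

c(anova = anova_p, kruskal_wallis = kw_p)
```

The two p-values generally differ because the tests address different hypotheses (equal means versus equal rank distributions), which is why the choice should match the summary statistic being reported.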
This example forces the Fisher exact test for categorical variables:
example7b <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c("sex", "stage"),
  test = TRUE,
  test_categorical = "fisher",
  labels = clintrial_labels
)

example7b
Missing values require special consideration in descriptive tables. Options control whether missing values are displayed and how percentages are calculated.
By default, missing values are excluded from calculations. Set na_include = TRUE to display them as a separate category:
example8 <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c("smoking", "diabetes"),
  na_include = TRUE,
  labels = clintrial_labels
)

example8
The na_percent parameter controls whether missing values are included in percentage calculations:
# Percentages exclude missing (denominator = non-missing)
example9a <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c("smoking"),
  na_include = TRUE,
  na_percent = FALSE,
  labels = clintrial_labels
)

example9a

# Percentages include missing (denominator = total)
example9b <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c("smoking"),
  na_include = TRUE,
  na_percent = TRUE,
  labels = clintrial_labels
)

example9b
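The effect of the denominator choice is easy to verify by hand. The base R sketch below uses a made-up `smoking` vector (not the clintrial data) to compute percentages both ways; it illustrates the arithmetic only and makes no claim about how desctable() implements it.

```r
# Illustration only: toy data, not clintrial.
smoking <- c("Never", "Never", "Former", "Current", NA, "Never", NA, "Former")

n_total <- length(smoking)             # 8 observations, including 2 missing
n_nonmissing <- sum(!is.na(smoking))   # 6 observed values

counts <- table(smoking, useNA = "no")

pct_excluding_na <- 100 * counts / n_nonmissing  # denominator = non-missing
pct_including_na <- 100 * counts / n_total       # denominator = total

pct_excluding_na
pct_including_na
```

Here "Never" accounts for 3 of 6 observed values (50%) but 3 of 8 total rows (37.5%), so the same count yields noticeably different percentages when missingness is substantial.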
The label for missing values can be customized using the na_label parameter:
example10 <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c("smoking"),
  na_include = TRUE,
  na_label = "Not Reported",
  labels = clintrial_labels
)

example10
The total column provides overall statistics alongside group-specific values.
The total parameter controls the presence and position of the total column:
| Value | Effect |
|:------|:-------|
| TRUE, "first" | Total column first (default) |
| "last" | Total column last |
| FALSE | No total column |
# Total column in last position
example11a <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c("age", "sex", "stage"),
  total = "last",
  labels = clintrial_labels
)

example11a

# No total column
example11b <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c("age", "sex", "stage"),
  total = FALSE,
  labels = clintrial_labels
)

example11b
The following demonstrates a comprehensive descriptive table suitable for publication:
table1 <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c(
    "age", "sex", "race", "ethnicity", "bmi", "smoking",
    "diabetes", "hypertension", "stage", "grade", "ecog",
    "Surv(os_months, os_status)"
  ),
  labels = clintrial_labels,
  stats_continuous = "mean_sd",
  stats_categorical = "n_percent",
  test = TRUE,
  total = TRUE,
  digits = 1,
  p_digits = 3
)

table1
The underlying numeric values are stored as an attribute for programmatic access:
raw_data <- attr(table1, "raw_data")
head(raw_data)
For detailed survival analysis—including landmark survival estimates, survival quantiles, and multiple endpoints—see the dedicated Survival Tables vignette. The survtable() function provides comprehensive options for reporting time-to-event outcomes.
Descriptive tables can be exported to various formats. See the Table Export vignette for comprehensive documentation.
# Microsoft Word
table2docx(
  table = table1,
  file = file.path(tempdir(), "Table1.docx"),
  caption = "Table 1. Baseline Characteristics by Group"
)

# PDF (requires LaTeX)
table2pdf(
  table = table1,
  file = file.path(tempdir(), "Table1.pdf"),
  caption = "Table 1. Baseline Characteristics by Group"
)

# HTML
table2html(
  table = table1,
  file = file.path(tempdir(), "Table1.html"),
  caption = "Table 1. Baseline Characteristics by Group"
)
When a factor level has zero observations in a group, ensure all levels are explicitly defined:
data$stage <- factor(data$stage, levels = c("I", "II", "III", "IV"))
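The reason explicit levels matter can be seen with base R's table() on a toy vector (not the clintrial data): a level that never occurs is silently dropped from a character vector but retained with a zero count once the factor levels are declared.

```r
# Illustration only: toy data in which stage "III" never occurs.
stage_chr <- c("I", "II", "II", "IV")

# Character vector: the unobserved level is absent from the tabulation
table(stage_chr)

# Factor with explicit levels: "III" is kept and reported with a count of 0
stage_fct <- factor(stage_chr, levels = c("I", "II", "III", "IV"))
table(stage_fct)
```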
For highly skewed distributions, use median and IQR:
desctable(data, by, variables, stats_continuous = "median_iqr")
For tables with many variables, consider splitting by category or using landscape orientation for export:
table2pdf(table, file.path(tempdir(), "table1.pdf"), orientation = "landscape", font_size = 8)
options(old_opts)
- survtable() for time-to-event summaries
- fit(), uniscreen(), and fullfit()
- compfit() for comparing models
- multifit() for multi-outcome analysis