| desctable | R Documentation |
Generates comprehensive descriptive statistics tables with automatic variable type detection, group comparisons, and appropriate statistical testing. This function is designed to create "Table 1"-style summaries commonly used in clinical and epidemiological research, with full support for continuous, categorical, and time-to-event variables.
desctable(
data,
by = NULL,
variables,
stats_continuous = c("median_iqr"),
stats_categorical = "n_percent",
digits = 1,
p_digits = 3,
conf_level = 0.95,
p_per_stat = FALSE,
na_include = FALSE,
na_label = "Unknown",
na_percent = FALSE,
test = TRUE,
test_continuous = "auto",
test_categorical = "auto",
total = TRUE,
total_label = "Total",
labels = NULL,
number_format = NULL,
...
)
data |
Data frame or data.table containing the dataset to summarize. Automatically converted to a data.table for efficient processing. |
by |
Character string specifying the column name of the grouping
variable for stratified analysis (e.g., treatment arm, exposure
status). When |
variables |
Character vector of variable names to summarize. Can
include standard column names for continuous or categorical variables,
and survival expressions using |
stats_continuous |
Character vector specifying which statistics to compute for continuous variables. Multiple values create separate rows for each variable. Options:
Default is |
stats_categorical |
Character string specifying the format for categorical variable summaries:
|
digits |
Integer specifying the number of decimal places for continuous statistics. Default is 1. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
conf_level |
Numeric confidence level for confidence intervals in survival variable summaries (median survival time with CI). Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
p_per_stat |
Logical. If |
na_include |
Logical. If |
na_label |
Character string used to label the missing values row when
|
na_percent |
Logical. Controls how percentages are calculated for
categorical variables when
Only affects categorical variables. Default is |
test |
Logical. If |
test_continuous |
Character string specifying the statistical test for continuous variables:
|
test_categorical |
Character string specifying the statistical test for categorical variables:
|
total |
Logical or character string controlling the total column:
|
total_label |
Character string for the total column header.
Default is |
labels |
Named character vector or list providing custom display
labels for variables. Names should match variable names (or |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
... |
Additional arguments passed to the underlying statistical test
functions (e.g., |
Variable Type Detection:
The function automatically detects variable types and applies appropriate summaries:
Continuous: Numeric variables (integer or double) receive
statistics specified in stats_continuous
Categorical: Character, factor, or logical variables receive frequency counts and percentages
Time-to-Event: Variables specified as
Surv(time, event) display median survival with confidence
intervals (level controlled by conf_level)
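The detection rules above can be illustrated with a minimal base R sketch. The helper classify_variable() and the toy data frame are hypothetical and exist only for this illustration; they are not part of the package, and the "unsupported" fallback is an assumption rather than documented behavior.

```r
# Hypothetical helper: classify one requested variable as the text describes.
classify_variable <- function(data, var) {
  # Survival expressions are passed as strings such as "Surv(time, event)"
  if (grepl("^Surv\\(", var)) {
    return("time-to-event")
  }
  x <- data[[var]]
  if (is.numeric(x)) {
    "continuous"      # integer or double columns
  } else if (is.character(x) || is.factor(x) || is.logical(x)) {
    "categorical"     # character, factor, or logical columns
  } else {
    "unsupported"     # assumption: anything else (e.g. list columns)
  }
}

# Toy data, invented for this example
toy <- data.frame(
  age = c(54, 61, 47),
  sex = factor(c("F", "M", "F")),
  smoker = c(TRUE, FALSE, NA),
  os_months = c(12.1, 30.4, 8.7),
  os_status = c(1L, 0L, 1L)
)

vars <- c("age", "sex", "smoker", "Surv(os_months, os_status)")
types <- vapply(vars, function(v) classify_variable(toy, v), character(1))
print(types)
```

Note that a factor is not numeric in R, so is.numeric() alone is enough to separate continuous columns from factor-coded categories.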
Statistical Testing:
When test = TRUE and by is specified:
Continuous with "auto": Parametric tests (t-test, ANOVA) for mean-based statistics; non-parametric tests (Wilcoxon, Kruskal-Wallis) for median-based statistics
Categorical with "auto": Fisher exact test when any
expected cell frequency < 5; chi-squared test otherwise
Survival: Log-rank test for comparing survival curves
Range statistics: No p-value computed (ranges are descriptive)
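As a rough illustration of the categorical "auto" rule, the sketch below computes expected cell counts under independence and switches to the Fisher exact test when any of them falls below 5. The helper choose_categorical_test() and the simulated vectors are hypothetical; the package's internal implementation may differ.

```r
# Hypothetical helper illustrating the "auto" rule for categorical variables.
choose_categorical_test <- function(x, group) {
  tab <- table(x, group)
  # Expected counts under independence: row total * column total / grand total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < 5)) {
    list(test = "fisher", p_value = fisher.test(tab)$p.value)
  } else {
    # chisq.test() applies a continuity correction to 2x2 tables by default
    list(test = "chisq", p_value = chisq.test(tab)$p.value)
  }
}

# Small sample: the smallest expected count is 9 * 10 / 20 = 4.5
group_small <- rep(c("A", "B"), each = 10)
resp_small <- c(rep("yes", 2), rep("no", 8), rep("yes", 7), rep("no", 3))
small_res <- choose_categorical_test(resp_small, group_small)

# Larger sample: the smallest expected count is 45 * 50 / 100 = 22.5
group_large <- rep(c("A", "B"), each = 50)
resp_large <- c(rep("yes", 20), rep("no", 30), rep("yes", 35), rep("no", 15))
large_res <- choose_categorical_test(resp_large, group_large)

print(small_res$test)
print(large_res$test)
```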
Missing Data Handling:
Missing values are handled differently by variable type:
Continuous: NAs excluded from calculations; optionally
shown as count when na_include = TRUE
Categorical: NAs can be included as a category when
na_include = TRUE. The na_percent parameter controls
whether percentages are calculated with or without NAs in the
denominator
Survival: NAs in time or event excluded from analysis
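The effect of the two denominators can be seen in a small base R sketch. The smoking vector is invented for illustration, and the code mimics the described behavior rather than reproducing the package's internals.

```r
# Invented example data with two missing values
smoking <- c("Never", "Former", "Current", "Never", NA, "Former", "Never", NA)

# Counts with missing values shown as their own category
counts <- table(smoking, useNA = "ifany")

# NAs in the denominator: all rows, including the NA row, sum to 100%
pct_with_na <- 100 * counts / length(smoking)

# NAs excluded from the denominator: the non-missing rows sum to 100%
non_missing <- table(smoking)
pct_without_na <- 100 * non_missing / sum(!is.na(smoking))

print(round(pct_with_na, 1))
print(round(pct_without_na, 1))
```

With the second option, the missing row would typically be reported as a count only, since it is not part of the denominator.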
Formatting Conventions:
All numeric output respects the number_format parameter. Separators
within ranges and confidence intervals adapt automatically to avoid
ambiguity:
Mean ± SD: "45.2 ± 12.3" (US) or
"45,2 ± 12,3" (EU)
Median [IQR]: "38.0 [28.0-52.0]" (US) or
"38,0 [28,0-52,0]" (EU, en-dash separator)
Range: "18.0-75.0" (positive, US),
"-5.0 to 10.0" (when bounds are negative)
Survival: "24.5 (21.2-28.9)" (US) or
"24,5 (21,2-28,9)" (EU)
Counts ≥ 1000: "1,234" (US) or "1.234" (EU)
p-values: "< 0.001" (US) or "< 0,001" (EU)
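Conventions of this kind can be approximated in base R with formatC(). The helpers fmt_num() and fmt_range() below are hypothetical names used only for this sketch, and the rule for switching to " to " is inferred from the examples above, not taken from the package source.

```r
# Hypothetical helper: format a number with explicit separators.
fmt_num <- function(x, digits = 1, big_mark = ",", decimal_mark = ".") {
  formatC(x, format = "f", digits = digits,
          big.mark = big_mark, decimal.mark = decimal_mark)
}

# Hypothetical helper: use " to " when a bound is negative, so the
# separator cannot be confused with a minus sign.
fmt_range <- function(lo, hi, ...) {
  sep <- if (lo < 0 || hi < 0) " to " else "-"
  paste0(fmt_num(lo, ...), sep, fmt_num(hi, ...))
}

us_mean_sd <- paste0(fmt_num(45.2), " \u00b1 ", fmt_num(12.3))
eu_mean_sd <- paste0(
  fmt_num(45.2, big_mark = ".", decimal_mark = ","),
  " \u00b1 ",
  fmt_num(12.3, big_mark = ".", decimal_mark = ",")
)

us_count <- fmt_num(1234, digits = 0)                                      # "1,234"
eu_count <- fmt_num(1234, digits = 0, big_mark = ".", decimal_mark = ",")  # "1.234"

pos_range <- fmt_range(18, 75)   # "18.0-75.0"
neg_range <- fmt_range(-5, 10)   # "-5.0 to 10.0"

print(c(us_mean_sd, eu_mean_sd, us_count, eu_count, pos_range, neg_range))
```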
A data.table with S3 class "desctable" containing formatted
descriptive statistics. The table structure includes:
Variable name or label (from labels)
For continuous variables: statistic type (e.g.,
"Mean ± SD", "Median [IQR]"). For categorical variables:
category level. Empty for variable name rows.
Statistics for the total sample (if
total = TRUE)
Statistics for each group level (when by
is specified). Column names match group levels.
Formatted p-values from statistical tests
(when test = TRUE and by is specified)
The first row always shows sample sizes for each column. All numeric
output (counts, statistics, p-values) respects the
number_format setting for locale-appropriate formatting.
The returned object includes the following attributes accessible via
attr():
raw_data: A data.table containing unformatted numeric values suitable for further statistical analysis or custom formatting. Includes additional columns for standard deviations, quartiles, etc.
by_variable: The grouping variable name used (value of
by)
The variables analyzed (value of
variables)
survtable for detailed survival summary tables,
fit for regression modeling,
table2pdf for PDF export,
table2docx for Word export,
table2html for HTML export
Other descriptive functions:
print.survtable(),
survtable()
# Load example clinical trial data
data(clintrial)
# Example 1: Basic descriptive table without grouping
desctable(clintrial,
variables = c("age", "sex", "bmi"))
# Example 2: Grouped comparison with default tests
desctable(clintrial,
by = "treatment",
variables = c("age", "sex", "race", "bmi"))
# Example 3: Customize continuous statistics
desctable(clintrial,
by = "treatment",
variables = c("age", "bmi", "creatinine"),
stats_continuous = c("median_iqr", "range"))
# Example 4: Change categorical display format
desctable(clintrial,
by = "treatment",
variables = c("sex", "race", "smoking"),
stats_categorical = "n") # Show counts only
# Example 5: Include missing values
desctable(clintrial,
by = "treatment",
variables = c("age", "smoking", "hypertension"),
na_include = TRUE,
na_label = "Missing")
# Example 6: Disable statistical testing
desctable(clintrial,
by = "treatment",
variables = c("age", "sex", "bmi"),
test = FALSE)
# Example 7: Force specific tests
desctable(clintrial,
by = "surgery",
variables = c("age", "sex"),
test_continuous = "t", # t-test instead of auto
test_categorical = "fisher") # Fisher test instead of auto
# Example 8: Adjust decimal places
desctable(clintrial,
by = "treatment",
variables = c("age", "bmi"),
digits = 2, # 2 decimals for continuous
p_digits = 4) # 4 decimals for p-values
# Example 9: Custom variable labels
labels <- c(
age = "Age (years)",
sex = "Sex",
bmi = "Body Mass Index (kg/m\u00b2)",
treatment = "Treatment Arm"
)
desctable(clintrial,
by = "treatment",
variables = c("age", "sex", "bmi"),
labels = labels)
# Example 10: Position total column last
desctable(clintrial,
by = "treatment",
variables = c("age", "sex"),
total = "last")
# Example 11: Exclude total column
desctable(clintrial,
by = "treatment",
variables = c("age", "sex"),
total = FALSE)
# Example 12: Survival analysis
desctable(clintrial,
by = "treatment",
variables = "Surv(os_months, os_status)")
# Example 13: Multiple survival endpoints
desctable(clintrial,
by = "treatment",
variables = c(
"Surv(pfs_months, pfs_status)",
"Surv(os_months, os_status)"
),
labels = c(
"Surv(pfs_months, pfs_status)" = "Progression-Free Survival",
"Surv(os_months, os_status)" = "Overall Survival"
))
# Example 14: Mixed variable types
desctable(clintrial,
by = "treatment",
variables = c(
"age", "sex", "race", # Demographics
"bmi", "creatinine", # Labs
"smoking", "hypertension", # Risk factors
"Surv(os_months, os_status)" # Survival
))
# Example 15: Three or more groups
desctable(clintrial,
by = "stage", # Assuming stage has 3+ levels
variables = c("age", "sex", "bmi"))
# Automatically uses ANOVA/Kruskal-Wallis and chi-squared
# Example 16: Access raw unformatted data
result <- desctable(clintrial,
by = "treatment",
variables = c("age", "bmi"))
raw_data <- attr(result, "raw_data")
print(raw_data)
# Raw data includes unformatted numbers, SDs, quartiles, etc.
# Example 17: Check which grouping variable was used
result <- desctable(clintrial,
by = "treatment",
variables = c("age", "sex"))
attr(result, "by_variable") # "treatment"
# Example 18: NA percentage calculation options
# Include NAs in percentage denominator (all sum to 100%)
desctable(clintrial,
by = "treatment",
variables = "smoking",
na_include = TRUE,
na_percent = TRUE)
# Exclude NAs from denominator (non-missing sum to 100%)
desctable(clintrial,
by = "treatment",
variables = "smoking",
na_include = TRUE,
na_percent = FALSE)
# Example 19: Passing additional test arguments
# Equal variance t-test
desctable(clintrial,
by = "sex",
variables = "age",
test_continuous = "t",
var.equal = TRUE)
# Example 20: European number formatting
desctable(clintrial,
by = "treatment",
variables = c("age", "sex", "bmi"),
number_format = "eu")
# Example 21: Complete Table 1 for publication
table1 <- desctable(
data = clintrial,
by = "treatment",
variables = c(
"age", "sex", "race", "ethnicity", "bmi",
"smoking", "hypertension", "diabetes",
"ecog", "creatinine", "hemoglobin",
"site", "stage", "grade",
"Surv(os_months, os_status)"
),
labels = clintrial_labels,
stats_continuous = c("median_iqr", "range"),
total = TRUE,
na_include = FALSE
)
print(table1)