ternG: Generate grouped summary table with appropriate statistical...

ternG    R Documentation

Generate grouped summary table with appropriate statistical tests

Description

Creates a grouped summary table with optional statistical testing for group comparisons. Supports numeric and categorical variables; numeric variables can be treated as ordinal via force_ordinal. Includes options to calculate P values and odds ratios. For descriptive (ungrouped) tables, use ternD.

Usage

ternG(
  data,
  vars = NULL,
  exclude_vars = NULL,
  group_var,
  force_ordinal = NULL,
  group_order = NULL,
  output_xlsx = NULL,
  output_docx = NULL,
  OR_col = FALSE,
  OR_method = "dynamic",
  consider_normality = "ROBUST",
  print_normality = FALSE,
  show_test = FALSE,
  p_digits = 3,
  round_intg = FALSE,
  smart_rename = TRUE,
  insert_subheads = TRUE,
  factor_order = "mixed",
  table_font_size = 9,
  methods_doc = TRUE,
  methods_filename = "TernTables_methods.docx",
  category_start = NULL,
  manual_italic_indent = NULL,
  manual_underline = NULL,
  indent_info_column = FALSE,
  show_total = TRUE,
  table_caption = NULL,
  table_footnote = NULL,
  line_break_header = getOption("TernTables.line_break_header", TRUE),
  post_hoc = FALSE
)

Arguments

data

Tibble containing all variables.

vars

Character vector of variables to summarize. Defaults to all except group_var and exclude_vars.

exclude_vars

Character vector of variable(s) to exclude. group_var is automatically excluded.

group_var

Character; name of the grouping variable (a factor or character column with >=2 levels).

force_ordinal

Character vector of variables to treat as ordinal (i.e., use medians/IQR and nonparametric tests).

group_order

Optional character vector to specify a custom group level order.

output_xlsx

Optional filename to export the table as an Excel file.

output_docx

Optional filename to export the table as a Word document.

OR_col

Logical; if TRUE, adds odds ratios with 95% CI for binary categorical variables (Y/N, YES/NO, or numeric 0/1) and two-level categorical variables (e.g. Male/Female). For two-level categoricals displayed with sub-rows, the reference level (factor level 1, or alphabetical first for non-factors) shows "1.00 (ref.)"; the non-reference level shows the computed OR with 95% CI. Variables with three or more levels show "-". Only valid when group_var has exactly 2 levels; an error is raised for 3+ group comparisons. Default is FALSE.

OR_method

Character; controls how odds ratios are calculated when OR_col = TRUE. If "dynamic" (default), uses Fisher's exact method when any expected cell count is < 5 (Cochran criterion), otherwise uses the Wald method. If "wald", forces the Wald method regardless of expected cell counts.
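
The "dynamic" rule can be sketched in base R. This is a minimal illustration of the decision as described above, not the package's internal code: the function name odds_ratio_sketch is hypothetical, and it is an assumption that the Fisher path reports the conditional maximum-likelihood estimate from fisher.test while the Wald path uses the cross-product ratio with a log-scale standard error.

```r
# Hypothetical helper; `tab` is a 2x2 table of counts (variable by group).
odds_ratio_sketch <- function(tab, method = c("dynamic", "wald")) {
  method <- match.arg(method)
  stopifnot(all(dim(tab) == c(2, 2)))

  # Expected counts under independence: row total * column total / grand total.
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

  # Cochran criterion: any expected count below 5 switches to the exact method.
  if (method == "dynamic" && any(expected < 5)) {
    ft <- fisher.test(tab)
    return(list(method = "fisher",
                or = unname(ft$estimate),
                ci = as.numeric(ft$conf.int)))
  }

  # Wald: cross-product odds ratio with a normal-approximation CI on the log scale.
  # (A zero cell makes this estimate 0 or Inf; no continuity correction is applied.)
  or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
  se <- sqrt(sum(1 / tab))
  list(method = "wald",
       or = or,
       ci = exp(log(or) + c(-1, 1) * qnorm(0.975) * se))
}

# All expected counts are >= 5 here, so the Wald path is used (OR = 4).
odds_ratio_sketch(matrix(c(20, 10, 15, 30), nrow = 2))

# All expected counts are 2 here, so "dynamic" falls back to Fisher's exact method.
odds_ratio_sketch(matrix(c(3, 1, 1, 3), nrow = 2))
```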

consider_normality

Character or logical; controls how continuous variables are routed to parametric vs. non-parametric tests. "ROBUST" (default) applies a four-gate decision consistent with standard biostatistical practice: (1) any group with n < 3 routes to non-parametric as a conservative fail-safe; (2) absolute skewness > 2 in any group routes to non-parametric regardless of sample size (catches length of stay (LOS), counts, etc.); (3) all groups with n >= 30 route to parametric via the Central Limit Theorem; (4) otherwise, Shapiro-Wilk p > 0.05 in all groups routes to parametric. Normal variables use mean ± SD and Welch t-test (2 groups) or Welch ANOVA (3+ groups); non-normal variables use median [IQR] and Wilcoxon rank-sum (2 groups) or Kruskal-Wallis (3+ groups). If TRUE, uses Shapiro-Wilk alone (p > 0.05 in all groups = normal); this is conservative at large n. If FALSE, all numeric variables are treated as normally distributed regardless of distribution. If "FORCE", all numeric variables are treated as non-normal (median [IQR], nonparametric tests).
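
The four gates of "ROBUST", and the tests each route leads to, can be sketched in base R. This is an illustration of the rules as stated above, not the package's implementation: route_robust, compare_groups, and skew_sketch are hypothetical names, and the moment-based skewness estimator is an assumption (the package may use a different one). Groups with constant values are not handled here, since shapiro.test rejects them.

```r
# Moment-based sample skewness (assumed estimator).
skew_sketch <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Returns "parametric" or "nonparametric" for numeric x split by grouping vector g.
route_robust <- function(x, g) {
  keep <- !is.na(x) & !is.na(g)
  groups <- split(x[keep], g[keep], drop = TRUE)
  n <- lengths(groups)

  # Gate 1: any very small group is a fail-safe to non-parametric.
  if (any(n < 3)) return("nonparametric")

  # Gate 2: strong skew in any group, regardless of sample size.
  sk <- vapply(groups, skew_sketch, numeric(1))
  if (any(abs(sk) > 2, na.rm = TRUE)) return("nonparametric")

  # Gate 3: all groups large enough to lean on the Central Limit Theorem.
  if (all(n >= 30)) return("parametric")

  # Gate 4: otherwise require Shapiro-Wilk p > 0.05 in every group.
  sw_p <- vapply(groups, function(v) shapiro.test(v)$p.value, numeric(1))
  if (all(sw_p > 0.05)) "parametric" else "nonparametric"
}

# Picks the test named in the description for the chosen route.
compare_groups <- function(x, g) {
  g <- droplevels(factor(g))
  two_groups <- nlevels(g) == 2
  if (route_robust(x, g) == "parametric") {
    # t.test and oneway.test both default to var.equal = FALSE (Welch).
    if (two_groups) t.test(x ~ g) else oneway.test(x ~ g, var.equal = FALSE)
  } else {
    if (two_groups) wilcox.test(x ~ g) else kruskal.test(x ~ g)
  }
}

g2 <- rep(c("a", "b"), each = 40)
route_robust(c(1:40, 11:50), g2)              # symmetric, n >= 30 in both groups
route_robust(c(rep(0, 39), 1000, 1:40), g2)   # one heavily skewed group
compare_groups(c(1:40, 11:50), g2)$method
```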

print_normality

Logical; if TRUE, includes Shapiro-Wilk P values in the output. Default is FALSE.

show_test

Logical; if TRUE, includes the statistical test name as a column in the output. Default is FALSE.

p_digits

Integer; number of decimal places for P values (default 3).

round_intg

Logical; if TRUE, rounds all means, medians, IQRs, and standard deviations to nearest integer (0.5 rounds up). Default is FALSE.
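
The "0.5 rounds up" rule differs from base R's round(), which rounds halves to the even neighbour. A minimal sketch of half-up rounding follows; round_half_up is a hypothetical helper, and how the package implements the rule (or treats negative values) is not stated here.

```r
# Hypothetical helper: shift by 0.5, then take the floor.
round_half_up <- function(x) floor(x + 0.5)

round(c(0.5, 1.5, 2.5))          # 0 2 2  (base R: round half to even)
round_half_up(c(0.5, 1.5, 2.5))  # 1 2 3
```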

smart_rename

Logical; if TRUE, automatically cleans variable names and subheadings for publication-ready output using built-in rule-based pattern matching for common medical abbreviations and prefixes. Default is TRUE.

insert_subheads

Logical; if TRUE (default), creates a hierarchical structure with a header row and indented sub-category rows for categorical variables with 3 or more levels. Binary variables (Y/N, YES/NO, or numeric 1/0, which are auto-detected and treated as Y/N) are always displayed as a single row showing the positive/yes count regardless of this setting. Two-level categorical variables whose values are not Y/N, YES/NO, or 1/0 (e.g. Male/Female) use the hierarchical sub-row format, showing both levels as indented rows. If FALSE, all categorical variables use a single-row flat format.

factor_order

Character; controls the ordering of factor levels in the output. "mixed" (default) always respects factor level order for factors, regardless of the number of levels; for non-factor variables, it sorts two-level variables (e.g. Male/Female) alphabetically and variables with three or more levels by decreasing frequency. "levels" respects the original factor level ordering for all variables; if the variable is not a factor, it falls back to frequency ordering. "frequency" orders all levels by decreasing frequency (most common first).
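
The three ordering modes can be sketched as a small base R function. order_levels is a hypothetical helper that mirrors the rules as described, not the package's own code; tie-breaking among equally frequent levels is an assumption left to sort().

```r
# Hypothetical helper: returns the display order of the levels of x.
order_levels <- function(x, factor_order = c("mixed", "levels", "frequency")) {
  factor_order <- match.arg(factor_order)
  by_freq <- names(sort(table(x), decreasing = TRUE))

  # "frequency": most common level first, for every variable.
  if (factor_order == "frequency") return(by_freq)

  # "mixed" and "levels": factors keep their declared level order.
  if (is.factor(x)) return(levels(x))

  # "levels" on a non-factor falls back to frequency ordering.
  if (factor_order == "levels") return(by_freq)

  # "mixed" on a non-factor: alphabetical for two levels, frequency otherwise.
  vals <- sort(unique(as.character(x[!is.na(x)])))
  if (length(vals) == 2) vals else by_freq
}

order_levels(c("Male", "Female", "Male"))                       # "Female" "Male"
order_levels(c("Male", "Female", "Male"), "frequency")          # "Male" "Female"
order_levels(c("b", "a", "c", "c", "c", "b"))                   # "c" "b" "a"
order_levels(factor(c("lo", "hi", "hi"), levels = c("lo", "hi")))  # "lo" "hi"
```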

table_font_size

Numeric; font size for Word document output tables. Default is 9.

methods_doc

Logical; if TRUE (default), generates a methods document describing the statistical tests used.

methods_filename

Character; filename for the methods document. Default is "TernTables_methods.docx".

category_start

Named character vector specifying where to insert category headers. Names are the header label text to display; values are the anchor variable – either the original column name (e.g. "Age_Years") or the cleaned display name (e.g. "Age (yr)"). Both forms are accepted. Example: c("Demographics" = "Age_Years", "Clinical" = "bmi"). Default is NULL (no category headers).

manual_italic_indent

Character vector of display variable names (post-cleaning) that should be formatted as italicized and indented in Word output – matching the appearance of factor sub-category rows. Has no effect on the returned tibble; only applies when output_docx is specified or when the tibble is passed to word_export.

manual_underline

Character vector of display variable names (post-cleaning) that should be formatted as underlined in Word output – matching the appearance of multi-category variable headers. Has no effect on the returned tibble; only applies when output_docx is specified or when the tibble is passed to word_export.

indent_info_column

Logical; if FALSE (default), the internal .indent helper column is dropped from the returned tibble. Set to TRUE to retain it – this is necessary when you intend to post-process the tibble and later pass it to word_export directly, as word_export uses the .indent column to apply correct indentation and italic formatting in the Word table.

show_total

Logical; if TRUE, adds a "Total" column showing the aggregate summary statistic across all groups (e.g., for a publication Table 1 that includes both per-group and overall columns). Default is TRUE.

table_caption

Optional character string for a table caption to display above the table in the Word document. Rendered as size 11 Arial bold, single-spaced with a small gap before the table. Default is NULL (no caption). Example: "Table 2. Comparison of recurrence vs. no recurrence."

table_footnote

Optional character string for a footnote to display below the table in the Word document. Rendered as size 6 Arial italic with a double-bar border above and below. Default is NULL (no footnote).

line_break_header

Logical; if TRUE (default), column headers are wrapped with \n – group names break on spaces, sample size counts move to a second line, and the first column header reads "Category / Variable". Set to FALSE to suppress all header line breaks. Can also be set package-wide via options(TernTables.line_break_header = FALSE).
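
The package-wide default resolves through getOption(), exactly as written in the Usage signature. The short base R sketch below shows how setting the option changes what that default expression evaluates to; the variable names are illustrative only.

```r
# Set the option for the session, keeping the previous value for restoration.
old <- options(TernTables.line_break_header = FALSE)

# This is the expression ternG uses as its default for line_break_header.
during <- getOption("TernTables.line_break_header", TRUE)
during   # FALSE while the option is set

# Restore the previous state; the fallback (TRUE) applies again if it was unset.
options(old)
after <- getOption("TernTables.line_break_header", TRUE)
```

An explicit line_break_header argument in a call still takes precedence over the option, since the option only supplies the default.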

post_hoc

Logical; if TRUE, runs pairwise post-hoc tests for continuous and ordinal variables in three or more group comparisons and annotates each group column value with a compact letter display (CLD) superscript. Groups sharing a letter are not significantly different at alpha = 0.05. For normally distributed variables (Welch ANOVA path), Games-Howell pairwise tests are used. For non-normal and ordinal variables (Kruskal-Wallis path), Dunn's test with Holm correction is used. Post-hoc testing is never applied to categorical variables. Only valid when group_var has three or more levels; silently ignored for two-group comparisons. Requires the rstatix package. Default is FALSE.

Value

A tibble with one row per variable (multiple rows for multi-level factors), showing summary statistics by group and P values and, optionally, the test name (show_test), odds ratios (OR_col), and a Total summary column (show_total).

Examples

data(tern_colon)

# 2-group comparison
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Recurrence",
      methods_doc = FALSE)

# 2-group comparison with odds ratios
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Recurrence",
      OR_col = TRUE, methods_doc = FALSE)

# 3-group comparison
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Treatment_Arm",
      group_order = c("Observation", "Levamisole", "Levamisole + 5FU"),
      methods_doc = FALSE)

# Export to Word (writes a file to tempdir)
ternG(tern_colon,
      exclude_vars   = c("ID"),
      group_var      = "Recurrence",
      OR_col         = TRUE,
      methods_doc    = FALSE,
      output_docx    = file.path(tempdir(), "comparison.docx"),
      category_start = c("Patient Demographics"  = "Age (yr)",
                         "Tumor Characteristics" = "Positive Lymph Nodes (n)"))



TernTables documentation built on March 26, 2026, 5:09 p.m.