ternD: Generate descriptive summary table (optionally...


ternD    R Documentation

Generate descriptive summary table (optionally normality-aware)

Description

Creates a descriptive summary table with a single "Total" column format. By default (consider_normality = "ROBUST"), continuous variables are shown as mean +/- SD or median [IQR] based on a four-gate decision (n < 3 fail-safe, skewness, CLT, and Shapiro-Wilk). This can be overridden via consider_normality and force_ordinal.

Usage

ternD(
  data,
  vars = NULL,
  exclude_vars = NULL,
  force_ordinal = NULL,
  output_xlsx = NULL,
  output_docx = NULL,
  consider_normality = "ROBUST",
  print_normality = FALSE,
  round_intg = FALSE,
  smart_rename = TRUE,
  insert_subheads = TRUE,
  factor_order = "mixed",
  methods_doc = TRUE,
  methods_filename = "TernTables_methods.docx",
  category_start = NULL,
  table_font_size = 9,
  manual_italic_indent = NULL,
  manual_underline = NULL,
  table_caption = NULL,
  table_footnote = NULL,
  line_break_header = getOption("TernTables.line_break_header", TRUE)
)

Arguments

data

Tibble containing the variables to summarize.

vars

Character vector of variables to summarize. Defaults to all except exclude_vars.

exclude_vars

Character vector to exclude from the summary.

force_ordinal

Character vector of variables to treat as ordinal (i.e., use median [IQR]) regardless of the consider_normality setting. This parameter takes priority over normality testing when consider_normality = "ROBUST" or TRUE.

output_xlsx

Optional Excel filename to export the table.

output_docx

Optional Word filename to export the table.

consider_normality

Character or logical; controls routing of continuous variables to mean +/- SD vs median [IQR]. "ROBUST" (default) applies a four-gate decision: (1) n < 3 -> non-parametric (conservative fail-safe); (2) absolute skewness > 2 -> non-parametric regardless of n; (3) n >= 30 -> parametric via the Central Limit Theorem; (4) otherwise Shapiro-Wilk p > 0.05 -> parametric. If TRUE, uses Shapiro-Wilk alone (can be over-sensitive at large n). If FALSE, defaults to mean +/- SD for all numeric variables unless specified in force_ordinal.

print_normality

Logical; if TRUE, includes Shapiro-Wilk P values as an additional column in the output. Default is FALSE.

round_intg

Logical; if TRUE, rounds all means, medians, IQRs, and standard deviations to nearest integer (0.5 rounds up). Default is FALSE.

smart_rename

Logical; if TRUE, automatically cleans variable names and subheadings for publication-ready output using built-in rule-based pattern matching for common medical abbreviations and prefixes. Default is TRUE.

insert_subheads

Logical; if TRUE (default), creates a hierarchical structure with a header row and indented sub-category rows for categorical variables with 3 or more levels. Binary variables (Y/N, YES/NO, or numeric 1/0 – which are auto-detected and treated as Y/N) are always displayed as a single row showing the positive/yes count regardless of this setting. Two-level categorical variables whose values are not Y/N, YES/NO, or 1/0 (e.g. Male/Female) use the hierarchical sub-row format, showing both levels as indented rows. If FALSE, all categorical variables use a single-row flat format. Default is TRUE.

factor_order

Character; controls the ordering of factor levels in the output. "mixed" (default) chooses the ordering by variable type: for any factor, factor level order is always respected regardless of the number of levels; for non-factor two-level variables, levels are sorted alphabetically; for non-factor variables with three or more levels, levels are sorted by decreasing frequency. "levels" respects the original factor level ordering for all variables; if the variable is not a factor, it falls back to frequency ordering. "frequency" orders all levels by decreasing frequency (most common first).

methods_doc

Logical; if TRUE (default), generates a methods document describing the statistical presentation used. The document contains boilerplate text for all three table types so the relevant section can be copied directly into a manuscript.

methods_filename

Character; filename for the methods document. Default is "TernTables_methods.docx".

category_start

Named character vector specifying where to insert category headers. Names are the header label text to display; values are the anchor variable – either the original column name (e.g. "Age_Years") or the cleaned display name (e.g. "Age (yr)"). Both forms are accepted. Example: c("Demographics" = "Age_Years", "Clinical Measures" = "bmi"). Default is NULL (no category headers).

table_font_size

Numeric; font size for Word document output tables. Default is 9.

manual_italic_indent

Character vector of display variable names (post-cleaning) that should be formatted as italicized and indented in Word output – matching the appearance of factor sub-category rows. Has no effect on the returned tibble; only applies when output_docx is specified. Default is NULL.

manual_underline

Character vector of display variable names (post-cleaning) that should be formatted as underlined in Word output – matching the appearance of multi-category variable headers. Has no effect on the returned tibble; only applies when output_docx is specified. Default is NULL.

table_caption

Optional character string for a table caption to display above the table in the Word document. Rendered as size 11 Arial bold, single-spaced with a small gap before the table. Default is NULL (no caption). Example: "Table 1. Patient demographics."

table_footnote

Optional character string for a footnote to display below the table in the Word document. Rendered as size 6 Arial italic with a double-bar border above and below. Default is NULL (no footnote).

line_break_header

Logical; if TRUE (default), column headers are wrapped with \n – the first column header includes a category hierarchy label, and the sample size appears on a second line. Set to FALSE to suppress all header line breaks. Can also be set package-wide via options(TernTables.line_break_header = FALSE).
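The default for line_break_header is read with getOption("TernTables.line_break_header", TRUE), so it resolves to TRUE unless the option has been set. The following base-R sketch illustrates that lookup only; it does not call ternD, and the variable names are illustrative.

```r
# Temporarily clear the option, remembering any previous value for restoration.
old <- options(TernTables.line_break_header = NULL)

# With the option unset, the fallback supplied to getOption() is returned.
default_on_unset <- getOption("TernTables.line_break_header", TRUE)

# After a package-wide override, the same lookup returns the stored value.
options(TernTables.line_break_header = FALSE)
default_after_set <- getOption("TernTables.line_break_header", TRUE)

# Restore whatever was there before.
options(old)
```

Passing line_break_header explicitly in a call takes the usual R precedence over the default expression, so the option only matters when the argument is omitted.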

Details

The function always returns a tibble with a single Total (N = n) column format, regardless of the consider_normality setting. The behavior for numeric variables follows this priority:

  1. Variables in force_ordinal: Always use median [IQR]

  2. When consider_normality = "ROBUST": Four-gate decision (n<3 fail-safe, skewness, CLT, Shapiro-Wilk)

  3. When consider_normality = TRUE: Use Shapiro-Wilk test to choose format

  4. When consider_normality = FALSE: Default to mean +/- SD
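The four gates listed for consider_normality = "ROBUST" can be sketched in base R as below. This is an illustration under stated assumptions, not the package's implementation: the helper name is_parametric_sketch is hypothetical, the moment-based skewness formula is an assumption (the documentation does not say which estimator is used), and routing constant vectors to the non-parametric branch is an added guard because shapiro.test() rejects them.

```r
# Hypothetical helper, not part of TernTables: returns TRUE when a numeric
# vector would be routed to mean +/- SD, FALSE when routed to median [IQR].
is_parametric_sketch <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)

  # Gate 1: too few observations, conservative non-parametric fail-safe.
  if (n < 3) return(FALSE)

  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  # Assumption: treat a constant vector as non-parametric.
  if (s == 0) return(FALSE)

  # Gate 2: absolute skewness > 2 is non-parametric regardless of n.
  # Assumption: population (moment-based) skewness.
  skew <- mean((x - m)^3) / s^3
  if (abs(skew) > 2) return(FALSE)

  # Gate 3: n >= 30 is treated as parametric via the Central Limit Theorem.
  if (n >= 30) return(TRUE)

  # Gate 4: otherwise Shapiro-Wilk decides (here 3 <= n <= 29).
  stats::shapiro.test(x)$p.value > 0.05
}

is_parametric_sketch(c(1, 2))             # gate 1
is_parametric_sketch(c(rep(0, 99), 1000)) # gate 2
is_parametric_sketch(1:100)               # gate 3
```

Variables named in force_ordinal would bypass this check entirely, since they are always shown as median [IQR].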

For categorical variables, the function shows frequencies and percentages. When insert_subheads = TRUE, categorical variables with 3 or more levels are displayed with hierarchical formatting (main variable as header, levels as indented sub-rows). Binary variables (Y/N, YES/NO, or numeric 1/0 auto-detected as Y/N) always use a single-row format showing only the positive/yes count, regardless of this setting. Two-level categorical variables whose values are not Y/N, YES/NO, or 1/0 (e.g. Male/Female) also use the hierarchical sub-row format.
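The row-layout rule for categorical variables can be summarized with a small base-R sketch. The helper row_layout_sketch is hypothetical and only mirrors the description above; in particular, case-insensitive matching of Y/N and YES/NO is an assumption, and the actual detection logic in the package may differ.

```r
# Hypothetical helper, not part of TernTables: returns "single_row" or
# "sub_rows" for a categorical vector, following the rules described above.
row_layout_sketch <- function(x, insert_subheads = TRUE) {
  # Assumption: values are compared as upper-cased strings.
  vals <- unique(toupper(as.character(x[!is.na(x)])))

  # Binary variables: Y/N, YES/NO, or numeric 1/0.
  is_binary <- length(vals) <= 2 &&
    (all(vals %in% c("Y", "N")) ||
       all(vals %in% c("YES", "NO")) ||
       all(vals %in% c("1", "0")))

  # Binary variables are always a single row; everything else depends on
  # insert_subheads.
  if (is_binary || !insert_subheads) "single_row" else "sub_rows"
}

row_layout_sketch(c("Y", "N", "Y"))               # binary: single row
row_layout_sketch(c(1, 0, 1))                     # numeric 1/0: single row
row_layout_sketch(c("Male", "Female"))            # other two-level: sub-rows
row_layout_sketch(c("A", "B", "C"), FALSE)        # flat format requested
```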

Value

A tibble with one row per variable (multi-row for factors), containing:

Variable

Variable names with appropriate indentation

Total (N = n)

Summary statistics (mean +/- SD, median [IQR], or n (%) as appropriate)

SW_p

Shapiro-Wilk P values (only if print_normality = TRUE)

Examples

data(tern_colon)

# Basic descriptive summary
ternD(tern_colon, exclude_vars = c("ID"), methods_doc = FALSE)

# With normality-aware formatting and category section headers
ternD(tern_colon, exclude_vars = c("ID"), methods_doc = FALSE,
      category_start = c("Patient Demographics"  = "Age (yr)",
                         "Tumor Characteristics" = "Positive Lymph Nodes (n)"))

# Force specific variables to ordinal (median [IQR]) display
ternD(tern_colon, exclude_vars = c("ID"), methods_doc = FALSE,
      force_ordinal = c("Positive_Lymph_Nodes_n"))

# Export to Word (writes a file to tempdir)

ternD(tern_colon,
      exclude_vars     = c("ID"),
      methods_doc      = FALSE,
      output_docx      = file.path(tempdir(), "descriptive.docx"),
      category_start   = c("Patient Demographics"  = "Age (yr)",
                           "Surgical Findings"     = "Colonic Obstruction",
                           "Tumor Characteristics" = "Positive Lymph Nodes (n)",
                           "Outcomes"              = "Recurrence"))


TernTables documentation built on March 26, 2026, 5:09 p.m.