```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
library(TernTables)
options(tibble.width = Inf)  # show all columns in printed tibbles

# Output directory for exported .docx files.
# Override by setting options(TernTables.vignette_outdir = "/your/path") before rendering.
out_dir <- getOption("TernTables.vignette_outdir", default = tempdir())
```
## Overview

**TernTables** is built for clinical researchers who need to go from raw data to a manuscript-ready Word table, with variable detection, statistical test selection, and formatting all handled automatically. Given a data frame and an optional grouping variable, it automatically:

- Detects each variable's type (continuous, binary, categorical)
- Selects the appropriate statistical test
- Formats *P* values and summary statistics for publication-ready tables
- Exports directly to a styled `.docx` Word file and generates a boilerplate statistical methods paragraph
- Returns a tibble for inspection, Excel export, or further analysis in R

Three table types are supported: **descriptive summaries** (single cohort, no comparisons), **two-group comparisons** (with optional odds ratios), and **comparisons across three or more groups**.

The convenience is in the automation, not in any compromise to statistical rigor. Test selection follows established published criteria throughout: normality by Shapiro-Wilk per group, Fisher's exact triggered by the Cochran (1954) expected-cell criterion, and odds ratios reported as unadjusted with the first factor level of the grouping variable as the reference. The auto-generated methods paragraph covers the statistical approach used and is suitable as a starting draft for a manuscript methods section.

> **No R required?** TernTables is available as a free point-and-click web
> application at [tern-tables.com](https://tern-tables.com/). Upload a CSV
> or XLSX, configure your table, and download a formatted Word document,
> all without writing a line of code. The web app is powered by this package,
> so the statistical methods, normality routing, and Word output are identical.
> A built-in side panel shows the R commands running in the background and
> the full script can be downloaded at the end of your session, making every
> analysis fully transparent and reproducible. For scripted or reproducible
> workflows, the R package (this vignette) remains the canonical reference.

## Example Dataset

```r
data(tern_colon)
```
`tern_colon` is bundled with TernTables. It is derived from `survival::colon`
and contains 929 patients from a landmark colon cancer adjuvant chemotherapy
trial (Moertel et al., 1990), filtered to the recurrence endpoint, with one row
per patient. See `?tern_colon` for full details.
Key variables used in these examples:
| Column | Description |
|---|---|
| Age_Years | Age at registration (years) |
| Sex | Female / Male |
| Colonic_Obstruction | Colonic obstruction present — n (%) |
| Bowel_Perforation | Bowel perforation present — n (%) |
| Positive_Lymph_Nodes_n | Number of positive lymph nodes |
| Over_4_Positive_Nodes | More than 4 positive lymph nodes — n (%) |
| Tumor_Adherence | Tumour adherence to nearby organs — n (%) |
| Tumor_Differentiation | Well / Moderate / Poor |
| Extent_of_Local_Spread | Depth of tumour penetration (4 levels) |
| Recurrence | No Recurrence / Recurrence — 2-group |
| Treatment_Arm | Levamisole + 5FU / Levamisole / Observation — 3-group |
## Preprocessing Raw Data (`ternP`)

If your source is a raw CSV or XLSX file, rather than an already-clean R
object, use `ternP()` to standardize it before passing it to `ternG()` or
`ternD()`. It handles the messiness most commonly introduced by manual data
entry or spreadsheet workflows:
| Transformation | What it fixes |
|---|---|
| String NA conversion | "NA", "na", "Na", "unk" → NA |
| Whitespace trimming | Leading/trailing spaces in character columns |
| Empty column removal | 100% NA columns silently dropped |
| Blank row removal | Rows where every cell is NA |
| Case normalization | "fEMALE" / "Female" unified to title case |
ternP() also applies two hard stops before any cleaning takes place:
it errors immediately if any column name matches a protected health information
(PHI) pattern (e.g. MRN, DOB, FirstName), or if any unnamed column
contains data.
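To make the transformations in the table concrete, here is a minimal base-R sketch of four of them (string `NA` conversion, whitespace trimming, empty column removal, blank row removal). The helper `clean_sketch()` is hypothetical and is not part of TernTables; it only illustrates the idea and omits case normalization, the PHI check, and the feedback that `ternP()` reports.

```r
# Hypothetical helper for illustration only; ternP() is the real implementation.
clean_sketch <- function(df, na_strings = c("NA", "na", "Na", "unk")) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) {
      col <- trimws(col)                 # trim leading/trailing whitespace
      col[col %in% na_strings] <- NA     # convert string placeholders to NA
    }
    col
  })
  keep_cols <- colSums(!is.na(df)) > 0   # drop columns that are 100% NA
  keep_rows <- rowSums(!is.na(df)) > 0   # drop rows where every cell is NA
  df[keep_rows, keep_cols, drop = FALSE]
}

messy <- data.frame(
  a = c(" x ", "unk", NA),
  b = c(NA, NA, NA),
  c = c(1, 2, NA),
  stringsAsFactors = FALSE
)
cleaned <- clean_sketch(messy)
cleaned
```

In this toy input, column `b` is entirely missing and the third row is blank, so both are dropped; the `"unk"` entry becomes a true `NA` while its row is kept because `c` still holds a value.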
```r
# Load a messy CSV shipped with the package
path <- system.file("extdata/csv", "tern_colon_messy.csv",
                    package = "TernTables")
raw <- readr::read_csv(path, show_col_types = FALSE)

result <- ternP(raw)
# The print method fires automatically, summarising every transformation applied.
```
The printed summary identifies each transformation and shows the final dimensions of the cleaned data. If the data was already clean, a single "No transformations required" line appears.
Three items are returned in the result object:
```r
result$clean_data   # Cleaned, analysis-ready tibble
result$sparse_rows  # Rows with >50% NA (retained, not removed; review these)
result$feedback     # Named list; NULL elements mean no action was taken
```
To write a Word document recording the cleaning steps, call
write_cleaning_doc(). It is fully dynamic — only paragraphs for triggered
transformations are written, so the document is concise for already-clean data.
```r
write_cleaning_doc(result, filename = file.path(out_dir, "cleaning_summary.docx"))
```
Once preprocessing is complete, pass result$clean_data directly to ternD()
or ternG():
```r
tbl <- ternG(result$clean_data, exclude_vars = c("ID"), group_var = "Recurrence")
```
## Descriptive Table (`ternD`)

Use `ternD()` for a single cohort with no group comparisons: the standard
"Table 1" in a cohort description. Pass `output_docx` to write a
publication-ready Word file in the same call; pass `output_xlsx` to also save
the tibble as an Excel file. Use `category_start` to insert bold section headers
grouping related variables; anchors can be either the raw column name or the
cleaned display label.
```r
tbl_descriptive <- ternD(
  data             = tern_colon,
  exclude_vars     = c("ID"),
  output_docx      = file.path(out_dir, "Tern_descriptive.docx"),
  methods_filename = file.path(out_dir, "TernTables_methods.docx"),
  category_start   = c(
    "Patient Demographics"  = "Age (yr)",
    "Surgical Findings"     = "Colonic Obstruction",
    "Tumor Characteristics" = "Positive Lymph Nodes (n)",
    "Outcomes"              = "Recurrence"
  )
)

tbl_descriptive
```
Continuous variables show mean ± SD or median [IQR] based on the four-gate ROBUST normality algorithm (n < 3 fail-safe, skewness check, CLT at n ≥ 30, Shapiro-Wilk for small samples). Columns whose values are exactly Y/N, YES/NO, or numeric 0/1 are detected as binary and shown as a single n (%) row (the positive/yes count). All other categorical variables — including two-level variables like Male/Female — are shown with each level as an indented sub-row.
Variable names are automatically cleaned for display (smart_rename = TRUE by
default) — underscores replaced with spaces, capitalisation normalised, and
common medical abbreviations formatted (e.g. Age_Years → Age (yr),
Positive_Lymph_Nodes_n → Positive Lymph Nodes (n)). Pass
smart_rename = FALSE to use column names exactly as they appear in the data.
Descriptive summary table exported to Word:
## Two-Group Comparison (`ternG`, 2 levels)

Use `ternG()` to compare variables between two groups. Set `OR_col = TRUE` to
add odds ratios with 95% CI for binary variables (Y/N, YES/NO, 0/1) and
two-level categorical variables such as Male/Female. For two-level categoricals
displayed with sub-rows, the reference level (factor level 1 or alphabetical
first) shows 1.00 (ref.); the non-reference level shows the computed OR with
95% CI. Fisher's exact or Wald is chosen automatically based on expected cell
counts. Pass `output_docx` to write the Word table directly; `output_xlsx`
exports the tibble to Excel.
```r
tbl_2group <- ternG(
  data             = tern_colon,
  exclude_vars     = c("ID"),
  group_var        = "Recurrence",
  output_docx      = file.path(out_dir, "Tern_2_group.docx"),
  methods_filename = file.path(out_dir, "TernTables_methods.docx"),
  OR_col           = TRUE,
  insert_subheads  = TRUE,
  category_start   = c(
    "Patient Demographics"  = "Age (yr)",
    "Surgical Findings"     = "Colonic Obstruction",
    "Tumor Characteristics" = "Positive Lymph Nodes (n)",
    "Treatment Details"     = "Treatment Arm"
  )
)

tbl_2group
```
The Word table includes an OR column (odds ratio with 95% CI for binary variables) and a P value column (test P value for each variable).
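As a rough illustration of what an unadjusted odds ratio with a Wald 95% CI looks like in base R, the sketch below fits a one-predictor logistic regression. The helper `wald_or()` is hypothetical, the built-in `mtcars` data stands in for a clinical dataset, and treating the first argument as the outcome is an assumption; how `ternG()` computes the OR column internally may differ.

```r
# Hypothetical helper for illustration; not the TernTables implementation.
wald_or <- function(outcome, exposure, level = 0.95) {
  fit <- glm(outcome ~ exposure, family = binomial())
  est <- exp(coef(fit))[2]                               # odds ratio for the exposure
  ci  <- exp(confint.default(fit, level = level))[2, ]   # Wald interval on the OR scale
  c(OR = unname(est), lower = unname(ci[1]), upper = unname(ci[2]))
}

# Stand-in binary variables: am (outcome) and vs (exposure), both coded 0/1.
or_res <- wald_or(mtcars$am, mtcars$vs)
round(or_res, 2)
```

`confint.default()` gives the Wald interval (estimate plus or minus a normal quantile times the standard error on the log-odds scale), which is the simplest of the interval types available for a fitted `glm`.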
Two-group comparison table exported to Word, with odds ratios and category section headers:
## Three or More Groups (`ternG`, 3+ levels)

The same `ternG()` function handles three or more groups automatically,
switching from t-test/Wilcoxon to Welch ANOVA/Kruskal-Wallis as appropriate.
Odds ratios are not available for 3+ group comparisons. `consider_normality`
controls normality routing; the default (`"ROBUST"`) applies the four-gate
algorithm (n < 3 fail-safe → skewness → CLT → Shapiro-Wilk). `FALSE` forces parametric tests
throughout; `"FORCE"` forces nonparametric throughout.
Set post_hoc = TRUE to run pairwise post-hoc tests automatically when the
omnibus P < 0.05. The test is matched to the omnibus test used: Games-Howell
follows Welch ANOVA (parametric path); Dunn’s test with Holm correction
follows Kruskal-Wallis (non-parametric and ordinal path). Results are appended
to each cell as compact letter display (CLD) superscripts — groups sharing a
letter are not significantly different after correction. Categorical variables
never receive post-hoc testing. When post_hoc = TRUE and at least one test
fires, an explanatory footnote is added automatically to the Word output.
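The Holm correction mentioned above can be illustrated with base R's `p.adjust()`. The raw pairwise P values below are made up for illustration and do not come from `tern_colon`; the sketch shows only the adjustment step, not Dunn's test itself or the CLD lettering.

```r
# Hypothetical raw pairwise P values (illustration only, not real results).
raw_p <- c(
  "Observation vs Levamisole"       = 0.010,
  "Observation vs Levamisole + 5FU" = 0.040,
  "Levamisole vs Levamisole + 5FU"  = 0.030
)

# Holm: multiply the smallest P by 3, the next by 2, the largest by 1,
# then enforce monotonicity so adjusted values never decrease.
holm_p <- p.adjust(raw_p, method = "holm")
holm_p

# Pairs still significant at 0.05 after correction
significant <- names(raw_p)[holm_p < 0.05]
significant
```

With these inputs only the first pair stays below 0.05 after adjustment, so in a CLD the other two pairs would each share a letter.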
```r
tbl_3group <- ternG(
  data               = tern_colon,
  exclude_vars       = c("ID"),
  group_var          = "Treatment_Arm",
  group_order        = c("Observation", "Levamisole", "Levamisole + 5FU"),
  output_docx        = file.path(out_dir, "Tern_3_group.docx"),
  methods_filename   = file.path(out_dir, "TernTables_methods.docx"),
  consider_normality = "ROBUST",
  post_hoc           = TRUE,
  category_start     = c(
    "Patient Demographics"  = "Age (yr)",
    "Surgical Findings"     = "Colonic Obstruction",
    "Tumor Characteristics" = "Positive Lymph Nodes (n)",
    "Outcomes"              = "Recurrence"
  )
)

tbl_3group
```
Three-group comparison table exported to Word with category section headers:
Two optional parameters control text that appears outside the table body in the exported Word document.
table_caption places a bold size-11 Arial caption above the table,
single-spaced with a small gap between the caption and the table:
```r
tbl_descriptive <- ternD(
  data          = tern_colon,
  exclude_vars  = c("ID"),
  output_docx   = file.path(out_dir, "Tern_descriptive.docx"),
  table_caption = "Table 1. Baseline patient characteristics."
)
```
table_footnote adds a merged footer row below the table in size-6 Arial
italic, bordered above and below by a double rule. Pass a single string or a
character vector for multiple lines (lines are joined with a line break inside
the same cell — no extra row spacing):
```r
tbl_2group <- ternG(
  data           = tern_colon,
  exclude_vars   = c("ID"),
  group_var      = "Recurrence",
  OR_col         = TRUE,
  output_docx    = file.path(out_dir, "Tern_2_group.docx"),
  table_caption  = "Table 2. Characteristics by recurrence status.",
  table_footnote = c(
    "Abbreviations: OR, odds ratio; CI, confidence interval.",
    "\u2020 P values from chi-square or Wilcoxon rank-sum test.",
    "\u2021 ORs from unadjusted logistic regression."
  )
)
```
Both parameters are also stored in the table's metadata and reproduced
automatically when combining tables with ternB().
TernTables selects tests automatically based on variable type and normality:
| Variable type | Test (2 groups) | Test (3+ groups) | Post-hoc (3+ groups, post_hoc = TRUE, omnibus p < 0.05) |
|---|---|---|---|
| Continuous, normal | Welch's t-test | Welch ANOVA | Games-Howell |
| Continuous, non-normal | Wilcoxon rank-sum | Kruskal-Wallis | Dunn's + Holm |
| Binary / Categorical | Fisher's exact or Chi-squared* | Fisher's exact or Chi-squared* | — |
| Ordinal (force_ordinal) | Wilcoxon rank-sum | Kruskal-Wallis | Dunn's + Holm |
*Fisher's exact is used when any expected cell count is < 5 (Cochran criterion). If the exact algorithm cannot complete (workspace limit exceeded for large tables), Fisher's exact with Monte Carlo simulation (B = 10,000; seed fixed via getOption("TernTables.seed"), default 42) is used automatically.
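The categorical routing in the footnote can be sketched in base R as follows. The helper `choose_categorical_test()` is hypothetical; the seed of 42 and B = 10,000 mirror the values stated above, while the use of `correct = FALSE` for the chi-squared test is an assumption, since the source does not say whether a continuity correction is applied.

```r
# Hypothetical helper for illustration; not the TernTables implementation.
choose_categorical_test <- function(x, g, seed = 42) {
  tab <- table(x, g)
  # Expected counts under independence (warnings about small cells are expected here)
  expected <- suppressWarnings(chisq.test(tab, correct = FALSE)$expected)
  if (any(expected < 5)) {
    res <- tryCatch(
      fisher.test(tab),
      error = function(e) {
        # Fall back to Monte Carlo simulation if the exact algorithm cannot complete
        set.seed(seed)
        fisher.test(tab, simulate.p.value = TRUE, B = 10000)
      }
    )
    list(test = "Fisher's exact", p_value = res$p.value)
  } else {
    res <- chisq.test(tab, correct = FALSE)
    list(test = "Chi-squared", p_value = res$p.value)
  }
}

# Sparse table: some expected counts fall below 5
sparse <- choose_categorical_test(mtcars$am, mtcars$cyl)
sparse$test

# Balanced table: every expected count is 25
dense <- choose_categorical_test(rep(c("a", "b"), each = 50), rep(c("u", "v"), 50))
dense$test
```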
Normality routing uses consider_normality = "ROBUST" (the default) — a
four-gate decision applied per group: (1) any group n < 3 → non-parametric
(conservative fail-safe); (2) absolute skewness > 2 in any group → non-parametric
regardless of sample size; (3) all groups n ≥ 30 → parametric via the Central
Limit Theorem; (4) otherwise Shapiro-Wilk p > 0.05 in all groups → parametric. For 3+ group comparisons,
omnibus P values are reported. When post_hoc = TRUE, pairwise comparisons
are performed automatically for continuous and ordinal variables when omnibus
P < 0.05, using the test paired to the omnibus (Games-Howell or Dunn's +
Holm). CLD superscript letters are appended to cell values; groups sharing a
letter are not significantly different. Categorical variables never receive
post-hoc testing. post_hoc defaults to FALSE.
Set consider_normality = TRUE to use Shapiro-Wilk alone (original behaviour).
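The four gates can be read as a short decision function. The sketch below is a minimal base-R rendering under stated assumptions: `robust_route()` and `skew()` are hypothetical names, the moment-based skewness estimator is an assumption (the source does not say which estimator is used), and a group on which Shapiro-Wilk cannot run (for example, constant values) is treated as non-normal.

```r
# Moment-based sample skewness; the estimator used by TernTables is an assumption here.
skew <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical helper: returns "parametric" or "nonparametric" for one variable.
robust_route <- function(x, g) {
  ok <- !is.na(x) & !is.na(g)
  groups <- split(x[ok], g[ok], drop = TRUE)
  n <- lengths(groups)

  # Gate 1: any group with n < 3 -> non-parametric (fail-safe)
  if (any(n < 3)) return("nonparametric")

  # Gate 2: absolute skewness > 2 in any group -> non-parametric
  sk <- vapply(groups, skew, numeric(1))
  if (any(abs(sk) > 2, na.rm = TRUE)) return("nonparametric")

  # Gate 3: all groups n >= 30 -> parametric via the Central Limit Theorem
  if (all(n >= 30)) return("parametric")

  # Gate 4: Shapiro-Wilk p > 0.05 in every group -> parametric
  p <- vapply(groups, function(v) {
    tryCatch(shapiro.test(v)$p.value, error = function(e) NA_real_)
  }, numeric(1))
  if (isTRUE(all(p > 0.05))) "parametric" else "nonparametric"
}

# Gate 1: one group has only two observations
route_small <- robust_route(c(1, 2, 3, 4, 5, 6), c("a", "a", "a", "a", "b", "b"))

# Gate 2: group "a" has one extreme outlier, giving skewness well above 2
route_skewed <- robust_route(c(rep(0, 29), 100, 1:30), rep(c("a", "b"), each = 30))

# Gate 3: two symmetric groups of 30
route_large <- robust_route(c(1:30, 1:30), rep(c("a", "b"), each = 30))

c(small = route_small, skewed = route_skewed, large = route_large)
```

Ordering matters: the skewness gate sits ahead of the CLT gate, so a heavily skewed variable is routed to non-parametric tests even when every group has 30 or more observations.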
A methods paragraph is written automatically with every ternD() and ternG()
call (methods_doc = TRUE by default), saved to "TernTables_methods.docx" in
the working directory unless overridden via methods_filename. Set
methods_doc = FALSE to suppress it.
write_methods_doc() can also be called directly on any saved tibble. Pass
show_test = TRUE to ternG() to populate the test column; when present,
the paragraph is tailored to only the test types that actually appeared (e.g.
omits the t-test sentence if all continuous variables were nonparametric).
Without it, standard boilerplate is used.
```r
write_methods_doc(
  tbl      = tbl_2group,
  filename = file.path(out_dir, "Tern_methods.docx")
)
```
The full TernTables workflow — preprocessing, descriptive tables, two-group and three-group comparisons, Word export, and methods paragraphs — is available as a free, no-code web application at tern-tables.com. No R or package installation is required. The web app is powered by the same TernTables R package described in this vignette; all statistical methods and outputs are identical.
The web app is transparent by design. A built-in side panel displays the exact R commands being executed in the background as you work, and the full script can be downloaded at the end of your session. The downloaded script runs as-is in R and produces identical output — making every analysis fully auditable and reproducible. This is suitable for submission to statistical reviewers, inclusion in supplemental materials, or IRB documentation, and provides a natural learning path for researchers who want to transition to scripted R workflows. This repository remains the canonical reference for the underlying implementation.
Moertel CG, Fleming TR, Macdonald JS, et al. (1990). Levamisole and fluorouracil for adjuvant therapy of resected colon carcinoma. New England Journal of Medicine, 322(6), 352–358. https://doi.org/10.1056/NEJM199002083220602