survtable: Create Publication-Ready Survival Summary Tables

survtable    R Documentation

Description

Generates comprehensive survival summary tables with survival probabilities at specified time points, median survival times, and optional group comparisons with statistical testing. Designed for creating survival summaries commonly used in clinical and epidemiological research publications.

Usage

survtable(
  data,
  outcome,
  by = NULL,
  times = NULL,
  probs = 0.5,
  stats = c("survival", "ci"),
  type = "survival",
  conf_level = 0.95,
  conf_type = "log",
  digits = 0,
  time_digits = 1,
  p_digits = 3,
  percent = TRUE,
  test = TRUE,
  test_type = "logrank",
  total = TRUE,
  total_label = "Total",
  time_unit = NULL,
  time_label = NULL,
  median_label = NULL,
  labels = NULL,
  by_label = NULL,
  na_rm = TRUE,
  number_format = NULL,
  ...
)

Arguments

data

Data frame or data.table containing the survival dataset. Automatically converted to a data.table for efficient processing.

outcome

Character string or character vector specifying one or more survival outcomes using Surv() syntax (e.g., "Surv(os_months, os_status)"). When multiple outcomes are provided, results are stacked into a single table with outcome labels as row headers.
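
As a rough illustration of how a Surv() string can be turned into a model, the sketch below pastes the outcome string and an optional grouping column into a formula and passes it to survival::survfit(). It uses the lung dataset shipped with the survival package rather than clintrial, and the variable names are hypothetical. Whether survtable builds its formulas this way internally is an assumption.

```r
library(survival)

# Hypothetical inputs mirroring the `outcome` and `by` arguments
outcome <- "Surv(time, status)"
by <- "sex"

# Assumption: an unstratified fit uses "1" on the right-hand side
rhs <- if (is.null(by)) "1" else by
km_formula <- as.formula(paste(outcome, "~", rhs))

# Kaplan-Meier fit, one curve per level of `by`
fit <- survfit(km_formula, data = lung)
print(fit)
```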

by

Character string specifying the column name of the stratifying variable for group comparisons (e.g., treatment arm, risk group). When NULL (default), produces overall survival summaries only.

times

Numeric vector of time points at which to estimate survival probabilities. For example, c(12, 24, 36) for 1-, 2-, and 3-year survival when time is measured in months. Default is NULL.

probs

Numeric vector of survival probabilities for which to estimate corresponding survival times (quantiles). Values must be between 0 and 1. For example, c(0.5) returns median survival time, c(0.25, 0.5, 0.75) returns quartiles. Default is 0.5 (median only).
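
For orientation, survival times at given survival probabilities can be read off a Kaplan-Meier fit with quantile() from the survival package. Note that quantile() on a survfit object works on the event-time distribution, so a survival probability p corresponds to the quantile 1 - p. The sketch uses the lung dataset from survival; whether survtable applies the same mapping internally is not stated here and is an assumption.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ 1, data = lung)

# Survival probabilities of interest: times at which S(t) falls to 0.75, 0.5, 0.25
surv_probs <- c(0.75, 0.5, 0.25)

# A survival probability p is the (1 - p) quantile of the event-time distribution
q <- quantile(fit, probs = 1 - surv_probs, conf.int = TRUE)

quantile_table <- data.frame(
  surv_prob = surv_probs,
  time      = as.numeric(q$quantile),
  lower     = as.numeric(q$lower),
  upper     = as.numeric(q$upper)
)
print(quantile_table)
```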

stats

Character vector specifying which statistics to display:

  • "survival" - Survival probability at specified times

  • "ci" - Confidence interval for survival probability

  • "n_risk" - Number at risk at each time point

  • "n_event" - Cumulative number of events by each time point

Default is c("survival", "ci").

type

Character string specifying the type of probability to report:

  • "survival" - Survival probability S(t) [default]

  • "risk" - Cumulative incidence/risk 1 - S(t)

  • "cumhaz" - Cumulative hazard -log(S(t))
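
The three scales are simple transformations of the same estimate. A minimal sketch with made-up survival probabilities (illustrative values, not package output):

```r
# Illustrative survival probabilities S(t) at three time points
surv <- c(0.90, 0.75, 0.50)

risk   <- 1 - surv      # type = "risk": cumulative incidence
cumhaz <- -log(surv)    # type = "cumhaz": cumulative hazard

print(data.frame(survival = surv, risk = risk, cumhaz = round(cumhaz, 3)))
```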

conf_level

Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% confidence intervals).

conf_type

Character string specifying the confidence interval type for survival estimates:

  • "log" - Log transformation (default, recommended)

  • "log-log" - Log-log transformation

  • "plain" - Linear/identity (can produce CIs outside [0, 1])

  • "logit" - Logit transformation

  • "arcsin" - Arcsin square root transformation

digits

Integer specifying the number of decimal places for survival probabilities (as percentages). Default is 0 (whole percentages).

time_digits

Integer specifying the number of decimal places for survival time estimates (median, quantiles). Default is 1.

p_digits

Integer specifying the number of decimal places for p-values. Values smaller than 10^(-p_digits) are displayed as "< 0.001" (for p_digits = 3), "< 0.0001" (for p_digits = 4), etc. Default is 3.
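
The thresholding rule can be sketched in base R as follows. The helper format_p() is hypothetical and only mirrors the behavior described above; it is not the package's own formatter and ignores number_format.

```r
# Hypothetical helper: format p-values with a "< threshold" floor
format_p <- function(p, p_digits = 3) {
  threshold <- 10^(-p_digits)
  ifelse(
    p < threshold,
    paste0("< ", formatC(threshold, format = "f", digits = p_digits)),
    formatC(p, format = "f", digits = p_digits)
  )
}

format_p(c(0.0004, 0.0312, 0.25))
# "< 0.001" "0.031" "0.250"
format_p(0.00004, p_digits = 4)
# "< 0.0001"
```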

percent

Logical. If TRUE (default), displays survival probabilities as percentages (e.g., "85%"). If FALSE, displays as proportions (e.g., "0.85").

test

Logical. If TRUE (default), performs a survival curve comparison test and adds a p-value column. Requires by to be specified.

test_type

Character string specifying the statistical test for comparing survival curves:

  • "logrank" - Log-rank test (default)

  • "wilcoxon" - Wilcoxon (Breslow) test

  • "tarone" - Tarone-Ware test

  • "petopeto" - Peto-Peto test

total

Logical or character string controlling the total/overall column:

  • TRUE or "first" - Include total column first [default]

  • "last" - Include total column last (before p-value)

  • FALSE - Exclude total column

total_label

Character string for the total/overall row label. Default is "Total".

time_unit

Character string specifying the time unit for display in column headers and labels (e.g., "months", "days", "years"). When specified, time column headers become "{time} {time_unit}" (e.g., "12 months"). Default is NULL (no unit shown).

time_label

Character string template for time column headers when times is specified. Use "{time}" as placeholder for the time value and "{unit}" for the time unit. Default is "{time} {unit}" when time_unit is specified, otherwise just "{time}".
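
To make the placeholder substitution concrete, here is a small base R sketch. The helper make_time_headers() is hypothetical and only imitates the documented defaults; the package's own substitution logic may differ.

```r
# Hypothetical helper: fill "{time}" and "{unit}" placeholders in a header template
make_time_headers <- function(times, unit = NULL, template = NULL) {
  if (is.null(template)) {
    # Documented default: "{time} {unit}" with a unit, otherwise "{time}"
    template <- if (is.null(unit)) "{time}" else "{time} {unit}"
  }
  vapply(times, function(tm) {
    out <- gsub("{time}", format(tm), template, fixed = TRUE)
    gsub("{unit}", if (is.null(unit)) "" else unit, out, fixed = TRUE)
  }, character(1))
}

make_time_headers(c(12, 24), unit = "months")
# "12 months" "24 months"
make_time_headers(c(12, 24), template = "Month {time}")
# "Month 12" "Month 24"
```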

median_label

Character string for the median survival row label. Default is NULL, which auto-constructs from conf_level (e.g., "Median (95% CI)" for conf_level = 0.95).

labels

Named character vector or list providing custom display labels. For stratified analyses, names should match levels of the by variable. For multiple outcomes, names should match the Surv() expressions. Default is NULL.

by_label

Character string providing a custom label for the stratifying variable (used in output attributes and headers). Default is NULL (uses variable name).

na_rm

Logical. If TRUE (default), observations with missing values in time, status, or the stratifying variable are excluded.

number_format

Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:

  • "us" - Comma thousands, period decimal: 1,234.56 [default]

  • "eu" - Period thousands, comma decimal: 1.234,56

  • "space" - Thin-space thousands, period decimal: 1 234.56 (SI/ISO 31-0)

  • "none" - No thousands separator: 1234.56

Or provide a custom two-element vector c(big.mark, decimal.mark), e.g., c("'", ".") for Swiss-style: 1'234.56.

When NULL (default), uses getOption("summata.number_format", "us"). Set the global option once per session to avoid passing this argument repeatedly:

    options(summata.number_format = "eu")
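
The effect of the presets can be approximated in base R with formatC(), which takes the same big.mark and decimal.mark pair. The helper fmt_number() below is hypothetical, covers only three of the presets, and is a sketch rather than the package's formatter.

```r
# Hypothetical helper mapping a preset name to big.mark / decimal.mark
fmt_number <- function(x, digits = 2, number_format = "us") {
  marks <- switch(number_format,
    us   = c(",", "."),
    eu   = c(".", ","),
    none = c("", "."),
    stop("unknown preset: ", number_format)
  )
  formatC(x, format = "f", digits = digits,
          big.mark = marks[1], decimal.mark = marks[2])
}

fmt_number(1234.56)                          # "1,234.56"
fmt_number(1234.56, number_format = "eu")    # "1.234,56"
fmt_number(1234.56, number_format = "none")  # "1234.56"
```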
  
...

Additional arguments passed to survfit.

Details

Survival Probability Estimation:

Survival probabilities are estimated using the Kaplan-Meier method via survfit. At each specified time point, the function reports the estimated probability of surviving beyond that time.
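
The underlying computation can be reproduced directly with the survival package. The sketch below uses the lung dataset (time in days) instead of clintrial and collects the quantities named in the stats argument; the column names of the resulting data frame are illustrative, not the layout of the raw_data attribute.

```r
library(survival)

# Kaplan-Meier fit for the whole cohort
fit <- survfit(Surv(time, status) ~ 1, data = lung)

# Evaluate the curve at the requested time points (days)
s <- summary(fit, times = c(180, 365))

km_at_times <- data.frame(
  time     = s$time,
  survival = s$surv,
  lower    = s$lower,
  upper    = s$upper,
  n_risk   = s$n.risk,
  # summary() counts events per interval between requested times,
  # so cumsum() gives the running total by each time point
  n_event  = cumsum(s$n.event)
)
print(km_at_times)
```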

Confidence Intervals:

The default "log" transformation for confidence intervals is recommended: it keeps the lower limit above 0 and has good statistical properties, although the upper limit can exceed 1 and is then typically truncated at 1. The "log-log" transformation keeps both limits within [0, 1], is also commonly used, and may perform better in the tails.
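
To see how the choice of transformation changes the interval, the sketch below refits the same Kaplan-Meier curve with three conf.type values from survival::survfit() and compares the limits at one time point. It uses the lung dataset (time in days); the helper ci_at() is hypothetical.

```r
library(survival)

# Hypothetical helper: confidence limits at a single time point for one CI type
ci_at <- function(conf_type, at = 365) {
  fit <- survfit(Surv(time, status) ~ 1, data = lung, conf.type = conf_type)
  s <- summary(fit, times = at)
  c(lower = s$lower, upper = s$upper)
}

# One column per transformation, rows "lower" and "upper"
ci_compare <- sapply(c("log", "log-log", "plain"), ci_at)
print(round(ci_compare, 3))
```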

Statistical Testing:

The log-rank test (default) tests the null hypothesis that survival curves are identical across groups. Alternative tests weight different parts of the survival curve:

  • Log-rank: Equal weights (best for proportional hazards)

  • Wilcoxon: Weights by number at risk (sensitive to early differences)

  • Tarone-Ware: Weights by square root of number at risk

  • Peto-Peto: Modified Wilcoxon weights
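
Two of these tests can be run directly with survival::survdiff(), whose rho argument selects the weights: rho = 0 gives the log-rank test and rho = 1 the Peto-Peto modification of the Gehan-Wilcoxon test. survdiff() exposes only this rho family, so Tarone-Ware weights are not shown. The sketch uses the lung dataset; how survtable maps test_type onto these calls is an assumption.

```r
library(survival)

# rho = 0: log-rank test; rho = 1: Peto-Peto modified Wilcoxon test
logrank <- survdiff(Surv(time, status) ~ sex, data = lung, rho = 0)
peto    <- survdiff(Surv(time, status) ~ sex, data = lung, rho = 1)

# Chi-square p-value with (number of groups - 1) degrees of freedom
p_value <- function(test) {
  pchisq(test$chisq, df = length(test$n) - 1, lower.tail = FALSE)
}

p_values <- c(logrank = p_value(logrank), peto_peto = p_value(peto))
print(p_values)
```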

Formatting:

All numeric output respects the number_format parameter. Separators within confidence intervals adapt automatically to avoid ambiguity:

  • Survival probabilities: "85% (80%-89%)" (US) or "85% (80%–89%)" (EU, en-dash separator)

  • Median survival: "24.5 (21.2-28.9)" (US) or "24,5 (21,2-28,9)" (EU)

  • Counts ≥ 1000: "1,234" (US) or "1.234" (EU)

  • p-values: "< 0.001" (US) or "< 0,001" (EU)

Value

A data.table with S3 class "survtable" containing formatted survival statistics. The table structure depends on parameters:

When times is specified (survival at time points):

Variable/Group

Row identifier – stratifying variable levels

Time columns

Survival statistics at each requested time point

p-value

Test p-value (if test = TRUE and by specified)

When only probs is specified (survival quantiles):

Variable/Group

Row identifier – stratifying variable levels

Quantile columns

Time to reach each survival probability

p-value

Test p-value (if test = TRUE and by specified)

All numeric output (probabilities, times, counts, p-values) respects the number_format setting for locale-appropriate formatting.

The returned object includes the following attributes:

raw_data

data.table with unformatted numeric values

survfit_objects

List of survfit objects for each stratum

by_variable

The stratifying variable name

times

The time points requested

probs

The probability quantiles requested

test_result

Full test result object (if test performed)

See Also

desctable for baseline characteristics tables, fit for regression analysis, table2pdf for PDF export, table2docx for Word export, survfit for underlying survival estimation, survdiff for survival curve comparison tests

Other descriptive functions: desctable(), print.survtable()

Examples

# Load the package and example data
library(summata)
data(clintrial)

# Example 1: Survival at specific time points by treatment
survtable(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    by = "treatment",
    times = c(12, 24, 36),
    time_unit = "months"
)

# Example 2: Median survival only
survtable(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    by = "treatment",
    times = NULL,
    probs = 0.5
)

# Example 3: Multiple quantiles (quartiles)
survtable(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    by = "stage",
    times = NULL,
    probs = c(0.25, 0.5, 0.75)
)

# Example 4: Both time points and median
survtable(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    by = "treatment",
    times = c(12, 24),
    probs = 0.5,
    time_unit = "months"
)

# Example 5: Cumulative incidence (1 - survival)
survtable(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    by = "treatment",
    times = c(12, 24),
    type = "risk"
)

# Example 6: Include number at risk
survtable(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    by = "treatment",
    times = c(12, 24),
    stats = c("survival", "ci", "n_risk")
)

# Example 7: Overall survival without stratification
survtable(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    times = c(12, 24, 36, 48)
)

# Example 8: Without total row
survtable(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    by = "treatment",
    times = c(12, 24),
    total = FALSE
)

# Example 9: Custom labels
survtable(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    by = "treatment",
    times = c(12, 24),
    labels = c("Drug A" = "Treatment A", "Drug B" = "Treatment B"),
    time_unit = "months"
)

# Example 10: Different confidence interval type
survtable(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    by = "treatment",
    times = c(12, 24),
    conf_type = "log-log"
)

# Example 11: Wilcoxon test instead of log-rank
survtable(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    by = "treatment",
    times = c(12, 24),
    test_type = "wilcoxon"
)

# Example 12: Access raw data for custom analysis
result <- survtable(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    by = "treatment",
    times = c(12, 24)
)
raw <- attr(result, "raw_data")
print(raw)

# Example 13: Access survfit objects for plotting
fits <- attr(result, "survfit_objects")
plot(fits$overall)  # Plot overall survival curve

# Example 14: Multiple survival outcomes stacked
survtable(
    data = clintrial,
    outcome = c("Surv(pfs_months, pfs_status)", "Surv(os_months, os_status)"),
    by = "treatment",
    times = c(12, 24),
    probs = 0.5,
    time_unit = "months",
    total = FALSE,
    labels = c(
        "Surv(pfs_months, pfs_status)" = "Progression-Free Survival",
        "Surv(os_months, os_status)" = "Overall Survival"
    )
)

# Example 15: European number formatting
survtable(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    by = "treatment",
    times = c(12, 24),
    number_format = "eu"
)

summata documentation built on May 7, 2026, 5:07 p.m.