m2dt: Convert Model to Data Table
In summata: Publication-Ready Summary Tables and Forest Plots

View source: R/m2dt.R

m2dt	R Documentation

Convert Model to Data Table

Description

Extracts coefficients, confidence intervals, and model statistics from fitted regression models and converts them to a standardized data.table suitable for further analysis or publication. Other summata regression functions call this core utility internally, but it can also be used on its own.

Usage

m2dt(
  data,
  model,
  conf_level = 0.95,
  keep_qc_stats = TRUE,
  include_intercept = TRUE,
  terms_to_exclude = NULL,
  reference_rows = TRUE,
  reference_label = "reference",
  skip_counts = FALSE,
  conf_method = NULL
)

Arguments

`data`	Data frame or data.table containing the dataset used to fit the model. Required for computing group-level sample sizes and event counts.
`model`	Fitted model object. Supported classes include:
`glm` - Generalized linear models (logistic, Poisson, etc.)
`lm` - Linear models
`coxph` - Cox proportional hazards models
`clogit` - Conditional logistic regression
`coxme` - Mixed effects Cox models
`lmerMod` - Linear mixed effects models
`glmerMod` - Generalized linear mixed effects models
`conf_level`	Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% CI).
`keep_qc_stats`	Logical. If `TRUE`, includes model quality statistics such as AIC, BIC, R², concordance, and model fit tests. These appear as additional columns in the output. Default is `TRUE`.
`include_intercept`	Logical. If `TRUE`, includes the model intercept in output. If `FALSE`, removes the intercept row from results. Useful for creating cleaner presentation tables. Default is `TRUE`.
`terms_to_exclude`	Character vector of term names to exclude from output. Useful for removing specific unwanted parameters (e.g., nuisance variables, spline terms). Default is `NULL`. Note: If `include_intercept = FALSE`, "(Intercept)" is automatically added to this list.
`reference_rows`	Logical. If `TRUE`, adds rows for reference categories of factor variables with appropriate labels and baseline values (OR/HR = 1, Coefficient = 0). This makes tables more complete and easier to interpret. Default is `TRUE`.
`reference_label`	Character string used to label reference category rows in the output. Appears in the `reference` column. Default is `"reference"`.
`skip_counts`	Logical. If `TRUE`, skips computation of group-level sample sizes and event counts (faster but less informative). Default is `FALSE`.
`conf_method`	Character string controlling the confidence interval method. If `NULL` (default), uses `getOption("summata.conf_method", "profile")`.
`"profile"` - Profile likelihood intervals for GLM and negative binomial models (via `MASS::confint.glm()`), exact t-distribution intervals for linear models. Falls back to Wald on profiling failure. Quasi-likelihood families always use Wald (no true likelihood).
`"wald"` - Wald intervals (coefficient ± z × SE) for all model types. Faster but less accurate near boundary conditions or with small subgroups.
Cox and mixed-effects models use Wald intervals regardless of this setting. Set globally with `options(summata.conf_method = "wald")` to use Wald throughout a session.
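
The `NULL` default of `conf_method` can be sketched in base R as follows. `resolve_conf_method()` is a hypothetical helper written for illustration and is not a summata function; only the option name `summata.conf_method` and the values `"profile"` and `"wald"` come from the argument description above.

```r
# Hypothetical helper (not part of summata): resolve the CI method the way
# the `conf_method` argument is documented to behave.
resolve_conf_method <- function(conf_method = NULL) {
  if (is.null(conf_method)) {
    # Fall back to the session option, then to "profile"
    conf_method <- getOption("summata.conf_method", "profile")
  }
  match.arg(conf_method, c("profile", "wald"))
}

old <- options(summata.conf_method = NULL)         # start with the option unset
default_method <- resolve_conf_method()            # no option, no argument

options(summata.conf_method = "wald")              # session-wide default
session_method <- resolve_conf_method()            # picks up the option

explicit_method <- resolve_conf_method("profile")  # argument overrides option

options(old)                                       # restore the previous setting
```

Under these assumptions an explicit argument takes precedence over the session option, and the option takes precedence over the built-in `"profile"` default.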

Details

This function is the core extraction utility used by fit() and other regression functions. It handles the complexities of different model classes and provides a consistent output format suitable for tables and forest plots.

Model Type Detection: The function automatically detects the model type and applies the appropriate:

Effect measure naming (OR, HR, RR, Coefficient)
Confidence interval calculation (see below)
Event counting for binary/survival outcomes
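
As a rough illustration of class-based detection, the sketch below picks an effect-measure label from the model class and GLM family. `effect_label()` is a hypothetical helper and is not summata's implementation; it assumes that OR applies to binomial models with a logit link and RR to Poisson models with a log link.

```r
# Hypothetical helper (not summata's implementation): choose an
# effect-measure label from the model class and, for GLMs, the family.
effect_label <- function(model) {
  if (inherits(model, "coxph")) {
    return("HR")
  }
  # Check "glm" before "lm": glm objects also inherit from "lm"
  if (inherits(model, "glm")) {
    fam <- stats::family(model)
    if (fam$family == "binomial" && fam$link == "logit") return("OR")
    if (fam$family == "poisson" && fam$link == "log") return("RR")
    return("Coefficient")
  }
  "Coefficient"
}

# Fit three model types on a built-in dataset
logit_fit <- glm(vs ~ mpg, data = mtcars, family = binomial)
pois_fit  <- glm(carb ~ wt, data = mtcars, family = poisson)
lm_fit    <- lm(mpg ~ wt, data = mtcars)

labels <- c(effect_label(logit_fit), effect_label(pois_fit), effect_label(lm_fit))
labels
```

The order of the `inherits()` checks matters here, because testing for `"lm"` first would also match every `glm` object.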

Confidence Interval Methods: The CI method is selected per model class using stats::confint() dispatch:

GLM/negative binomial: Profile likelihood via MASS::confint.glm(), except quasi-likelihood families, which use Wald
Linear models: Exact t-distribution via confint.lm()
Cox PH: Wald intervals (coefficient ± z × SE)
Mixed-effects models: Wald intervals

If profiling fails, the function falls back to Wald intervals.
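
The difference between these interval types can be seen with base R alone. The sketch below does not call m2dt(); it uses the built-in mtcars data and standard stats functions. Depending on the R version, the `confint()` method for `glm` objects is supplied by MASS or by stats.

```r
# Logistic regression on a built-in dataset
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)

# Profile likelihood intervals via the confint() method for glm objects
ci_profile <- suppressMessages(confint(fit, level = 0.95))

# Wald intervals: coefficient +/- z * SE
ci_wald <- confint.default(fit, level = 0.95)

# The same Wald interval computed by hand
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))
z   <- qnorm(1 - (1 - 0.95) / 2)
ci_manual <- cbind(est - z * se, est + z * se)

# Linear model: exact t-distribution intervals
lm_fit <- lm(mpg ~ wt, data = mtcars)
ci_t   <- confint(lm_fit, level = 0.95)

# The t interval computed by hand, using the residual degrees of freedom
t_crit      <- qt(1 - (1 - 0.95) / 2, df = df.residual(lm_fit))
lm_se       <- sqrt(diag(vcov(lm_fit)))
ci_t_manual <- cbind(coef(lm_fit) - t_crit * lm_se, coef(lm_fit) + t_crit * lm_se)
```

Profile and Wald limits generally differ for GLMs, and the gap tends to widen in small samples or when estimates sit near a boundary.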

Mixed Effects Models: For lme4 models (glmer, lmer), the function extracts fixed effects only. Random effects variance components are not included in the output table, as they represent clustering structure rather than predictor effects.

Value

A data.table containing extracted model information with the following standard columns:

model_scope

Character. Either "Univariable" (unadjusted model with single predictor) or "Multivariable" (adjusted model with multiple predictors)

model_type

Character. Type of regression (e.g., "Logistic", "Linear", "Cox PH", "Poisson", etc.)

variable

Character. Variable name (for factor variables, the base variable name without the level)

group

Character. Group/level name for factor variables; empty string for continuous variables

n

Integer. Total sample size used in the model

n_group

Integer. Sample size for this specific variable level (factor variables only)

events

Integer. Total number of events in the model (for survival and logistic models)

events_group

Integer. Number of events for this specific variable level (for survival and logistic models with factor variables)

coefficient

Numeric. Raw regression coefficient (log odds, log hazard, etc.)

se

Numeric. Standard error of the coefficient

OR/HR/RR/Coefficient

Numeric. Effect estimate - column name depends on model type:

OR for logistic regression (odds ratio)
HR for Cox models (hazard ratio)
RR for Poisson regression (rate/risk ratio)
Coefficient for linear models or other GLMs

ci_lower

Numeric. Lower bound of confidence interval for effect estimate

ci_upper

Numeric. Upper bound of confidence interval for effect estimate

statistic

Numeric. Test statistic (z-value for GLM/Cox, t-value for LM)

p_value

Numeric. p-value for coefficient test

sig

Character. Significance markers: *** (p < 0.001), ** (p < 0.01), * (p < 0.05), . (p < 0.10).

sig_binary

Logical. Binary indicator: TRUE if p < 0.05, FALSE otherwise

reference

Character. Contains reference_label for reference category rows when reference_rows = TRUE, empty string otherwise
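
To show how several of these columns relate to one another, the sketch below rebuilds a few of them by hand for a logistic model using base R. It is not m2dt() output: the `term` column is illustrative, Wald limits are used for simplicity, and it assumes that `ci_lower` and `ci_upper` are reported on the exponentiated (effect) scale, as the column descriptions suggest.

```r
fit   <- glm(vs ~ mpg, data = mtcars, family = binomial)
coefs <- summary(fit)$coefficients   # Estimate, Std. Error, z value, Pr(>|z|)
ci    <- confint.default(fit)        # Wald limits on the log-odds scale

# Hypothetical reconstruction of a few documented columns
sketch <- data.frame(
  term        = rownames(coefs),
  coefficient = coefs[, "Estimate"],        # raw log odds
  se          = coefs[, "Std. Error"],
  OR          = exp(coefs[, "Estimate"]),   # effect estimate
  ci_lower    = exp(ci[, 1]),
  ci_upper    = exp(ci[, 2]),
  statistic   = coefs[, "z value"],
  p_value     = coefs[, "Pr(>|z|)"],
  row.names   = NULL
)

# Significance markers using the documented cut points
sketch$sig <- as.character(cut(
  sketch$p_value,
  breaks = c(-Inf, 0.001, 0.01, 0.05, 0.10, Inf),
  labels = c("***", "**", "*", ".", ""),
  right  = FALSE
))

# Binary indicator: TRUE if p < 0.05
sketch$sig_binary <- sketch$p_value < 0.05

sketch
```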

Examples

# Load the package and example data
library(summata)
data(clintrial)

# Example 1: Extract from logistic regression
glm_model <- glm(os_status ~ age + sex + treatment, 
                 data = clintrial, family = binomial)

glm_result <- m2dt(clintrial, glm_model)
glm_result



# Example 2: Extract from linear model
lm_model <- lm(los_days ~ age + sex + surgery, data = clintrial)

lm_result <- m2dt(clintrial, lm_model)
lm_result

# Example 3: Cox proportional hazards model
library(survival)
cox_model <- coxph(Surv(os_months, os_status) ~ age + sex + stage,
                   data = clintrial)

cox_result <- m2dt(clintrial, cox_model)
cox_result

# Example 4: Exclude intercept for cleaner tables
clean_result <- m2dt(clintrial, glm_model, include_intercept = FALSE)
clean_result

# Example 5: Change confidence level
result_90ci <- m2dt(clintrial, glm_model, conf_level = 0.90)
result_90ci

summata documentation built on May 7, 2026, 5:07 p.m.