m2dt: Convert Model to Data Table

m2dt R Documentation

Convert Model to Data Table

Description

Extracts coefficients, confidence intervals, and model statistics from fitted regression models and converts them to a standardized data.table format suitable for further analysis or publication. It is a core utility used internally by other summata regression functions and can also be called on its own.

Usage

m2dt(
  data,
  model,
  conf_level = 0.95,
  keep_qc_stats = TRUE,
  include_intercept = TRUE,
  terms_to_exclude = NULL,
  reference_rows = TRUE,
  reference_label = "reference",
  skip_counts = FALSE,
  conf_method = NULL
)

Arguments

data

Data frame or data.table containing the dataset used to fit the model. Required for computing group-level sample sizes and event counts.

model

Fitted model object. Supported classes include:

  • glm - Generalized linear models (logistic, Poisson, etc.)

  • lm - Linear models

  • coxph - Cox proportional hazards models

  • clogit - Conditional logistic regression

  • coxme - Mixed effects Cox models

  • lmerMod - Linear mixed effects models

  • glmerMod - Generalized linear mixed effects models

conf_level

Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% CI).

keep_qc_stats

Logical. If TRUE, includes model quality statistics such as AIC, BIC, R^2, concordance, and model fit tests. These appear as additional columns in the output. Default is TRUE.

include_intercept

Logical. If TRUE, includes the model intercept in output. If FALSE, removes the intercept row from results. Useful for creating cleaner presentation tables. Default is TRUE.

terms_to_exclude

Character vector of term names to exclude from output. Useful for removing specific unwanted parameters (e.g., nuisance variables, spline terms). Default is NULL. Note: If include_intercept = FALSE, "(Intercept)" is automatically added to this list.

reference_rows

Logical. If TRUE, adds rows for reference categories of factor variables with appropriate labels and baseline values (OR/HR = 1, Coefficient = 0). This makes tables more complete and easier to interpret. Default is TRUE.

reference_label

Character string used to label reference category rows in the output. Appears in the reference column. Default is "reference".

skip_counts

Logical. If TRUE, skips computation of group-level sample sizes and event counts (faster but less informative). Default is FALSE.

conf_method

Character string controlling the confidence interval method. If NULL (default), uses getOption("summata.conf_method", "profile").

  • "profile" - Profile likelihood intervals for GLM and negative binomial models (via MASS::confint.glm()), exact t-distribution intervals for linear models. Falls back to Wald on profiling failure. Quasi-likelihood families always use Wald (no true likelihood).

  • "wald" - Wald intervals (coefficient ± z × SE) for all model types. Faster but less accurate near boundary conditions or with small subgroups.

Cox and mixed-effects models use Wald intervals regardless of this setting. Set globally with options(summata.conf_method = "wald") to use Wald throughout a session.
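
The lookup order described for conf_method (explicit argument first, then the session option, then "profile") can be sketched in base R. This is only an illustration of the documented behavior, not summata's internal code; resolve_conf_method() is a hypothetical helper introduced here.

```r
# resolve_conf_method() is a hypothetical helper, not part of summata.
# It mirrors the documented rule: an explicit value wins, otherwise the
# session option "summata.conf_method" is used, otherwise "profile".
resolve_conf_method <- function(conf_method = NULL) {
  if (is.null(conf_method)) {
    conf_method <- getOption("summata.conf_method", "profile")
  }
  match.arg(conf_method, c("profile", "wald"))
}

old <- options(summata.conf_method = NULL)         # start from an unset option
default_method <- resolve_conf_method()            # no option set: "profile"

options(summata.conf_method = "wald")              # session-wide override
session_method <- resolve_conf_method()            # picks up the option
explicit_method <- resolve_conf_method("profile")  # explicit argument wins

options(old)                                       # restore the previous setting
```

Saving and restoring the option with options(old) keeps a session-wide override from leaking into later code.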

Details

This function is the core extraction utility used by fit() and other regression functions. It handles the complexities of different model classes and provides a consistent output format suitable for tables and forest plots.

Model Type Detection: The function automatically detects the model type and applies the appropriate:

  • Effect measure naming (OR, HR, RR, Coefficient)

  • Confidence interval calculation (see below)

  • Event counting for binary/survival outcomes

Confidence Interval Methods: The CI method is selected per model class using stats::confint() dispatch:

  • GLM/negative binomial: Profile likelihood via MASS::confint.glm(), except quasi-families which use Wald

  • Linear models: Exact t-distribution via confint.lm()

  • Cox PH: Wald intervals (coefficient ± z × SE)

  • Mixed-effects models: Wald intervals

If profiling fails, the function falls back to Wald intervals.
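
To make the difference between the two interval types concrete, the following base R sketch computes both for a logistic model fitted to the built-in mtcars data. It assumes only stats functions (confint(), confint.default()) and uses tryCatch() to imitate a fallback to Wald; how m2dt implements its fallback internally is not shown in this page, so treat the structure as an assumption.

```r
# Logistic model on a dataset that ships with R (not the summata example data)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

conf_level <- 0.95
est <- coef(fit)                         # coefficients on the log-odds scale
se  <- sqrt(diag(vcov(fit)))             # standard errors
z   <- qnorm(1 - (1 - conf_level) / 2)   # normal quantile for the chosen level

# Wald interval: coefficient +/- z * SE
wald_manual  <- cbind(lower = est - z * se, upper = est + z * se)
wald_builtin <- confint.default(fit, level = conf_level)

# Profile likelihood interval; if profiling errors, use the Wald interval
profile_ci <- tryCatch(
  suppressMessages(confint(fit, level = conf_level)),
  error = function(e) wald_builtin
)

# Exponentiate to move from log-odds to the odds ratio scale
wald_or    <- exp(wald_manual)
profile_or <- exp(profile_ci)
```

The two sets of bounds usually differ most when the sample is small or an estimate sits near a boundary, which is the situation the "wald" description warns about.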

Mixed Effects Models: For lme4 models (glmer, lmer), the function extracts fixed effects only. Random effects variance components are not included in the output table, as they represent clustering structure rather than predictor effects.

Value

A data.table containing extracted model information with the following standard columns:

model_scope

Character. Either "Univariable" (unadjusted model with single predictor) or "Multivariable" (adjusted model with multiple predictors)

model_type

Character. Type of regression (e.g., "Logistic", "Linear", "Cox PH", "Poisson", etc.)

variable

Character. Variable name (for factor variables, the base variable name without the level)

group

Character. Group/level name for factor variables; empty string for continuous variables

n

Integer. Total sample size used in the model

n_group

Integer. Sample size for this specific variable level (factor variables only)

events

Integer. Total number of events in the model (for survival and logistic models)

events_group

Integer. Number of events for this specific variable level (for survival and logistic models with factor variables)

coefficient

Numeric. Raw regression coefficient (log odds, log hazard, etc.)

se

Numeric. Standard error of the coefficient

OR/HR/RR/Coefficient

Numeric. Effect estimate - column name depends on model type:

  • OR for logistic regression (odds ratio)

  • HR for Cox models (hazard ratio)

  • RR for Poisson regression (rate/risk ratio)

  • Coefficient for linear models or other GLMs

ci_lower

Numeric. Lower bound of confidence interval for effect estimate

ci_upper

Numeric. Upper bound of confidence interval for effect estimate

statistic

Numeric. Test statistic (z-value for GLM/Cox, t-value for LM)

p_value

Numeric. p-value for coefficient test

sig

Character. Significance markers: *** (p < 0.001), ** (p < 0.01), * (p < 0.05), . (p < 0.10).

sig_binary

Logical. Binary indicator: TRUE if p < 0.05, FALSE otherwise

reference

Character. Contains reference_label for reference category rows when reference_rows = TRUE, empty string otherwise
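
As a rough illustration of how several of these columns relate to one another, the sketch below rebuilds coefficient, se, OR, statistic, p_value, sig, and sig_binary by hand for a logistic model on the built-in mtcars data. It uses base R only; the thresholds follow the sig description above, but the way m2dt derives these columns internally is an assumption.

```r
fit   <- glm(am ~ wt, data = mtcars, family = binomial)
coefs <- summary(fit)$coefficients  # Estimate, Std. Error, z value, Pr(>|z|)

tab <- data.frame(
  term        = rownames(coefs),
  coefficient = coefs[, "Estimate"],       # raw log odds
  se          = coefs[, "Std. Error"],
  OR          = exp(coefs[, "Estimate"]),  # effect estimate on the ratio scale
  statistic   = coefs[, "z value"],
  p_value     = coefs[, "Pr(>|z|)"],
  row.names   = NULL
)

# Significance markers with strict thresholds:
# p < 0.001 "***", p < 0.01 "**", p < 0.05 "*", p < 0.10 ".", otherwise ""
sig_marks  <- c("***", "**", "*", ".", "")
sig_breaks <- c(0.001, 0.01, 0.05, 0.10)
tab$sig        <- sig_marks[findInterval(tab$p_value, sig_breaks) + 1]
tab$sig_binary <- tab$p_value < 0.05

tab
```

For a linear model the effect-estimate column would simply repeat the coefficient, since no exponentiation applies.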

See Also

fit for the main regression interface; glmforest, coxforest, and lmforest for forest plot visualization

Examples

# Load the package and example data
library(summata)
data(clintrial)

# Example 1: Extract from logistic regression
glm_model <- glm(os_status ~ age + sex + treatment,
                 data = clintrial, family = binomial)

glm_result <- m2dt(clintrial, glm_model)
glm_result

# Example 2: Extract from linear model
lm_model <- lm(los_days ~ age + sex + surgery, data = clintrial)

lm_result <- m2dt(clintrial, lm_model)
lm_result

# Example 3: Cox proportional hazards model
library(survival)
cox_model <- coxph(Surv(os_months, os_status) ~ age + sex + stage,
                   data = clintrial)

cox_result <- m2dt(clintrial, cox_model)
cox_result

# Example 4: Exclude intercept for cleaner tables
clean_result <- m2dt(clintrial, glm_model, include_intercept = FALSE)
clean_result

# Example 5: Change confidence level
result_90ci <- m2dt(clintrial, glm_model, conf_level = 0.90)
result_90ci

summata documentation built on May 7, 2026, 5:07 p.m.