compfit: Compare Multiple Regression Models
In summata: Publication-Ready Summary Tables and Forest Plots

Description

Fits multiple regression models and provides a comprehensive comparison table with model quality metrics, convergence diagnostics, and selection guidance. Computes a composite score combining multiple quality metrics to facilitate rapid model comparison and selection.

Usage

compfit(
  data,
  outcome,
  model_list,
  model_names = NULL,
  interactions_list = NULL,
  random = NULL,
  model_type = "auto",
  family = "binomial",
  conf_level = 0.95,
  p_digits = 3,
  include_coefficients = FALSE,
  scoring_weights = NULL,
  labels = NULL,
  number_format = NULL,
  verbose = NULL,
  ...
)

Arguments

`data`	Data frame or data.table containing the dataset.
`outcome`	Character string specifying the outcome variable. For survival analysis, use `Surv()` syntax (e.g., `"Surv(time, status)"`).
`model_list`	List of character vectors, each containing predictor names for one model. Can also be a single character vector to auto-generate nested models.
`model_names`	Character vector of names for each model. If `NULL`, uses "Model 1", "Model 2", etc. Default is `NULL`.
`interactions_list`	List of character vectors specifying interaction terms for each model. Each element corresponds to one model in `model_list`. Use `NULL` for models without interactions. Use colon notation for interactions (e.g., `c("age:treatment")`). If `NULL`, no interactions are added to any model. Default is `NULL`.
`random`	Character string specifying the random-effects formula for mixed-effects models (`glmer`, `lmer`, `coxme`). Use standard `lme4`/`coxme` syntax, e.g., `"(1|site)"` for random intercepts by site. This random-effects formula is applied to all models in the comparison. Alternatively, random effects can be included directly in the predictor vectors within `model_list` using the same syntax, which allows different random-effects structures across models. Default is `NULL`.
`model_type`	Character string specifying the model type. Options:
	`"auto"` - Automatically detect based on outcome type (default)
	`"glm"` - Generalized linear model. Supports logistic, Poisson, Gamma, and Gaussian models via the `family` parameter
	`"lm"` - Linear regression for continuous outcomes
	`"coxph"` - Cox proportional hazards for survival analysis
	`"negbin"` - Negative binomial regression for overdispersed counts (requires the MASS package)
	`"lmer"` - Mixed-effects linear regression for clustered continuous outcomes
	`"glmer"` - Mixed-effects logistic regression for clustered categorical outcomes
	`"coxme"` - Mixed-effects Cox regression for clustered time-to-event outcomes
`family`	For GLM and GLMER models, specifies the error distribution and link function. Common options:
	`"binomial"` - Logistic regression for binary outcomes (default)
	`"poisson"` - Poisson regression for count data
	`"quasibinomial"` - Logistic with overdispersion
	`"quasipoisson"` - Poisson with overdispersion
	`"gaussian"` - Normal distribution (linear regression via GLM)
	`"Gamma"` - Gamma for positive continuous data
	`"inverse.gaussian"` - For positive, highly skewed data
	For negative binomial regression, use `model_type = "negbin"` instead. See `?family` for all options.
`conf_level`	Numeric confidence level for intervals. Default is 0.95.
`p_digits`	Integer specifying the number of decimal places for p-values. Values smaller than `10^(-p_digits)` are displayed as `"< 0.001"` (for `p_digits = 3`), `"< 0.0001"` (for `p_digits = 4`), etc. Default is 3.
`include_coefficients`	Logical. If `TRUE`, includes a second table with coefficient estimates. Default is `FALSE`.
`scoring_weights`	Named list of scoring weights. Each weight should be between 0 and 1, and they should sum to 1. Available metrics depend on model type. If `NULL`, uses sensible defaults. See Details for available metrics.
`labels`	Named character vector providing custom display labels for variables. Default is `NULL`.
`number_format`	Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
	`"us"` - Comma thousands, period decimal: `1,234.56` (default)
	`"eu"` - Period thousands, comma decimal: `1.234,56`
	`"space"` - Thin-space thousands, period decimal: `1 234.56` (SI/ISO 31-0)
	`"none"` - No thousands separator: `1234.56`
	Or provide a custom two-element vector `c(big.mark, decimal.mark)`, e.g., `c("'", ".")` for Swiss-style: `1'234.56`. When `NULL` (default), uses `getOption("summata.number_format", "us")`. Set the global option once per session to avoid passing this argument repeatedly: `options(summata.number_format = "eu")`.
`verbose`	Logical. If `TRUE`, displays model fitting warnings (e.g., singular fit, convergence issues). If `FALSE` (default), routine fitting messages are suppressed while unexpected warnings are preserved. When `NULL`, uses `getOption("summata.verbose", FALSE)`.
`...`	Additional arguments passed to model fitting functions.
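
The formatting rules described for `p_digits` and `number_format` can be illustrated with base R alone. The sketch below is not summata's implementation; `format_p` and `format_number` are hypothetical helper names introduced only for this illustration, and the two-decimal rounding in `format_number` is an assumption.

```r
# Hypothetical helper (not part of summata): p-values below 10^(-p_digits)
# are shown as a "< threshold" string, all others are rounded to p_digits.
format_p <- function(p, p_digits = 3) {
  threshold <- 10^(-p_digits)
  ifelse(
    p < threshold,
    paste0("< ", formatC(threshold, format = "f", digits = p_digits)),
    formatC(p, format = "f", digits = p_digits)
  )
}

# Hypothetical helper: apply a c(big.mark, decimal.mark) pair via base format().
format_number <- function(x, marks = c(",", "."), digits = 2) {
  format(round(x, digits), nsmall = digits,
         big.mark = marks[1], decimal.mark = marks[2])
}

format_p(c(0.04567, 0.00004))          # one ordinary value, one below threshold
format_p(0.00004, p_digits = 4)        # threshold moves to 0.0001
format_number(1234.56)                 # US-style separators
format_number(1234.56, c(".", ","))    # EU-style separators
```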

Details

This function fits all specified models and computes comprehensive quality metrics for comparison. It generates a Composite Model Score (CMS) that combines multiple metrics: lower AIC/BIC (information criteria), higher concordance (discrimination), and model convergence status.

For GLMs, McFadden's pseudo-R-squared is calculated as 1 - (logLik/logLik_null). For survival models, the global p-value comes from the log-rank test.
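
As a minimal sketch of the McFadden formula above, the following base R code computes the statistic by hand for a logistic model. It uses the built-in `mtcars` data rather than `clintrial`, and the choice of predictors is arbitrary; it is meant only to show the arithmetic, not how `compfit` computes it internally.

```r
# Binary outcome from a built-in dataset: transmission type (am) in mtcars.
fit  <- glm(am ~ wt + hp, data = mtcars, family = binomial)
null <- glm(am ~ 1,       data = mtcars, family = binomial)  # intercept-only model

# McFadden's pseudo-R-squared: 1 - logLik(model) / logLik(null model)
mcfadden_r2 <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null))
mcfadden_r2
```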

Models that fail to converge are flagged and penalized in the composite score.

Interaction Terms:

When interactions_list is provided, each element specifies the interaction terms for the corresponding model in model_list. This is particularly useful for testing whether adding interactions improves model fit:

Use NULL for models without interactions
Specify interactions using colon notation: c("age:treatment", "sex:stage")
Main effects for all variables in interactions must be in the predictor list
Common pattern: Compare main effects model vs model with interactions
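
The rules above can be sketched in base R as follows. This is an assumption about how predictors and interaction terms might be combined into a formula, not summata's internal code; `build_formula` is a hypothetical helper named only for this illustration.

```r
# Hypothetical helper (not summata's internal code): build one model formula
# from an outcome string, main-effect predictors, and optional interactions.
build_formula <- function(outcome, predictors, interactions = NULL) {
  if (!is.null(interactions)) {
    # Every variable used in an interaction must also appear as a main effect.
    vars <- unique(unlist(strsplit(interactions, ":", fixed = TRUE)))
    missing_main <- setdiff(vars, predictors)
    if (length(missing_main) > 0) {
      stop("Missing main effects: ", paste(missing_main, collapse = ", "))
    }
  }
  rhs <- paste(c(predictors, interactions), collapse = " + ")
  as.formula(paste(outcome, "~", rhs))
}

# One entry per model: NULL for main effects only, a character vector otherwise.
predictors        <- c("age", "treatment", "sex")
interactions_list <- list(NULL, c("treatment:sex"))

formulas <- lapply(interactions_list, function(ix) {
  build_formula("os_status", predictors, ix)
})
formulas
```

Checking main effects before building the formula turns a silently different model into an explicit error, which matches the requirement stated in the list above.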

Scoring weights can be customized based on model type:

GLM: "convergence", "aic", "concordance", "pseudo_r2", "brier"
Cox: "convergence", "aic", "concordance", "global_p"
Linear: "convergence", "aic", "pseudo_r2", "rmse"

Default weights emphasize discrimination (concordance) and model fit (AIC).

The composite score is designed as a tool to quickly rank models by their quality metrics. It should be used alongside traditional model selection criteria rather than as a definitive model selection method.
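
The exact formula behind the composite score is not given here. The sketch below shows one plausible way a weighted score could be formed, assuming min-max rescaling of each metric and a 0 to 100 scale; both are assumptions, the metric values are made-up illustrative numbers, and `rescale` is a hypothetical helper.

```r
# Made-up metrics for three hypothetical models (illustration only).
metrics <- data.frame(
  model       = c("Base", "Clinical", "Full"),
  aic         = c(1050, 1020, 1000),
  concordance = c(0.62, 0.68, 0.74),
  converged   = c(TRUE, TRUE, TRUE)
)

# Assumed weights: each between 0 and 1, summing to 1.
weights <- c(convergence = 0.2, aic = 0.4, concordance = 0.4)
stopifnot(all(weights >= 0 & weights <= 1), isTRUE(all.equal(sum(weights), 1)))

# Min-max rescale to [0, 1]; flip the direction when lower values are better.
rescale <- function(x, lower_is_better = FALSE) {
  rng <- range(x)
  if (diff(rng) == 0) return(rep(1, length(x)))
  s <- (x - rng[1]) / diff(rng)
  if (lower_is_better) 1 - s else s
}

# Weighted sum of the rescaled metrics, expressed on a 0-100 scale.
score <- 100 * (
  weights[["convergence"]] * as.numeric(metrics$converged) +
  weights[["aic"]]         * rescale(metrics$aic, lower_is_better = TRUE) +
  weights[["concordance"]] * rescale(metrics$concordance)
)

# Models ordered from highest to lowest score.
as.character(metrics$model)[order(score, decreasing = TRUE)]
```

Because the rescaling is relative to the models being compared, a score of this kind ranks candidates within one comparison and is not comparable across different comparisons.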

Value

A data.table with class "compfit_result" containing:

Model: Model name/identifier
CMS: Composite Model Score for model selection (higher is better)
N: Sample size
Events: Number of events (for survival/logistic)
Predictors: Number of predictors
Converged: Whether model converged properly
AIC: Akaike Information Criterion
BIC: Bayesian Information Criterion
R^2 / Pseudo-R^2: McFadden pseudo-R-squared (GLM)
Concordance: C-statistic (logistic/survival)
Brier Score: Brier accuracy score (logistic)
Global p: Overall model p-value
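
For orientation, the Brier score and C-statistic listed above can be computed by hand for a logistic model with base R. This sketch uses the built-in `mtcars` data and standard textbook definitions; it is not taken from summata's source, and the package may compute these quantities differently (for example, in how ties are handled).

```r
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
p <- fitted(fit)     # predicted probabilities
y <- mtcars$am       # observed 0/1 outcome

# Brier score: mean squared difference between prediction and outcome
# (lower is better).
brier <- mean((p - y)^2)

# C-statistic via the rank-sum (Mann-Whitney) identity: the proportion of
# event/non-event pairs in which the event has the higher predicted
# probability. rank() assigns average ranks to ties.
n1 <- sum(y == 1)
n0 <- sum(y == 0)
c_stat <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

c(brier = brier, concordance = c_stat)
```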

Attributes include:

models: List of fitted model objects
coefficients: Coefficient comparison table (if requested)
best_model: Name of recommended model

Examples

# Load example data
data(clintrial)
data(clintrial_labels)

# Example 1: Compare nested logistic regression models
models <- list(
    base = c("age", "sex"),
    clinical = c("age", "sex", "smoking", "diabetes"),
    full = c("age", "sex", "smoking", "diabetes", "stage", "ecog")
)

comparison <- compfit(
    data = clintrial,
    outcome = "os_status",
    model_list = models,
    model_names = c("Base", "Clinical", "Full")
)
comparison

# Example 2: Compare Cox survival models
library(survival)
surv_models <- list(
    simple = c("age", "sex"),
    clinical = c("age", "sex", "stage", "grade")
)

surv_comparison <- compfit(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    model_list = surv_models,
    model_type = "coxph"
)
surv_comparison

# Example 3: Test effect of adding interaction terms
interaction_models <- list(
    main = c("age", "treatment", "sex"),
    interact = c("age", "treatment", "sex")
)

interaction_comp <- compfit(
    data = clintrial,
    outcome = "os_status",
    model_list = interaction_models,
    model_names = c("Main Effects", "With Interaction"),
    interactions_list = list(
        NULL,
        c("treatment:sex")
    )
)
interaction_comp

# Example 4: Include coefficient comparison table
detailed <- compfit(
    data = clintrial,
    outcome = "os_status",
    model_list = models,
    include_coefficients = TRUE,
    labels = clintrial_labels
)

# Access coefficient table
coef_table <- attr(detailed, "coefficients")
coef_table

# Example 5: Access fitted model objects
fitted_models <- attr(comparison, "models")
names(fitted_models)

# Example 6: Get best model recommendation
best <- attr(comparison, "best_model")
cat("Recommended model:", best, "\n")

summata documentation built on May 7, 2026, 5:07 p.m.