Model Comparison

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  dpi = 150
)

# Use ragg for better font rendering if available
if (requireNamespace("ragg", quietly = TRUE)) {
  knitr::opts_chunk$set(dev = "ragg_png")
}

old_opts <- options(width = 180)

Model selection is a fundamental problem in applied statistics. Given a set of candidate models, it is important to determine which specification best balances goodness-of-fit against parsimony. Traditional approaches rely on information criteria (AIC, BIC), discrimination metrics (C-statistic), and hypothesis tests for nested models.

The compfit() function synthesizes these metrics into a weighted Composite Model Score (CMS) to facilitate systematic comparison between models. Like other functions in this package, it follows the standard summata input paradigm, with an additional parameter to allow for multiple models:

compfit(data, outcome, model_list, model_type, ...)

where data is the dataset, outcome is the common endpoint variable, model_list is a named list of predictor vectors (each representing a different model to be compared), and model_type is the type of model to be implemented. This vignette demonstrates the various capabilities of this function using the included sample dataset.


Preliminaries

The examples in this vignette use the clintrial dataset included with summata:

library(summata)

data(clintrial)
data(clintrial_labels)

Quality Metrics and the Composite Model Score

The compfit() function evaluates models using several quality metrics, then combines them into a single Composite Model Score (CMS) ranging from 0 to 100 for rapid comparison. The metrics available depend on the model type.

Available Metrics

| Metric | Used by | Interpretation | Better | |:-------|:--------|:---------------|:-------| | AIC | All | Information criterion balancing fit and complexity | Lower | | BIC | Fallback | Information criterion with stronger complexity penalty | Lower | | Concordance | GLM, Cox, GLMER, Cox ME | Discrimination ability (0.5 = random, 1.0 = perfect) | Higher | | Pseudo-R² | GLM, Cox ME | Proportion of variation explained (McFadden's) | Higher | | R² | LM | Proportion of variance explained | Higher | | Brier Score | GLM | Prediction accuracy for binary outcomes (0 = perfect) | Lower | | Global p | Cox | Omnibus likelihood ratio test p-value | Lower | | RMSE | LM | Root mean squared error of residuals | Lower | | Marginal R² | LMER, GLMER | Variance explained by fixed effects alone | Higher | | Conditional R² | LMER | Variance explained by fixed + random effects | Higher | | ICC | LMER, GLMER, Cox ME | Intraclass correlation; proportion of variance from clustering | Moderate | | Convergence | All | Whether the model converged successfully | Yes |

CMS Interpretation

| Range | Interpretation | |:------|:---------------| | 85–100 | Excellent | | 75–84 | Very Good | | 65–74 | Good | | 55–64 | Fair | | < 55 | Poor |

Default CMS Weights by Model Type

The score components and their default weights vary by model type.

| Component | LM | GLM | Cox | LMER | GLMER | Cox ME | |:----------|:--:|:---:|:---:|:----:|:-----:|:------:| | Convergence | 15% | 15% | 15% | 20% | 15% | 15% | | AIC | 25% | 25% | 30% | 25% | 25% | 30% | | R² | 45% | | | | | | | Pseudo-R² | | 15% | | | | 10% | | Concordance | | 40% | 40% | | 30% | 35% | | Brier Score | | 5% | | | | | | Global p | | | 15% | | | | | RMSE | 15% | | | | | | | Marginal R² | | | | 25% | 15% | | | Conditional R² | | | | 15% | | | | ICC | | | | 15% | 15% | 10% |


Basic Usage

Example 1: Nested Model Comparison

Compare models with increasing complexity:

example1 <- compfit(
  data = clintrial,
  outcome = "surgery",
  model_list = list(
    "Demographics" = c("age", "sex"),
    "Plus Stage" = c("age", "sex", "stage", "ecog"),
    "Full Model" = c("age", "sex", "stage", "ecog", "treatment", "smoking")
  ),
  model_type = "glm"
)

example1

Example 2: Linear Regression Models

For continuous outcomes, use model_type = "lm":

example2 <- compfit(
  data = clintrial,
  outcome = "los_days",
  model_list = list(
    "Simple" = c("age", "sex"),
    "Disease" = c("age", "sex", "stage", "ecog"),
    "Treatment" = c("age", "sex", "stage", "ecog", "surgery", "treatment")
  ),
  model_type = "lm"
)

example2

Example 3: Cox Regression Models

For time-to-event outcomes, use model_type = "coxph":

example3 <- compfit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  model_list = list(
    "Unadjusted" = c("treatment"),
    "Demographics" = c("treatment", "age", "sex"),
    "Full" = c("treatment", "age", "sex", "stage", "ecog")
  ),
  model_type = "coxph"
)

example3

Example 4: Count Models

For count outcomes, use model_type = "glm" with a count model family (e.g., family = "poisson"):

example4 <- compfit(
  data = clintrial,
  outcome = "fu_count",
  model_list = list(
    "Minimal" = c("age", "treatment"),
    "Clinical" = c("age", "treatment", "stage", "ecog"),
    "Full" = c("age", "treatment", "stage", "ecog", "surgery", "diabetes")
  ),
  model_type = "glm",
  family = "poisson",
  labels = clintrial_labels
)

example4

Interaction Testing

A key application of compfit() is testing whether interaction terms improve model fit.

Example 5: Single Interaction

Compare a main-effects model to one with an interaction term:

example5 <- compfit(
  data = clintrial,
  outcome = "surgery",
  model_list = list(
    "Main Effects" = c("age", "treatment", "sex"),
    "With Interaction" = c("age", "treatment", "sex")
  ),
  interactions_list = list(
    NULL,
    c("sex:treatment")
  ),
  model_type = "glm"
)

example5

Example 6: Multiple Interactions

Compare different interaction hypotheses:

example6 <- compfit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  model_list = list(
    "Main Effects" = c("age", "treatment", "sex", "stage"),
    "Age × Treatment" = c("age", "treatment", "sex", "stage"),
    "Sex × Treatment" = c("age", "treatment", "sex", "stage"),
    "Both" = c("age", "treatment", "sex", "stage")
  ),
  interactions_list = list(
    NULL,
    c("age:treatment"),
    c("sex:treatment"),
    c("age:treatment", "sex:treatment")
  ),
  model_type = "coxph"
)

example6

Accessing Detailed Results

Example 7: Coefficient Comparison

Setting include_coefficients = TRUE generates a table comparing coefficients across models:

example7 <- compfit(
  data = clintrial,
  outcome = "surgery",
  model_list = list(
    "Model A" = c("age", "sex"),
    "Model B" = c("age", "sex", "stage"),
    "Model C" = c("age", "sex", "stage", "treatment")
  ),
  model_type = "glm",
  include_coefficients = TRUE,
  labels = clintrial_labels
)

# Main comparison
example7

# Coefficient table
coef_table <- attr(example7, "coefficients")
coef_table

Example 8: Fitted Model Objects

The underlying model objects are stored as attributes for further analysis:

models <- attr(example7, "models")
names(models)

# Examine a specific model
summary(models[["Model C"]])

Example 9: Recommended Model

The best-performing model is identified automatically based on the Composite Model Score (CMS):

example9 <- compfit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  model_list = list(
    "Minimal" = c("treatment"),
    "Standard" = c("treatment", "age", "sex", "stage"),
    "Extended" = c("treatment", "age", "sex", "stage", "ecog", "grade")
  ),
  model_type = "coxph"
)

recommended <- attr(example9, "best_model")
cat("Recommended model:", recommended, "\n")

Custom CMS Weights

Example 10: Custom Weights

Modify the CMS scoring weights to emphasize specific metrics:

example10 <- compfit(
  data = clintrial,
  outcome = "surgery",
  model_list = list(
    "Simple" = c("age", "sex"),
    "Standard" = c("age", "sex", "stage"),
    "Full" = c("age", "sex", "stage", "treatment", "ecog")
  ),
  model_type = "glm",
  scoring_weights = list(
    convergence = 0.05,
    aic = 0.20,
    concordance = 0.60,
    pseudo_r2 = 0.10,
    brier = 0.05
  )
)

example10

Application Scenarios

Scenario 1: Confounder Assessment

Evaluate the impact of covariate adjustment on effect estimates:

scenario1 <- compfit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  model_list = list(
    "Crude" = c("treatment"),
    "Age-Sex Adjusted" = c("treatment", "age", "sex"),
    "Fully Adjusted" = c("treatment", "age", "sex", "stage", "ecog")
  ),
  model_type = "coxph",
  include_coefficients = TRUE,
  labels = clintrial_labels
)

scenario1

# Compare treatment effect across models
attr(scenario1, "coefficients")

Scenario 2: Variable Selection Validation

Compare automated versus theory-driven selection:

# Identify candidates via screening
screening <- uniscreen(
  data = clintrial,
  outcome = "surgery",
  predictors = c("age", "sex", "bmi", "smoking", "diabetes",
                 "hypertension", "stage", "ecog", "treatment"),
  model_type = "glm",
  p_threshold = 0.10
)

# Extract significant predictors
sig_vars <- attr(screening, "significant")

scenario2 <- compfit(
  data = clintrial,
  outcome = "surgery",
  model_list = list(
    "Theory-Driven" = c("age", "sex", "stage", "treatment"),
    "Data-Driven" = sig_vars,
    "Combined" = unique(c("age", "sex", "stage", "treatment", sig_vars))
  ),
  model_type = "glm"
)

scenario2

Scenario 3: Parsimony Assessment

Test whether additional predictors meaningfully improve fit:

scenario3 <- compfit(
  data = clintrial,
  outcome = "los_days",
  model_list = list(
    "3 Predictors" = c("age", "surgery", "ecog"),
    "5 Predictors" = c("age", "surgery", "ecog", "stage", "treatment"),
    "8 Predictors" = c("age", "surgery", "ecog", "stage", "treatment",
                       "sex", "smoking", "diabetes")
  ),
  model_type = "lm",
  labels = clintrial_labels
)

scenario3

Exporting Results

Comparison tables can be exported to various formats:

# Main comparison table
table2docx(
  table = example1,
  file = file.path(tempdir(), "Model_Comparison.docx"),
  caption = "Table 3. Model Comparison Results"
)

# Coefficient table
table2docx(
  table = attr(example6, "coefficients"),
  file = file.path(tempdir(), "Coefficient_Comparison.docx"),
  caption = "Table S1. Coefficient Estimates Across Models"
)

# PDF with landscape orientation for wide tables
table2pdf(
  table = example1,
  file = file.path(tempdir(), "Model_Comparison.pdf"),
  caption = "Model Comparison",
  orientation = "landscape"
)

Best Practices

When to Compare Models

  1. Covariate selection: Determine appropriate adjustment level
  2. Interaction testing: Evaluate effect modification
  3. Parsimony: Balance complexity against fit
  4. Sensitivity analysis: Compare different specifications

Interpreting Results

  1. Multiple metrics: The Composite Model Score (CMS) provides a summary, but examine individual metrics
  2. Practical significance: Statistical improvement may not translate to meaningful differences
  3. Sample size: Complex models require larger samples
  4. Convergence: Non-converged models should be interpreted cautiously

Limitations

  1. The Composite Model Score (CMS) is a heuristic, not a formal statistical test
  2. Comparisons assume models are fit on identical observations
  3. Information criteria are most meaningful for nested models
  4. Small score differences may not be practically significant

Common Issues

Non-Convergence

Check convergence status and consider simplifying non-converging models:

# Check convergence status
comparison[, .(Model, Converged)]

# For non-converging models:
# 1. Reduce complexity
# 2. Check for separation (logistic)
# 3. Examine predictor correlations

When encountering non-converging models, consider reducing complexity, examining predictor correlations, and checking for perfect separation (primarily in logistic models).

Differing Sample Sizes

Models with missing data may have different sample sizes, which complicates comparison:

# Check sample sizes
comparison[, .(Model, N, Events)]

# Use complete cases for fair comparison
complete_data <- na.omit(data[, relevant_vars, with = FALSE])

Interpreting Close Scores

When scores are similar, prefer parsimony and consider domain knowledge:

# Examine individual metrics
comparison[, .(Model, `Composite Model Score (CMS)`, AIC, Concordance)]

# Prefer parsimony when scores are close
# Consider interpretability
options(old_opts)

Further Reading



Try the summata package in your browser

Any scripts or data that you put into this service are public.

summata documentation built on May 7, 2026, 5:07 p.m.