Multivariate Regression
In summata: Publication-Ready Summary Tables and Forest Plots

## Use ragg for better font rendering if available
if (requireNamespace("ragg", quietly = TRUE)) {
  old_opts <- options(summata.use_ragg = TRUE, width = 180)
  knitr::opts_chunk$set(
    dev = "ragg_png",
    fig.retina = 1,
    collapse = TRUE,
    comment = "##>",
    message = FALSE,
    warning = FALSE,
    fig.width = 8,
    fig.height = 5,
    out.width = "80%"
  )
} else {
  old_opts <- options(width = 180)
  knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "##>",
    message = FALSE,
    warning = FALSE,
    fig.width = 8,
    fig.height = 5,
    out.width = "80%"
  )
}

## Dynamic figure sizing: queue_plot() stashes rec_dims from a plot object,
## and the opts_hook on the NEXT chunk (with use_rec_dims = TRUE) applies them
## before knitr opens the graphics device. Plots render via ragg (dev = "ragg_png"
## set above) and knitr captures them natively. No files written to disk.
.plot_dims <- new.env(parent = emptyenv())
.plot_dims$width <- NULL
.plot_dims$height <- NULL

knitr::opts_hooks$set(use_rec_dims = function(options) {
  if (isTRUE(options$use_rec_dims)) {
    if (!is.null(.plot_dims$width))  options$fig.width  <- .plot_dims$width
    if (!is.null(.plot_dims$height)) options$fig.height <- .plot_dims$height
    .plot_dims$width <- NULL
    .plot_dims$height <- NULL
  }
  options
})

## Call at the end of a plot-creation chunk to stash dimensions for the next chunk.
queue_plot <- function(plot) {
  dims <- attr(plot, "rec_dims")
  if (!is.null(dims)) {
    .plot_dims$width  <- dims$width
    .plot_dims$height <- dims$height
  }
  invisible(plot)
}

Multivariate regression refers to the simultaneous examination of a single independent predictor across multiple dependent variables. This approach inverts the typical regression paradigm: rather than testing multiple predictors against one outcome (as in uniscreen(), fit(), and fullfit()), multivariate regression tests the predictive value of a individual factor against multiple outcomes, with or without multivariable adjustment for other independent factors. This design is particularly useful when a key exposure or intervention must be evaluated against several endpoints simultaneously.

The multifit() function implements this workflow, supporting all model types available in summata with optional covariate adjustment, interaction terms, and mixed-effects specifications. Results can be visualized using multiforest() and exported using standard table export functions.

As with other summata functions, this function adheres to the standard calling convention:

multifit(data, outcomes, predictor, covariates, model_type, ...)

where data is the dataset, outcomes is a vector of screened outcome variables, predictor is the predicting variable under examination, covariates is a vector of simultaneous independent predicting variables (i.e., potential confounders), and model_type is the type of model to be implemented. This vignette demonstrates the various capabilities of this function using the included sample dataset.

Preliminaries

The examples in this vignette use the clintrial dataset included with summata:

library(summata)
library(survival)
library(ggplot2)

data(clintrial)
data(clintrial_labels)

The clintrial dataset includes multiple outcome types suitable for multivariate analysis:

| Outcome Type | Variables | Model | |:-------------|:----------|:------| | Continuous | los_days, pain_score, recovery_days | lm | | Binary | any_complication, wound_infection, readmission_30d, icu_admission | glm (binomial) | | Count (equidispersed) | fu_count | glm (poisson) | | Count (overdispersed) | ae_count | negbin or quasipoisson | | Time-to-event | Surv(pfs_months, pfs_status), Surv(os_months, os_status) | coxph |

n.b.: To ensure correct font rendering and figure sizing, the forest plots below are displayed using a helper function (queue_plot()) that applies each plot's recommended dimensions (stored in the "rec_dims" attribute) via the ragg graphics device. In practice, replace queue_plot() with ggplot2::ggsave() using recommended plot dimensions for equivalent results:

r p <- glmforest(model, data = mydata) dims <- attr(p, "rec_dims") ggplot2::ggsave("forest_plot.png", p, width = dims$width, height = dims$height)

This ensures that the figure size is always large enough to accommodate the constituent plot text and graphics, and it is generally the preferred method for saving forest plot outputs in summata.

Basic Usage

Example 1: Unadjusted Analysis

Test a single predictor against multiple binary outcomes:

example1 <- multifit(
  data = clintrial,
  outcomes = c("any_complication", "wound_infection",
               "readmission_30d", "icu_admission"),
  predictor = "surgery",
  labels = clintrial_labels,
  parallel = FALSE
)

example1

Each row shows the effect of the predictor on a specific outcome. For categorical predictors, each non-reference level appears separately.

Example 2: Adjusted Analysis

Include covariates for confounding control:

example2 <- multifit(
  data = clintrial,
  outcomes = c("any_complication", "wound_infection",
               "readmission_30d", "icu_admission"),
  predictor = "surgery",
  covariates = c("age", "sex", "smoking", "diabetes"),
  labels = clintrial_labels,
  parallel = FALSE
)

example2

Example 3: Unadjusted and Adjusted Comparison

Use columns = "both" to display unadjusted and adjusted results side-by-side:

example3 <- multifit(
  data = clintrial,
  outcomes = c("any_complication", "wound_infection",
               "readmission_30d", "icu_admission"),
  predictor = "surgery",
  covariates = c("age", "sex", "diabetes", "surgery"),
  columns = "both",
  labels = clintrial_labels,
  parallel = FALSE
)

example3

Comparing columns reveals confounding (large differences) or robust associations (similar estimates).

Predictor Types

Example 4: Continuous Predictors

For continuous predictors, one row appears per outcome:

example4 <- multifit(
  data = clintrial,
  outcomes = c("any_complication", "wound_infection", "icu_admission"),
  predictor = "age",
  covariates = c("sex", "treatment", "surgery"),
  labels = clintrial_labels,
  parallel = FALSE
)

example4

The effect estimate represents the change in log-odds per one-unit increase in the predictor.

Example 5: Multilevel Categorical Predictors

For categorical predictors with multiple levels, the display is expanded:

example5 <- multifit(
  data = clintrial,
  outcomes = c("any_complication", "wound_infection",
               "readmission_30d", "icu_admission"),
  predictor = "treatment",
  covariates = c("age", "sex", "surgery"),
  labels = clintrial_labels,
  parallel = FALSE
)

example5

Model Types

Example 6: Cox Regression

For time-to-event outcomes, use model_type = "coxph":

example6 <- multifit(
  data = clintrial,
  outcomes = c("Surv(pfs_months, pfs_status)",
               "Surv(os_months, os_status)"),
  predictor = "treatment",
  covariates = c("age", "sex", "stage"),
  model_type = "coxph",
  labels = clintrial_labels,
  parallel = FALSE
)

example6

Example 7: Linear Regression

For continuous outcomes, use model_type = "lm":

example7 <- multifit(
  data = clintrial,
  outcomes = c("los_days", "pain_score", "recovery_days"),
  predictor = "treatment",
  covariates = c("age", "sex", "surgery"),
  model_type = "lm",
  labels = clintrial_labels,
  parallel = FALSE
)

example7

Example 8: Mixed-Effects Models

Account for clustering using random effects:

example8 <- multifit(
  data = clintrial,
  outcomes = c("any_complication", "wound_infection"),
  predictor = "treatment",
  covariates = c("age", "sex"),
  random = "(1|site)",
  model_type = "glmer",
  labels = clintrial_labels,
  parallel = FALSE
)

example8

Advanced Features

Example 9: Interaction Terms

Interaction terms can be added to test effect modification:

example9 <- multifit(
  data = clintrial,
  outcomes = c("any_complication", "wound_infection"),
  predictor = "treatment",
  covariates = c("age", "sex"),
  interactions = c("treatment:sex"),
  labels = clintrial_labels,
  parallel = FALSE
)

example9

Example 10: Filtering by p-value

Outputs can be filtered to retain only significant associations:

example10 <- multifit(
  data = clintrial,
  outcomes = c("any_complication", "wound_infection",
               "readmission_30d", "icu_admission"),
  predictor = "treatment",
  covariates = c("age", "sex", "surgery"),
  p_threshold = 0.01,
  labels = clintrial_labels,
  parallel = FALSE
)

example10

Example 11: Accessing Model Objects

Underlying model objects are stored as attributes for additional analysis:

result <- multifit(
  data = clintrial,
  outcomes = c("any_complication", "wound_infection"),
  predictor = "treatment",
  covariates = c("age", "sex"),
  labels = clintrial_labels,
  keep_models = TRUE,
  parallel = FALSE
)

# Access individual models
models <- attr(result, "models")
names(models)

# Examine a specific model
summary(models[["any_complication"]])

Forest Plot Visualization

The multiforest() function creates forest plots from multifit() results.

Example 12: Forest Plot for Logistic (Binary) Outcomes

Results from multifit() can be directly inserted into the multiforest() function to generate a forest plot:

result <- multifit(
  data = clintrial,
  outcomes = c("any_complication", "wound_infection",
               "readmission_30d", "icu_admission"),
  predictor = "treatment",
  covariates = c("age", "sex", "diabetes", "surgery"),
  labels = clintrial_labels,
  parallel = FALSE
)

example12 <- multiforest(
  result,
  title = "Treatment Effects Across Outcomes",
  indent_predictor = TRUE,
  zebra_stripes = TRUE
)
queue_plot(example12)

print(example12)

Example 13: Customization Options

Customize the appearance of the forest plot by adjusting function parameters:

example13 <- multiforest(
  result,
  title = "Effect Estimates",
  column = "adjusted",
  show_predictor = FALSE,
  covariates_footer = TRUE,
  table_width = 0.65,
  color = "#4BA6B6"
)
queue_plot(example13)

print(example13)

Example 14: Forest Plot for Continuous Outcomes

Create forest plots for continuous outcomes:

lm_result <- multifit(
  data = clintrial,
  outcomes = c("pain_score", "recovery_days", "los_days"),
  predictor = "treatment",
  covariates = c("age", "sex", "surgery"),
  model_type = "lm",
  parallel = FALSE
)

example14 <- multiforest(
  lm_result,
  title = "Treatment Effects on Recovery Metrics",
  show_predictor = FALSE,
  covariates_footer = TRUE,
  labels = clintrial_labels
)
queue_plot(example14)

print(example14)

Example 15: Forest Plot for Survival Outcomes

The multiforest() function supports multiple model_type outputs from multifit(). In addition, variable labels can be applied to multiforest() outputs similarly to other forest plot functions:

cox_result <- multifit(
  data = clintrial,
  outcomes = c("Surv(pfs_months, pfs_status)",
               "Surv(os_months, os_status)"),
  predictor = "treatment",
  covariates = c("age", "sex", "stage"),
  model_type = "coxph",
  parallel = FALSE
)

example15 <- multiforest(
  cox_result,
  title = "Treatment Effects on Survival Outcomes",
  indent_predictor = TRUE,
  zebra_stripes = TRUE,
  labels = clintrial_labels
)
queue_plot(example15)

print(example15)

Exporting Results

Tables

Export tables using standard export functions:

table2docx(
  table = result,
  file = file.path(tempdir(), "multioutcome_analysis.docx"),
  caption = "Treatment Effects Across Outcomes"
)

table2pdf(
  table = result,
  file = file.path(tempdir(), "multioutcome_analysis.pdf"),
  caption = "Treatment Effects Across Outcomes"
)

Forest Plots

Save forest plots using ggsave():

p <- multiforest(result, title = "Effect Estimates")
dims <- attr(p, "rec_dims")

ggsave(file.path(tempdir(), "multioutcome_forest.pdf"), p,
       width = attr(result, "rec_dims")$width,
       height = attr(result, "rec_dims")$height, 
       units = "in")

Complete Workflow Example

The following demonstrates a complete multi-outcome analysis workflow:

## Define outcomes by type
binary_outcomes <- c("any_complication", "wound_infection",
                     "readmission_30d", "icu_admission")
survival_outcomes <- c("Surv(pfs_months, pfs_status)",
                       "Surv(os_months, os_status)")

## Unadjusted screening
unadjusted <- multifit(
  data = clintrial,
  outcomes = binary_outcomes,
  predictor = "treatment",
  labels = clintrial_labels,
  parallel = FALSE
)

unadjusted

## Adjusted analysis with comparison
adjusted <- multifit(
  data = clintrial,
  outcomes = binary_outcomes,
  predictor = "treatment",
  covariates = c("age", "sex", "diabetes", "surgery"),
  columns = "both",
  labels = clintrial_labels,
  parallel = FALSE
)

adjusted

## Forest plot visualization
forest_plot <- multiforest(
  adjusted,
  title = "Treatment Effect Estimates",
  column = "adjusted",
  indent_predictor = TRUE,
  zebra_stripes = TRUE,
  table_width = 0.65,
  labels = clintrial_labels
)
queue_plot(forest_plot)

print(forest_plot)

Parameter Reference

| Parameter | Description | |:----------|:------------| | outcomes | Character vector of outcome variable names | | predictor | Single predictor variable name | | covariates | Optional adjustment covariates | | interactions | Interaction terms (colon notation) | | random | Random effects formula for mixed models | | strata | Stratification variable (Cox models) | | cluster | Clustering variable for robust standard errors | | model_type | "glm", "lm", "coxph", "glmer", "lmer", "coxme" | | family | GLM family (default: "binomial") | | columns | "adjusted", "unadjusted", or "both" | | p_threshold | Filter results by p-value | | labels | Custom variable labels | | parallel | Enable parallel processing |

Best Practices

Outcome Selection

Use compatible outcome types within a single analysis
Group conceptually related outcomes
Consider multiple testing adjustments when testing many outcomes

Adjustment Strategy

Pre-specify covariates based on domain knowledge
Use columns = "both" to assess confounding
Apply consistent covariates across outcomes for comparability

Interpretation

Focus on effect magnitude and precision, not only p-values
Look for consistent patterns across related outcomes
Consider practical significance alongside statistical significance

Common Issues

Empty Results

If results are empty, verify:

Predictor variable exists and has variation
Outcomes are correctly specified (Surv() syntax for survival)
Model type matches outcome type

Convergence Warnings

For mixed-effects models, simplify the random effects structure:

## Start with random intercepts only
multifit(data, outcomes, predictor,
         random = "(1|site)",
         model_type = "glmer")

Many Factor Levels

For predictors with many levels, consider collapsing categories:

clintrial$treatment_binary <- ifelse(clintrial$treatment == "Control", 
                                      "Control", "Active")

Other Considerations

Multivariate Regression vs. Univariable Screening

The distinction between multifit() and uniscreen() is important:

| Function | Tests | Use Case | |:---------|:------|:---------| | uniscreen() | Multiple predictors → One outcome | Variable screening, risk factor identification | | multifit() | One predictor → Multiple outcomes | Exposure effects, intervention evaluation |

All outcomes in a single multifit() call should be of the same type (all binary, all continuous, or all survival). Mixing outcome types produces tables with incompatible effect measures. The function validates outcome compatibility and issues a warning when mixed types are detected.

Categorical Outcomes with More Than Two Levels

multifit() supports binary outcomes (via logistic regression) but not multinomial or ordinal outcomes. If a categorical outcome with three or more levels is included (e.g., treatment group with "Control", "Drug A", "Drug B"), the function will issue a warning because binomial GLM coerces such variables to binary (first level vs. all others).

For multilevel categorical outcomes, use dedicated packages outside of summata:

## Multinomial regression (unordered categories)
library(nnet)
model <- multinom(treatment ~ age + sex + stage, data = clintrial)

## Ordinal regression (ordered categories)
library(MASS)
model <- polr(grade ~ age + sex + stage, data = clintrial, Hess = TRUE)

options(old_opts)

summata Publication-Ready Summary Tables and Forest Plots

Multivariate Regression In summata: Publication-Ready Summary Tables and Forest Plots