Advanced Workflows

## Use ragg for better font rendering if available
if (requireNamespace("ragg", quietly = TRUE)) {
  old_opts <- options(summata.use_ragg = TRUE, width = 180)
  knitr::opts_chunk$set(
    dev = "ragg_png",
    fig.retina = 1,
    collapse = TRUE,
    comment = "##>",
    message = FALSE,
    warning = FALSE,
    fig.width = 8,
    fig.height = 5,
    out.width = "100%"
  )
} else {
  old_opts <- options(width = 180)
  knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "##>",
    message = FALSE,
    warning = FALSE,
    fig.width = 8,
    fig.height = 5,
    out.width = "100%"
  )
}

## Dynamic figure sizing: queue_plot() stashes rec_dims from a plot object,
## and the opts_hook on the NEXT chunk (with use_rec_dims = TRUE) applies them
## before knitr opens the graphics device. Plots render via ragg (dev = "ragg_png"
## set above) and knitr captures them natively. No files written to disk.
.plot_dims <- new.env(parent = emptyenv())
.plot_dims$width <- NULL
.plot_dims$height <- NULL

knitr::opts_hooks$set(use_rec_dims = function(options) {
  if (isTRUE(options$use_rec_dims)) {
    if (!is.null(.plot_dims$width))  options$fig.width  <- .plot_dims$width
    if (!is.null(.plot_dims$height)) options$fig.height <- .plot_dims$height
    .plot_dims$width <- NULL
    .plot_dims$height <- NULL
  }
  options
})

## Call at the end of a plot-creation chunk to stash dimensions for the next chunk.
queue_plot <- function(plot) {
  dims <- attr(plot, "rec_dims")
  if (!is.null(dims)) {
    .plot_dims$width  <- dims$width
    .plot_dims$height <- dims$height
  }
  invisible(plot)
}

Standard regression analysis assumes independent observations and constant effects across subgroups. In practice, these assumptions often fail: observations may be clustered (e.g., subjects within study sites), effects may vary by subgroup (interaction), or the baseline hazard may differ across strata. This vignette demonstrates advanced analytical techniques to address these complexities.


Preliminaries

The examples in this vignette use the clintrial dataset included with summata:

library(summata)
library(survival)
library(ggplot2)

data(clintrial)
data(clintrial_labels)

# Examine the clustering structure
table(clintrial$site)

The clintrial dataset includes 10 study sites, providing a natural clustering variable for hierarchical analysis.

n.b.: To ensure correct font rendering and figure sizing, the forest plots below are displayed using a helper function (queue_plot()) that applies each plot's recommended dimensions (stored in the "rec_dims" attribute) via the ragg graphics device. In practice, replace queue_plot() with ggplot2::ggsave() using recommended plot dimensions for equivalent results:

r p <- glmforest(model, data = mydata) dims <- attr(p, "rec_dims") ggplot2::ggsave("forest_plot.png", p, width = dims$width, height = dims$height)

This ensures that the figure size is always large enough to accommodate the constituent plot text and graphics, and it is generally the preferred method for saving forest plot outputs in summata.


Interaction Effects

Interaction analysis tests whether the association between a predictor and outcome varies across levels of another variable. Interactions are specified using colon notation in the interactions parameter; main effects are automatically included alongside the interaction term.

Example 1: Two-Way Interaction

Specify interactions using colon notation in the interactions parameter:

example1 <- fit(
  data = clintrial,
  outcome = "surgery",
  predictors = c("age", "sex", "treatment", "stage"),
  interactions = c("sex:treatment"),
  model_type = "glm",
  labels = clintrial_labels
)

example1

Example 2: Multiple Interactions

Test multiple effect modifiers simultaneously:

example2 <- fit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  predictors = c("age", "sex", "treatment", "stage"),
  interactions = c("sex:treatment", "stage:treatment"),
  model_type = "coxph",
  labels = clintrial_labels
)

example2

Example 3: Continuous × Categorical Interaction

Test whether effects of a continuous variable vary by group:

example3 <- fit(
  data = clintrial,
  outcome = "los_days",
  predictors = c("age", "sex", "treatment", "stage", "surgery"),
  interactions = c("age:treatment"),
  model_type = "lm",
  labels = clintrial_labels
)

example3

Example 4: Interaction in fullfit()

Include interaction terms in combined univariable/multivariable analysis:

example4 <- fullfit(
  data = clintrial,
  outcome = "surgery",
  predictors = c("age", "sex", "treatment", "stage", "sex:treatment"),
  model_type = "glm",
  method = "all",
  labels = clintrial_labels
)

example4

Example 5: Testing Interaction Significance

Use compfit() to formally evaluate whether interactions improve model fit (see Model Comparison):

example5 <- compfit(
  data = clintrial,
  outcome = "surgery",
  model_list = list(
    "Main Effects" = c("age", "sex", "treatment", "stage"),
    "Sex × Treatment" = c("age", "sex", "treatment", "stage"),
    "Stage × Treatment" = c("age", "sex", "treatment", "stage"),
    "Both Interactions" = c("age", "sex", "treatment", "stage")
  ),
  interactions_list = list(
    NULL,
    c("sex:treatment"),
    c("stage:treatment"),
    c("sex:treatment", "stage:treatment")
  ),
  model_type = "glm",
  labels = clintrial_labels
)

example5

Example 6: Forest Plot with Interactions

Visualize models including interaction terms:

interaction_model <- fit(
  data = clintrial,
  outcome = "surgery",
  predictors = c("age", "sex", "treatment", "stage"),
  interactions = c("sex:treatment"),
  model_type = "glm",
  labels = clintrial_labels
)

example6 <- glmforest(
  x = attr(interaction_model, "model"),
  title = "Logistic Regression with Interaction",
  labels = clintrial_labels,
  indent_groups = TRUE,
  zebra_stripes = TRUE
)
queue_plot(example6)
print(example6)

Mixed-Effects Models

Mixed-effects models (multilevel or hierarchical models) account for correlation within clusters by incorporating random effects. The summata package supports linear mixed models (lmer), generalized linear mixed models (glmer), and Cox mixed models (coxme).

Random effects are specified using pipe notation within either the predictors parameter or in a separate random parameter:

| Syntax | Meaning | |:-------|:--------| | (1\|group) | Random intercepts by group | | (x\|group) | Random intercepts and slopes for x | | (1\|g1) + (1\|g2) | Crossed random effects |

Example 7: Random Intercepts (Linear Mixed)

The (1|site) syntax specifies a random intercept for each site:

example7 <- fit(
  data = clintrial,
  outcome = "los_days",
  predictors = c("age", "sex", "treatment", "stage", "(1|site)"),
  model_type = "lmer",
  labels = clintrial_labels
)

example7

Example 8: Random Intercepts (Logistic Mixed)

For binary outcomes with clustering (note the use of an independent random parameter):

example8 <- fit(
    data = clintrial,
    outcome = "surgery",
    predictors = c("age", "sex", "treatment", "stage"),
    random = "(1|site)",
    model_type = "glmer",
    labels = clintrial_labels
)

example8

Example 9: Random Slopes

Allow effects to vary by cluster:

example9 <- fit(
  data = clintrial,
  outcome = "los_days",
  predictors = c("age", "sex", "treatment", "stage", "(1 + treatment|site)"),
  model_type = "lmer",
  labels = clintrial_labels
)

example9

Example 10: Cox Mixed-Effects Model

For survival outcomes with clustering:

example10 <- fit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  predictors = c("age", "sex", "treatment", "stage", "(1|site)"),
  model_type = "coxme",
  labels = clintrial_labels
)

example10

Example 11: Forest Plot from Mixed Model

Visualize fixed effects from mixed-effects models:

example11 <- glmforest(
  x = attr(example8, "model"),
  title = "Logistic Mixed Model (Fixed Effects)",
  labels = clintrial_labels,
  indent_groups = TRUE,
  zebra_stripes = TRUE
)
queue_plot(example11)
print(example11)

Example 12: Comparing Random-Effects Specifications

Use compfit() to compare different random-effects structures (see Model Comparison):

example12 <- compfit(
  data = clintrial,
  outcome = "los_days",
  model_list = list(
    "Random Intercepts" = c("age", "sex", "treatment", "stage", "(1|site)"),
    "Random Slopes" = c("age", "sex", "treatment", "stage", "(1 + treatment|site)")
  ),
  model_type = "lmer",
  labels = clintrial_labels
)

example12

Stratification and Clustering

In addition to mixed-effects models, Cox models support stratification (separate baseline hazards) and clustered standard errors (robust variance estimation).

| Approach | When to Use | |:---------|:------------| | Mixed-effects | Model cluster-specific effects; prediction at cluster level | | Stratification | Proportional hazards assumption violated for a variable | | Clustered SE | Correlation within clusters; robust inference |

Example 13: Stratified Cox Model

Stratification allows nonproportional hazards across strata without estimating stratum effects. Use the strata parameter:

example13 <- fit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  predictors = c("age", "sex", "treatment"),
  strata = "site",
  model_type = "coxph",
  labels = clintrial_labels
)

example13

Example 14: Cluster-Robust SEs

For robust inference with correlated observations, cluster-robust standard errors account for within-cluster correlation. Use the cluster parameter:

example14 <- fit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  predictors = c("age", "sex", "treatment", "stage"),
  cluster = "site",
  model_type = "coxph",
  labels = clintrial_labels
)

example14

Weighted Regression

For survey data or inverse probability weighting, use the weights parameter. Weights can account for sampling design, non-response, or confounding adjustment.

Example 15: Weighted Analysis

# Create example weights
clintrial$analysis_weight <- runif(nrow(clintrial), 0.5, 2.0)

example15 <- fit(
  data = clintrial,
  outcome = "surgery",
  predictors = c("age", "sex", "treatment", "stage"),
  weights = "analysis_weight",
  model_type = "glm",
  labels = clintrial_labels
)

example15

Advanced Univariable Screening Features

The uniscreen() function supports advanced model specifications including random effects and stratification.

Example 16: Univariable Screening with Random Effects

Screen multiple predictors while accounting for clustering:

example16 <- uniscreen(
  data = clintrial,
  outcome = "surgery",
  predictors = c("age", "sex", "treatment", "stage"),
  model_type = "glmer",
  random = "(1|site)",
  labels = clintrial_labels
)

example16

Example 17: Cox Screening with Mixed Effects

For survival outcomes with clustering:

example17 <- uniscreen(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  predictors = c("age", "sex", "treatment", "stage", "ecog"),
  model_type = "coxme",
  random = "(1|site)",
  labels = clintrial_labels
)

example17

Example 18: Forest Plot from Univariable Screening

The uniforest() function visualizes univariable screening results:

uni_results <- uniscreen(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  predictors = c("age", "sex", "treatment", "stage", "ecog", "grade"),
  model_type = "coxph",
  labels = clintrial_labels
)

example18 <- uniforest(
  uni_results,
  title = "Univariable Screening Results",
  indent_groups = TRUE,
  zebra_stripes = TRUE
)
queue_plot(example18)
print(example18)

Advanced Multivariate Regression Features

The multifit() function supports interactions and mixed-effects models when testing a predictor across multiple outcomes.

Example 19: Multi-Outcome with Interactions

Test effect modification across multiple outcomes:

example19 <- multifit(
  data = clintrial,
  outcomes = c("surgery", "pfs_status", "os_status"),
  predictor = "treatment",
  covariates = c("age", "sex", "stage"),
  interactions = c("treatment:sex"),
  labels = clintrial_labels,
  parallel = FALSE
)

example19

Example 20: Multi-Outcome with Mixed Effects

Account for clustering when testing effects across outcomes:

example20 <- multifit(
  data = clintrial,
  outcomes = c("surgery", "pfs_status"),
  predictor = "treatment",
  covariates = c("age", "sex"),
  random = "(1|site)",
  model_type = "glmer",
  labels = clintrial_labels,
  parallel = FALSE
)

example20

Example 21: Survival Multi-Outcome with Mixed Effects

For multiple survival outcomes with clustering:

example21 <- multifit(
  data = clintrial,
  outcomes = c("Surv(pfs_months, pfs_status)",
               "Surv(os_months, os_status)"),
  predictor = "treatment",
  covariates = c("age", "sex"),
  random = "(1|site)",
  model_type = "coxme",
  labels = clintrial_labels,
  parallel = FALSE
)

example21

Complete Workflow Example

The following demonstrates a complete advanced analysis workflow:

# Step 1: Screen risk factors for primary outcome
risk_screening <- uniscreen(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "bmi", "smoking", "diabetes",
                 "hypertension", "stage", "ecog", "treatment"),
  model_type = "glm",
  p_threshold = 0.20,
  labels = clintrial_labels
)

risk_screening

# Step 2: Test key exposure across multiple outcomes
effects <- multifit(
  data = clintrial,
  outcomes = c("surgery", "pfs_status", "os_status"),
  predictor = "treatment",
  covariates = c("age", "sex", "stage"),
  columns = "both",
  labels = clintrial_labels,
  parallel = FALSE
)

effects

# Step 3: Visualize effects
forest_plot <- multiforest(
  effects,
  title = "Effects Across Outcomes",
  indent_predictor = TRUE,
  zebra_stripes = TRUE
)
queue_plot(forest_plot)
print(forest_plot)

Best Practices

Interaction Analysis

  1. Include main effects for all variables in interactions
  2. Test formally using interaction p-values
  3. Non-significant interactions do not prove effect homogeneity
  4. Consider domain plausibility when interpreting

Mixed-Effects Models

  1. Use when data exhibits natural clustering
  2. Start with random intercepts; add random slopes if justified
  3. Monitor convergence; simplify if necessary
  4. Compare to fixed-effects models using compfit()

Method Selection

| Scenario | Recommended Approach | |:---------|:---------------------| | Clustered observations | Mixed-effects models | | Robust inference needed | Clustered standard errors | | PH assumption violated | Stratification | | Effect modification | Interaction terms | | Survey data | Weighted regression |

Sample Size Considerations


Common Issues

Convergence Problems in Mixed Models

Simplify the random effects structure if convergence fails:

# Start with random intercepts only
fit(data, outcome, c(predictors, "(1|site)"), model_type = "lmer")

Check for common problems such as few observations per cluster, near-zero variance in random effects, or highly correlated predictors.

Interpreting Interactions

For categorical × categorical interactions, the coefficient represents the additional effect beyond the sum of main effects:

# Access model for detailed interpretation
model <- attr(result, "model")
summary(model)

Forest Plots with Interactions

Forest plots display all terms including interactions. For complex interaction patterns, consider stratified analyses or effect plots.

options(old_opts)

Further Reading



Try the summata package in your browser

Any scripts or data that you put into this service are public.

summata documentation built on May 7, 2026, 5:07 p.m.