## Use ragg for better font rendering if available if (requireNamespace("ragg", quietly = TRUE)) { old_opts <- options(summata.use_ragg = TRUE, width = 180) knitr::opts_chunk$set( dev = "ragg_png", fig.retina = 1, collapse = TRUE, comment = "##>", message = FALSE, warning = FALSE, fig.width = 8, fig.height = 5, out.width = "100%" ) } else { old_opts <- options(width = 180) knitr::opts_chunk$set( collapse = TRUE, comment = "##>", message = FALSE, warning = FALSE, fig.width = 8, fig.height = 5, out.width = "100%" ) } ## Dynamic figure sizing: queue_plot() stashes rec_dims from a plot object, ## and the opts_hook on the NEXT chunk (with use_rec_dims = TRUE) applies them ## before knitr opens the graphics device. Plots render via ragg (dev = "ragg_png" ## set above) and knitr captures them natively. No files written to disk. .plot_dims <- new.env(parent = emptyenv()) .plot_dims$width <- NULL .plot_dims$height <- NULL knitr::opts_hooks$set(use_rec_dims = function(options) { if (isTRUE(options$use_rec_dims)) { if (!is.null(.plot_dims$width)) options$fig.width <- .plot_dims$width if (!is.null(.plot_dims$height)) options$fig.height <- .plot_dims$height .plot_dims$width <- NULL .plot_dims$height <- NULL } options }) ## Call at the end of a plot-creation chunk to stash dimensions for the next chunk. queue_plot <- function(plot) { dims <- attr(plot, "rec_dims") if (!is.null(dims)) { .plot_dims$width <- dims$width .plot_dims$height <- dims$height } invisible(plot) }
Standard regression analysis assumes independent observations and constant effects across subgroups. In practice, these assumptions often fail: observations may be clustered (e.g., subjects within study sites), effects may vary by subgroup (interaction), or the baseline hazard may differ across strata. This vignette demonstrates advanced analytical techniques to address these complexities.
The examples in this vignette use the clintrial dataset included with summata:
library(summata) library(survival) library(ggplot2) data(clintrial) data(clintrial_labels) # Examine the clustering structure table(clintrial$site)
The clintrial dataset includes 10 study sites, providing a natural clustering variable for hierarchical analysis.
n.b.: To ensure correct font rendering and figure sizing, the forest plots below are displayed using a helper function (
queue_plot()) that applies each plot's recommended dimensions (stored in the"rec_dims"attribute) via theragggraphics device. In practice, replacequeue_plot()withggplot2::ggsave()using recommended plot dimensions for equivalent results:
r p <- glmforest(model, data = mydata) dims <- attr(p, "rec_dims") ggplot2::ggsave("forest_plot.png", p, width = dims$width, height = dims$height)This ensures that the figure size is always large enough to accommodate the constituent plot text and graphics, and it is generally the preferred method for saving forest plot outputs in
summata.
Interaction analysis tests whether the association between a predictor and outcome varies across levels of another variable. Interactions are specified using colon notation in the interactions parameter; main effects are automatically included alongside the interaction term.
Specify interactions using colon notation in the interactions parameter:
example1 <- fit( data = clintrial, outcome = "surgery", predictors = c("age", "sex", "treatment", "stage"), interactions = c("sex:treatment"), model_type = "glm", labels = clintrial_labels ) example1
Test multiple effect modifiers simultaneously:
example2 <- fit( data = clintrial, outcome = "Surv(os_months, os_status)", predictors = c("age", "sex", "treatment", "stage"), interactions = c("sex:treatment", "stage:treatment"), model_type = "coxph", labels = clintrial_labels ) example2
Test whether effects of a continuous variable vary by group:
example3 <- fit( data = clintrial, outcome = "los_days", predictors = c("age", "sex", "treatment", "stage", "surgery"), interactions = c("age:treatment"), model_type = "lm", labels = clintrial_labels ) example3
Include interaction terms in combined univariable/multivariable analysis:
example4 <- fullfit( data = clintrial, outcome = "surgery", predictors = c("age", "sex", "treatment", "stage", "sex:treatment"), model_type = "glm", method = "all", labels = clintrial_labels ) example4
Use compfit() to formally evaluate whether interactions improve model fit (see Model Comparison):
example5 <- compfit( data = clintrial, outcome = "surgery", model_list = list( "Main Effects" = c("age", "sex", "treatment", "stage"), "Sex × Treatment" = c("age", "sex", "treatment", "stage"), "Stage × Treatment" = c("age", "sex", "treatment", "stage"), "Both Interactions" = c("age", "sex", "treatment", "stage") ), interactions_list = list( NULL, c("sex:treatment"), c("stage:treatment"), c("sex:treatment", "stage:treatment") ), model_type = "glm", labels = clintrial_labels ) example5
Visualize models including interaction terms:
interaction_model <- fit( data = clintrial, outcome = "surgery", predictors = c("age", "sex", "treatment", "stage"), interactions = c("sex:treatment"), model_type = "glm", labels = clintrial_labels ) example6 <- glmforest( x = attr(interaction_model, "model"), title = "Logistic Regression with Interaction", labels = clintrial_labels, indent_groups = TRUE, zebra_stripes = TRUE ) queue_plot(example6)
print(example6)
Mixed-effects models (multilevel or hierarchical models) account for correlation within clusters by incorporating random effects. The summata package supports linear mixed models (lmer), generalized linear mixed models (glmer), and Cox mixed models (coxme).
Random effects are specified using pipe notation within either the predictors parameter or in a separate random parameter:
| Syntax | Meaning |
|:-------|:--------|
| (1\|group) | Random intercepts by group |
| (x\|group) | Random intercepts and slopes for x |
| (1\|g1) + (1\|g2) | Crossed random effects |
The (1|site) syntax specifies a random intercept for each site:
example7 <- fit( data = clintrial, outcome = "los_days", predictors = c("age", "sex", "treatment", "stage", "(1|site)"), model_type = "lmer", labels = clintrial_labels ) example7
For binary outcomes with clustering (note the use of an independent random parameter):
example8 <- fit( data = clintrial, outcome = "surgery", predictors = c("age", "sex", "treatment", "stage"), random = "(1|site)", model_type = "glmer", labels = clintrial_labels ) example8
Allow effects to vary by cluster:
example9 <- fit( data = clintrial, outcome = "los_days", predictors = c("age", "sex", "treatment", "stage", "(1 + treatment|site)"), model_type = "lmer", labels = clintrial_labels ) example9
For survival outcomes with clustering:
example10 <- fit( data = clintrial, outcome = "Surv(os_months, os_status)", predictors = c("age", "sex", "treatment", "stage", "(1|site)"), model_type = "coxme", labels = clintrial_labels ) example10
Visualize fixed effects from mixed-effects models:
example11 <- glmforest( x = attr(example8, "model"), title = "Logistic Mixed Model (Fixed Effects)", labels = clintrial_labels, indent_groups = TRUE, zebra_stripes = TRUE ) queue_plot(example11)
print(example11)
Use compfit() to compare different random-effects structures (see Model Comparison):
example12 <- compfit( data = clintrial, outcome = "los_days", model_list = list( "Random Intercepts" = c("age", "sex", "treatment", "stage", "(1|site)"), "Random Slopes" = c("age", "sex", "treatment", "stage", "(1 + treatment|site)") ), model_type = "lmer", labels = clintrial_labels ) example12
In addition to mixed-effects models, Cox models support stratification (separate baseline hazards) and clustered standard errors (robust variance estimation).
| Approach | When to Use | |:---------|:------------| | Mixed-effects | Model cluster-specific effects; prediction at cluster level | | Stratification | Proportional hazards assumption violated for a variable | | Clustered SE | Correlation within clusters; robust inference |
Stratification allows nonproportional hazards across strata without estimating stratum effects. Use the strata parameter:
example13 <- fit( data = clintrial, outcome = "Surv(os_months, os_status)", predictors = c("age", "sex", "treatment"), strata = "site", model_type = "coxph", labels = clintrial_labels ) example13
For robust inference with correlated observations, cluster-robust standard errors account for within-cluster correlation. Use the cluster parameter:
example14 <- fit( data = clintrial, outcome = "Surv(os_months, os_status)", predictors = c("age", "sex", "treatment", "stage"), cluster = "site", model_type = "coxph", labels = clintrial_labels ) example14
For survey data or inverse probability weighting, use the weights parameter. Weights can account for sampling design, non-response, or confounding adjustment.
# Create example weights clintrial$analysis_weight <- runif(nrow(clintrial), 0.5, 2.0) example15 <- fit( data = clintrial, outcome = "surgery", predictors = c("age", "sex", "treatment", "stage"), weights = "analysis_weight", model_type = "glm", labels = clintrial_labels ) example15
The uniscreen() function supports advanced model specifications including random effects and stratification.
Screen multiple predictors while accounting for clustering:
example16 <- uniscreen( data = clintrial, outcome = "surgery", predictors = c("age", "sex", "treatment", "stage"), model_type = "glmer", random = "(1|site)", labels = clintrial_labels ) example16
For survival outcomes with clustering:
example17 <- uniscreen( data = clintrial, outcome = "Surv(os_months, os_status)", predictors = c("age", "sex", "treatment", "stage", "ecog"), model_type = "coxme", random = "(1|site)", labels = clintrial_labels ) example17
The uniforest() function visualizes univariable screening results:
uni_results <- uniscreen( data = clintrial, outcome = "Surv(os_months, os_status)", predictors = c("age", "sex", "treatment", "stage", "ecog", "grade"), model_type = "coxph", labels = clintrial_labels ) example18 <- uniforest( uni_results, title = "Univariable Screening Results", indent_groups = TRUE, zebra_stripes = TRUE ) queue_plot(example18)
print(example18)
The multifit() function supports interactions and mixed-effects models when testing a predictor across multiple outcomes.
Test effect modification across multiple outcomes:
example19 <- multifit( data = clintrial, outcomes = c("surgery", "pfs_status", "os_status"), predictor = "treatment", covariates = c("age", "sex", "stage"), interactions = c("treatment:sex"), labels = clintrial_labels, parallel = FALSE ) example19
Account for clustering when testing effects across outcomes:
example20 <- multifit( data = clintrial, outcomes = c("surgery", "pfs_status"), predictor = "treatment", covariates = c("age", "sex"), random = "(1|site)", model_type = "glmer", labels = clintrial_labels, parallel = FALSE ) example20
For multiple survival outcomes with clustering:
example21 <- multifit( data = clintrial, outcomes = c("Surv(pfs_months, pfs_status)", "Surv(os_months, os_status)"), predictor = "treatment", covariates = c("age", "sex"), random = "(1|site)", model_type = "coxme", labels = clintrial_labels, parallel = FALSE ) example21
The following demonstrates a complete advanced analysis workflow:
# Step 1: Screen risk factors for primary outcome risk_screening <- uniscreen( data = clintrial, outcome = "os_status", predictors = c("age", "sex", "bmi", "smoking", "diabetes", "hypertension", "stage", "ecog", "treatment"), model_type = "glm", p_threshold = 0.20, labels = clintrial_labels ) risk_screening # Step 2: Test key exposure across multiple outcomes effects <- multifit( data = clintrial, outcomes = c("surgery", "pfs_status", "os_status"), predictor = "treatment", covariates = c("age", "sex", "stage"), columns = "both", labels = clintrial_labels, parallel = FALSE ) effects # Step 3: Visualize effects forest_plot <- multiforest( effects, title = "Effects Across Outcomes", indent_predictor = TRUE, zebra_stripes = TRUE ) queue_plot(forest_plot)
print(forest_plot)
compfit()| Scenario | Recommended Approach | |:---------|:---------------------| | Clustered observations | Mixed-effects models | | Robust inference needed | Clustered standard errors | | PH assumption violated | Stratification | | Effect modification | Interaction terms | | Survey data | Weighted regression |
Simplify the random effects structure if convergence fails:
# Start with random intercepts only fit(data, outcome, c(predictors, "(1|site)"), model_type = "lmer")
Check for common problems such as few observations per cluster, near-zero variance in random effects, or highly correlated predictors.
For categorical × categorical interactions, the coefficient represents the additional effect beyond the sum of main effects:
# Access model for detailed interpretation model <- attr(result, "model") summary(model)
Forest plots display all terms including interactions. For complex interaction patterns, consider stratified analyses or effect plots.
options(old_opts)
desctable() for baseline characteristicsfit(), uniscreen(), and fullfit()compfit() for comparing modelsmultifit() for multi-outcome analysisAny scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.