| multifit | R Documentation |
Performs regression analyses of a single predictor (exposure) across multiple outcomes. This function is designed for studies where a single exposure variable is tested against multiple endpoints, such as complication screening, biomarker associations, or phenome-wide association studies. Returns publication-ready formatted results with optional covariate adjustment. Supports interactions, mixed-effects models, stratification, and clustered standard errors.
multifit(
  data,
  outcomes,
  predictor,
  covariates = NULL,
  interactions = NULL,
  random = NULL,
  strata = NULL,
  cluster = NULL,
  model_type = "glm",
  family = "binomial",
  columns = "adjusted",
  p_threshold = 1,
  conf_level = 0.95,
  show_n = TRUE,
  show_events = TRUE,
  digits = 2,
  p_digits = 3,
  labels = NULL,
  predictor_label = NULL,
  include_predictor = TRUE,
  keep_models = FALSE,
  exponentiate = NULL,
  conf_method = NULL,
  parallel = TRUE,
  n_cores = NULL,
  number_format = NULL,
  verbose = NULL,
  ...
)
data |
Data frame or data.table containing the analysis dataset. The function automatically converts data frames to data.tables for efficient processing. |
outcomes |
Character vector of outcome variable names to analyze. Each
outcome is tested in its own model with the predictor. For time-to-event
analysis, use |
predictor |
Character string specifying the predictor (exposure) variable name. This variable is tested against each outcome. Can be continuous or categorical (factor). |
covariates |
Optional character vector of covariate variable names to
include in adjusted models. When specified, models are fit as
|
interactions |
Optional character vector of interaction terms to include
in adjusted models, using colon notation (e.g., |
random |
Optional character string specifying random effects formula for
mixed effects models (e.g., |
strata |
Optional character string naming the stratification variable for
Cox or conditional logistic models. Creates separate baseline hazards for
each stratum. Default is |
cluster |
Optional character string naming the clustering variable for
Cox models. Computes robust clustered standard errors. Default is |
model_type |
Character string specifying the type of regression model to fit. Options include:
|
family |
For GLM and GLMER models, specifies the error distribution and link function. Can be a character string, a family function, or a family object. Ignored for non-GLM/GLMER models. Binary/Binomial outcomes:
Count outcomes:
Continuous outcomes:
Positive continuous outcomes:
For negative binomial regression (overdispersed counts), use
See |
columns |
Character string specifying which result columns to display when
both unadjusted and adjusted models are fit (i.e., when
Ignored when |
p_threshold |
Numeric value between 0 and 1 specifying a p-value threshold for filtering results. Only outcomes with p-value less than or equal to the threshold are included in the output. Default is 1 (no filtering; all outcomes returned). |
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
show_n |
Logical. If |
show_events |
Logical. If |
digits |
Integer specifying the number of decimal places for effect estimates (OR, HR, RR, coefficients). Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
labels |
Named character vector or list providing custom display
labels for variables. Can include labels for outcomes, predictors, and
covariates. Names should match variable names, values are the display labels.
Labels are applied to: (1) outcome names in the Outcome column, (2) predictor
variable name when displayed, and (3) variable names in formatted interaction
terms. Variables not in |
predictor_label |
Optional character string providing a custom display
label for the predictor variable. Takes precedence over |
include_predictor |
Logical. If |
keep_models |
Logical. If |
exponentiate |
Logical. Whether to exponentiate coefficients (display
OR/HR/RR instead of log odds/log hazards). Default is |
conf_method |
Character string controlling the confidence interval method.
If
Cox and mixed-effects models use Wald intervals regardless of this setting.
Set globally with |
parallel |
Logical. If |
n_cores |
Integer specifying the number of CPU cores to use for
parallel processing. Default is |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
verbose |
Logical. If |
... |
Additional arguments passed to the underlying model fitting functions. |
Analysis Approach:
The function implements a multivariate (multi-outcome) screening workflow that inverts the typical regression paradigm:
For each outcome in outcomes, fits a separate model with the predictor as the main exposure
If covariates are specified, also fits an adjusted model: outcome ~ predictor + covariates + interactions
Extracts only the predictor effect(s) from each model, ignoring covariate coefficients
Combines results into a single table for comparison across outcomes
Optionally filters by p-value threshold
This is conceptually the opposite of uniscreen(), which tests multiple predictors against a single outcome. Use multifit() when you have one exposure of interest and want to screen across multiple endpoints.
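The workflow above can be sketched in base R. This is only an illustration of the idea under stated assumptions, not the internals of multifit(): the simulated data and the names dat, exposure, age, outcome_a, and outcome_b are all hypothetical.

```r
# Simulated data: one exposure, one covariate, two binary outcomes
# (all variable names are illustrative)
set.seed(1)
n <- 200
dat <- data.frame(
  exposure = rnorm(n),
  age      = rnorm(n, mean = 60, sd = 10)
)
dat$outcome_a <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$exposure))
dat$outcome_b <- rbinom(n, 1, plogis(-1.0 + 0.1 * dat$exposure))

outcomes <- c("outcome_a", "outcome_b")

# Fit one adjusted logistic model per outcome and keep only the exposure row
screen <- do.call(rbind, lapply(outcomes, function(y) {
  fit <- glm(reformulate(c("exposure", "age"), response = y),
             data = dat, family = binomial())
  est <- coef(summary(fit))["exposure", ]
  data.frame(
    outcome    = y,
    odds_ratio = exp(est[["Estimate"]]),
    p_value    = est[["Pr(>|z|)"]]
  )
}))
print(screen)
```

The covariate coefficient (age) is estimated in every model but never reported, which mirrors the "extract only the predictor effect" step.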
When to Use Multivariate Regression Analysis:
Complication screening: Test one exposure (e.g., operative time, BMI, biomarker level) against multiple postoperative complications
Treatment effects: Test one treatment against multiple efficacy and safety endpoints simultaneously
Biomarker studies: Test one biomarker against multiple clinical outcomes to understand its prognostic value
Phenome-wide association studies (PheWAS): Test genetic variants or exposures against many phenotypes
Risk factor profiling: Understand how one risk factor relates to a spectrum of outcomes
Handling Categorical Predictors:
When the predictor is a factor variable with multiple levels:
Each non-reference level gets its own row for each outcome
Reference category is determined by factor level ordering
The Predictor column shows "Variable (Level)" format (e.g., "Treatment (Drug A)", "Treatment (Drug B)")
For binary variables with affirmative non-reference levels (Yes, 1, True, Present, Positive, +), shows just "Variable" (e.g., "Diabetes" instead of "Diabetes (Yes)")
Effect estimates compare each level to the reference
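Because the reference category follows factor level ordering, it can be changed before calling the function. The snippet below is a small base-R sketch with a hypothetical treatment vector; it shows how relevel() changes the reference and how each non-reference level becomes its own model term.

```r
# Hypothetical three-arm treatment variable
treatment <- factor(c("Drug A", "Control", "Drug B", "Control", "Drug A"))

# Levels are sorted alphabetically by default, so "Control" is the reference
print(levels(treatment))

# relevel() moves the chosen level to the front, making it the reference
treatment_b <- relevel(treatment, ref = "Drug B")
print(levels(treatment_b))

# Each non-reference level gets its own dummy column, and therefore its own
# effect estimate, in a fitted model
design_cols <- colnames(model.matrix(~ treatment))
print(design_cols)
```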
Adjusted vs. Unadjusted Results:
When covariates is specified, the function fits both models but only extracts predictor effects:
columns = "adjusted": Reports only covariate-adjusted effects. Column labeled "aOR", "aHR", etc.
columns = "unadjusted": Reports only crude effects. Column labeled "OR", "HR", etc.
columns = "both": Reports both side by side. Useful for distinguishing confounding (a large change in the estimate after adjustment) from independent effects (similar estimates)
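To see why comparing crude and adjusted estimates helps reveal confounding, consider the following base-R sketch. It uses simulated data in which a hypothetical covariate (age) drives both the exposure and the outcome, while the exposure itself has no effect; none of these names come from the package.

```r
set.seed(42)
n   <- 2000
age <- rnorm(n)

# Exposure depends on age; the outcome depends on age but not on exposure,
# so age confounds the crude exposure-outcome association
exposure <- rbinom(n, 1, plogis(1.5 * age))
outcome  <- rbinom(n, 1, plogis(-0.5 + 1.5 * age))
dat <- data.frame(age, exposure, outcome)

crude    <- glm(outcome ~ exposure,       data = dat, family = binomial())
adjusted <- glm(outcome ~ exposure + age, data = dat, family = binomial())

or_table <- c(
  OR  = exp(coef(crude)[["exposure"]]),
  aOR = exp(coef(adjusted)[["exposure"]])
)
print(round(or_table, 2))
```

In this setup the crude odds ratio is inflated by the shared dependence on age, and the adjusted odds ratio moves back toward 1.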
Interaction Terms:
When interactions includes terms involving the predictor:
Main effect of predictor is always reported
Interaction effects are extracted and displayed with formatted names
Format: Variable (Level) × Variable (Level) using multiplication sign notation
Useful for testing effect modification (e.g., does treatment effect differ by sex?)
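As a rough illustration of which terms are reported, the sketch below fits a logistic model with a treatment-by-sex interaction in base R and keeps every coefficient whose name involves the predictor. The data are simulated and the name matching via grep() is an assumption made for this example, not a description of how multifit() selects terms.

```r
set.seed(7)
n <- 500
dat <- data.frame(
  treatment = factor(sample(c("Control", "Drug A"), n, replace = TRUE)),
  sex       = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
dat$outcome <- rbinom(n, 1, 0.4)

fit <- glm(outcome ~ treatment + sex + treatment:sex,
           data = dat, family = binomial())

# Keep every coefficient whose name involves the predictor:
# the main effect plus any interaction terms
terms_kept <- grep("treatment", names(coef(fit)), value = TRUE)
print(terms_kept)
```

The covariate main effect (sexMale) is dropped, while the predictor main effect and the interaction term are retained.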
Mixed-Effects Models:
For clustered or hierarchical data (e.g., patients within hospitals):
Use model_type = "glmer" with random = "(1|cluster)" for random intercept models
Nested random effects: random = "(1|site/patient)"
Crossed random effects: random = "(1|site) + (1|doctor)"
For survival outcomes, use model_type = "coxme"
Stratification and Clustering (Cox models):
For Cox proportional hazards models:
strata: Creates separate baseline hazards for each stratum level. Use when hazards are non-proportional across strata but stratum effects do not need to be estimated
cluster: Computes robust (sandwich) standard errors accounting for within-cluster correlation. Alternative to mixed effects when only robust SEs are needed
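The difference between the two options can be seen by calling survival::coxph() directly. The sketch below assumes the survival package is installed and uses simulated data with hypothetical column names; how multifit() builds its model formula internally is not shown here.

```r
if (requireNamespace("survival", quietly = TRUE)) {
  library(survival)
  set.seed(3)
  n <- 300
  dat <- data.frame(
    time      = rexp(n, rate = 0.1),
    status    = rbinom(n, 1, 0.7),
    treatment = rbinom(n, 1, 0.5),
    site      = sample(letters[1:5], n, replace = TRUE)
  )

  # Separate baseline hazard per site; no coefficient is estimated for site
  fit_strata <- coxph(Surv(time, status) ~ treatment + strata(site),
                      data = dat)

  # Common baseline hazard with robust (sandwich) standard errors
  # clustered by site
  fit_cluster <- coxph(Surv(time, status) ~ treatment + cluster(site),
                       data = dat)

  print(names(coef(fit_strata)))
  print(colnames(summary(fit_cluster)$coefficients))
}
```

In both fits only the treatment coefficient is estimated; the clustered fit additionally reports a robust standard error column.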
Filtering based on p-value:
The p_threshold parameter filters results after fitting all models:
Only outcomes with p less than or equal to the threshold are retained in output
For factor predictors, an outcome is kept if any level has a p-value at or below the threshold
Useful for focusing on significant associations in exploratory analyses
Default is 1 (no filtering), recommended for confirmatory analyses
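The "any level" rule for factor predictors can be sketched as follows. The raw table and its column names are hypothetical stand-ins; the real raw results may be structured differently.

```r
# Hypothetical raw results: one row per outcome-by-level combination
raw <- data.frame(
  outcome = c("death", "death", "readmission", "readmission"),
  level   = c("Drug A", "Drug B", "Drug A", "Drug B"),
  p_value = c(0.30, 0.01, 0.40, 0.25)
)
p_threshold <- 0.05

# The smallest p-value per outcome decides whether the outcome is retained
min_p <- tapply(raw$p_value, raw$outcome, min)
keep  <- names(min_p)[min_p <= p_threshold]

# All rows of a retained outcome are kept, including non-significant levels
filtered <- raw[raw$outcome %in% keep, ]
print(filtered)
```

Here "death" is retained with both of its rows because one level falls at or below the threshold, while "readmission" is dropped entirely.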
Outcome Homogeneity:
All outcomes in a single multifit() call should be of the same type (all binary, all continuous, or all survival). Mixing outcome types produces tables with incompatible effect measures (e.g., odds ratios alongside regression coefficients), which can mislead readers. The function validates outcome compatibility and issues a warning when mixed types are detected.
For analyses involving multiple outcome types, run separate multifit() calls for each type:
# Binary outcomes
binary_results <- multifit(data, outcomes = c("death", "readmission"),
                           predictor = "treatment", model_type = "glm")
# Continuous outcomes
continuous_results <- multifit(data, outcomes = c("los_days", "cost"),
                               predictor = "treatment", model_type = "lm")
Effect Measures by Model Type:
Logistic (model_type = "glm", family = "binomial"): Odds ratios (OR/aOR)
Cox (model_type = "coxph"): Hazard ratios (HR/aHR)
Poisson (model_type = "glm", family = "poisson"): Rate ratios (RR/aRR)
Linear (model_type = "lm"): Coefficient estimates
Mixed effects: Same as fixed-effects counterparts
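Ratio measures such as OR, HR, and RR are exponentiated model coefficients. The base-R sketch below computes an odds ratio with a Wald confidence interval from a logistic fit on simulated data. It illustrates the arithmetic only; the interval method multifit() actually applies depends on conf_method and may differ.

```r
set.seed(11)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.7 * x))
fit <- glm(y ~ x, family = binomial())

conf_level <- 0.95
z    <- qnorm(1 - (1 - conf_level) / 2)   # critical value, about 1.96
beta <- coef(fit)[["x"]]                  # log odds ratio
se   <- sqrt(diag(vcov(fit)))[["x"]]      # its standard error

# Wald interval on the log-odds scale, then exponentiate to the OR scale
or_ci <- exp(c(estimate = beta, lower = beta - z * se, upper = beta + z * se))
print(round(or_ci, 2))
```

For a linear model the same coefficient and interval would be reported without the exp() step.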
Memory and Performance:
parallel = TRUE (default) uses multiple cores for faster fitting
keep_models = FALSE (default) discards model objects to save memory
For many outcomes, parallel processing provides substantial speedup
Set keep_models = TRUE only when you need model diagnostics
A data.table with S3 class "multifit_result" containing formatted multivariate regression results. The table structure includes:
Character. Outcome variable name or custom label
Character. For factor predictors: formatted as "Variable (Level)" showing the level being compared to reference. For binary variables where the non-reference level is an affirmative value (Yes, 1, True, Present, Positive, +), shows just "Variable". For continuous predictors: the variable name. For interactions: the formatted interaction term (e.g., "Treatment (Drug A) × Sex (Male)")
Integer. Sample size used in the model (if show_n = TRUE)
Integer. Number of events (if show_events = TRUE)
Character. Unadjusted effect estimate with CI (if columns = "unadjusted" or "both")
Character. Adjusted effect estimate with CI (if columns = "adjusted" or "both")
Character. Formatted p-value(s). Column names depend on columns setting
The returned object includes the following attributes accessible via attr():
data.table. Unformatted numeric results with separate columns for effect estimates, standard errors, confidence intervals, and p-values. Suitable for custom analysis or visualization
list (if keep_models = TRUE). Named list of fitted model objects, with outcome names as list names. Each element contains $unadjusted and/or $adjusted models depending on settings
Character. The predictor variable name
Character vector. The outcome variable names
Character vector or NULL. The covariate variable names
Character vector or NULL. The interaction terms
Character or NULL. The random effects formula
Character or NULL. The stratification variable
Character or NULL. The clustering variable
Character. The regression model type used
Character. Which columns were displayed
Character. "multi_outcome" to identify analysis type
Character vector. Names of outcomes with p < 0.05 for the predictor (uses adjusted p-values when available)
uniscreen for screening multiple predictors against one outcome,
multiforest for creating forest plots from multifit results,
fit for single-outcome regression with full coefficient output,
fullfit for complete univariable-to-multivariable workflow
Other regression functions:
compfit(),
fit(),
fullfit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result(),
uniscreen()
# Load example data
data(clintrial)
data(clintrial_labels)
# Example 1: Basic multivariate analysis (unadjusted)
# Test treatment effect on multiple binary outcomes
result1 <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
labels = clintrial_labels,
parallel = FALSE
)
print(result1)
# Shows odds ratios comparing Drug A and Drug B to Control
# Example 2: Adjusted analysis with covariates
# Adjust for age, sex, and disease stage
result2 <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
labels = clintrial_labels,
parallel = FALSE
)
print(result2)
# Shows adjusted odds ratios (aOR)
# Example 3: Compare unadjusted and adjusted results
result3 <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
columns = "both",
labels = clintrial_labels,
parallel = FALSE
)
print(result3)
# Useful for identifying confounding effects
# Example 4: Continuous predictor across outcomes
# Test age effect on multiple outcomes
result4 <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "age",
covariates = c("sex", "treatment", "stage"),
labels = clintrial_labels,
parallel = FALSE
)
print(result4)
# One row per outcome for continuous predictor
# Example 5: Cox regression for survival outcomes
library(survival)
cox_result <- multifit(
data = clintrial,
outcomes = c("Surv(pfs_months, pfs_status)",
"Surv(os_months, os_status)"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
model_type = "coxph",
labels = clintrial_labels,
parallel = FALSE
)
print(cox_result)
# Returns hazard ratios (HR/aHR)
# Example 6: Cox with stratification by site
cox_strat <- multifit(
data = clintrial,
outcomes = c("Surv(os_months, os_status)"),
predictor = "treatment",
covariates = c("age", "sex"),
strata = "site",
model_type = "coxph",
labels = clintrial_labels,
parallel = FALSE
)
print(cox_strat)
# Example 7: Cox with clustered standard errors
cox_cluster <- multifit(
data = clintrial,
outcomes = c("Surv(os_months, os_status)"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
cluster = "site",
model_type = "coxph",
labels = clintrial_labels,
parallel = FALSE
)
print(cox_cluster)
# Example 8: Interaction between predictor and covariate
# Test if treatment effect differs by sex
result_int <- multifit(
data = clintrial,
outcomes = c("surgery", "os_status"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
interactions = c("treatment:sex"),
labels = clintrial_labels,
parallel = FALSE
)
print(result_int)
# Shows main effects and interaction terms with × notation
# Example 9: Linear model for continuous outcomes
linear_result <- multifit(
data = clintrial,
outcomes = c("los_days", "biomarker_x"),
predictor = "treatment",
covariates = c("age", "sex"),
model_type = "lm",
labels = clintrial_labels,
parallel = FALSE
)
print(linear_result)
# Returns coefficient estimates, not ratios
# Example 10: Poisson regression for equidispersed count outcomes
# fu_count has variance ~= mean, appropriate for standard Poisson
poisson_result <- multifit(
data = clintrial,
outcomes = c("fu_count"),
predictor = "treatment",
covariates = c("age", "stage"),
model_type = "glm",
family = "poisson",
labels = clintrial_labels,
parallel = FALSE
)
print(poisson_result)
# Returns rate ratios (RR)
# For overdispersed counts (ae_count), use model_type = "negbin" instead
# Example 11: Filter to significant results only
sig_results <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "stage",
p_threshold = 0.05,
labels = clintrial_labels,
parallel = FALSE
)
print(sig_results)
# Only outcomes with significant associations shown
# Example 12: Custom outcome labels
result_labeled <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
labels = c(
surgery = "Surgical Resection",
pfs_status = "Disease Progression",
os_status = "Death",
treatment = "Treatment Group"
),
parallel = FALSE
)
print(result_labeled)
# Example 13: Keep models for diagnostics
result_models <- multifit(
data = clintrial,
outcomes = c("surgery", "os_status"),
predictor = "treatment",
covariates = c("age", "sex"),
keep_models = TRUE,
parallel = FALSE
)
# Access stored models
models <- attr(result_models, "models")
names(models)
# Get adjusted model for surgery outcome
surgery_model <- models$surgery$adjusted
summary(surgery_model)
# Example 14: Access raw numeric data
result <- multifit(
data = clintrial,
outcomes = c("surgery", "os_status"),
predictor = "age",
parallel = FALSE
)
# Get unformatted results for custom analysis
raw_data <- attr(result, "raw_data")
print(raw_data)
# Contains exp_coef, ci_lower, ci_upper, p_value, etc.
# Example 15: Hide sample size and event columns
result_minimal <- multifit(
data = clintrial,
outcomes = c("surgery", "os_status"),
predictor = "treatment",
show_n = FALSE,
show_events = FALSE,
parallel = FALSE
)
print(result_minimal)
# Example 16: Customize decimal places
result_digits <- multifit(
data = clintrial,
outcomes = c("surgery", "os_status"),
predictor = "age",
digits = 3,
p_digits = 4,
parallel = FALSE
)
print(result_digits)
# Example 17: Force coefficient display (no exponentiation)
result_coef <- multifit(
data = clintrial,
outcomes = c("surgery"),
predictor = "age",
exponentiate = FALSE,
parallel = FALSE
)
print(result_coef)
# Example 18: Complete publication workflow
final_table <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
covariates = c("age", "sex", "stage", "grade"),
columns = "both",
labels = clintrial_labels,
digits = 2,
p_digits = 3,
parallel = FALSE
)
print(final_table)
# Example 19: Gamma regression for positive continuous outcomes
gamma_result <- multifit(
data = clintrial,
outcomes = c("los_days", "recovery_days"),
predictor = "treatment",
covariates = c("age", "surgery"),
model_type = "glm",
family = Gamma(link = "log"),
labels = clintrial_labels,
parallel = FALSE
)
print(gamma_result)
# Returns multiplicative effects on positive continuous data
# Example 20: Quasipoisson for overdispersed counts
quasi_result <- multifit(
data = clintrial,
outcomes = c("ae_count"),
predictor = "treatment",
covariates = c("age", "diabetes"),
model_type = "glm",
family = "quasipoisson",
labels = clintrial_labels,
parallel = FALSE
)
print(quasi_result)
# Adjusts standard errors for overdispersion
# Example 21: Generalized linear mixed effects (GLMER)
# Test treatment across outcomes with site clustering
if (requireNamespace("lme4", quietly = TRUE)) {
glmer_result <- suppressWarnings(multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
covariates = c("age", "sex"),
random = "(1|site)",
model_type = "glmer",
family = "binomial",
labels = clintrial_labels,
parallel = FALSE
))
print(glmer_result)
}
# Example 22: Cox mixed effects with random site effects
if (requireNamespace("coxme", quietly = TRUE)) {
coxme_result <- multifit(
data = clintrial,
outcomes = c("Surv(pfs_months, pfs_status)",
"Surv(os_months, os_status)"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
random = "(1|site)",
model_type = "coxme",
labels = clintrial_labels,
parallel = FALSE
)
print(coxme_result)
}
# Example 23: Multiple interactions across outcomes
multi_int <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
interactions = c("treatment:stage", "treatment:sex"),
labels = clintrial_labels,
parallel = FALSE
)
print(multi_int)
# Shows how treatment effects vary by stage and sex across outcomes