| fit | R Documentation |
Provides a unified interface for fitting various types of regression models with automatic formatting of results for publication. Supports generalized linear models, linear models, survival models, and mixed-effects models with consistent syntax and output formatting. Handles both univariable and multivariable models automatically.
fit(
  data = NULL,
  outcome = NULL,
  predictors = NULL,
  model = NULL,
  model_type = "glm",
  family = "binomial",
  random = NULL,
  interactions = NULL,
  strata = NULL,
  cluster = NULL,
  weights = NULL,
  conf_level = 0.95,
  reference_rows = TRUE,
  show_n = TRUE,
  show_events = TRUE,
  digits = 2,
  p_digits = 3,
  labels = NULL,
  keep_qc_stats = TRUE,
  exponentiate = NULL,
  conf_method = NULL,
  number_format = NULL,
  verbose = NULL,
  ...
)
data |
Data frame or data.table containing the analysis dataset. Required for formula-based workflow; optional for model-based workflow (extracted from model if not provided). |
outcome |
Character string specifying the outcome variable name. For
survival analysis, use |
predictors |
Character vector of predictor variable names to include in
the model. All predictors are included simultaneously (multivariable model).
For univariable models, provide a single predictor. Can include continuous,
categorical (factor), or binary variables. Required for formula-based
workflow; ignored if |
model |
Optional pre-fitted model object to format. When provided,
|
model_type |
Character string specifying the type of regression model.
Ignored if
|
family |
For GLM and GLMER models, specifies the error distribution and link function. Can be a character string, a family function, or a family object. Ignored for non-GLM/GLMER models. Binary/Binomial outcomes:
Count outcomes:
Continuous outcomes:
Positive continuous outcomes:
For negative binomial regression (overdispersed counts), use
See |
random |
Character string specifying the random-effects formula for
mixed-effects models ( |
interactions |
Character vector of interaction terms using colon
notation (e.g., |
strata |
For Cox or conditional logistic models, character string naming
the stratification variable. Creates separate baseline hazards for each
stratum level without estimating stratum effects. Default is |
cluster |
For Cox models, character string naming the variable for
robust clustered standard errors. Accounts for within-cluster correlation
(e.g., patients within hospitals). Default is |
weights |
Character string naming the weights variable in |
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
reference_rows |
Logical. If |
show_n |
Logical. If |
show_events |
Logical. If |
digits |
Integer specifying the number of decimal places for effect estimates (OR, HR, RR, coefficients). Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
labels |
Named character vector or list providing custom display
labels for variables. Names should match variable names, values are display
labels. Default is |
keep_qc_stats |
Logical. If |
exponentiate |
Logical. Whether to exponentiate coefficients. Default
is |
conf_method |
Character string controlling the confidence interval method.
If
Cox and mixed-effects models use Wald intervals regardless of this setting. |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
verbose |
Logical. If |
... |
Additional arguments passed to the underlying model fitting
function ( |
Model Scope Detection:
The function automatically detects whether the model is:
Univariable: Single predictor (e.g., predictors = "age"). Effect estimates are labeled as unadjusted ("OR", "HR", etc.) and represent crude associations.
Multivariable: Multiple predictors (e.g., predictors = c("age", "sex", "treatment")). Effect estimates are labeled as adjusted ("aOR", "aHR", etc.) and represent associations adjusted for the other predictors in the model.
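The labeling rule above can be sketched in a few lines of base R. This is only an illustration of the described behavior, not the package's internal code; scope_label() and its arguments are hypothetical names introduced here.

```r
# Hypothetical helper (not part of the package): derive the model scope and
# the effect-column label from the number of predictors supplied.
scope_label <- function(predictors, measure = "OR") {
  if (length(predictors) == 1L) {
    # One predictor: crude estimate, plain label
    return(list(scope = "Univariable", label = measure))
  }
  # Several predictors: adjusted notation ("aOR", "aHR", "Adj. Coefficient")
  adjusted <- if (measure == "Coefficient") {
    "Adj. Coefficient"
  } else {
    paste0("a", measure)
  }
  list(scope = "Multivariable", label = adjusted)
}

scope_label("age")
scope_label(c("age", "sex", "treatment"), measure = "HR")
```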
Interaction Terms:
Interactions are specified using colon notation and added to the model:
interactions = c("age:treatment") creates an interaction between age and treatment
Main effects for both variables are automatically included
Multiple interactions can be specified: c("age:sex", "treatment:stage")
For interactions between categorical variables, separate terms are created for each combination of levels
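As a rough sketch of what colon notation means in base R, the snippet below builds a formula from predictor and interaction vectors and then shows how a factor-by-factor interaction expands into one column per level combination. Building the formula with reformulate() is an assumption about a plausible approach, not a statement of how fit() works internally; the toy data frame is made up for illustration.

```r
# Combine main effects and colon-notation interactions into one formula
predictors   <- c("age", "treatment", "sex")
interactions <- c("age:treatment")
model_formula <- reformulate(c(predictors, interactions), response = "os_status")
print(model_formula)

# Made-up toy data: two factors, to show how a categorical interaction expands
toy <- data.frame(
  sex   = factor(c("F", "M", "F", "M", "F", "M")),
  stage = factor(c("I", "I", "II", "II", "III", "III"))
)
design_cols <- colnames(model.matrix(~ sex + stage + sex:stage, data = toy))

# Interaction columns contain a colon, one per non-reference level combination
design_cols[grepl(":", design_cols, fixed = TRUE)]
```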
Stratification (Cox/Conditional Logistic):
The strata parameter creates separate baseline hazards:
Allows the baseline hazard to vary across strata without estimating stratum effects
Useful when the proportional hazards assumption is violated for the stratifying variable
Example: strata = "center" for multicenter studies
Stratification variable is not included as a predictor
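For readers who want to see the underlying mechanism, here is a minimal sketch using survival::coxph() directly on the lung dataset that ships with the survival package (not the clintrial data used elsewhere on this page). It assumes stratification is ultimately expressed through strata() in the model formula, which is the standard survival idiom.

```r
library(survival)

# Stratify on sex: each sex gets its own baseline hazard,
# but no coefficient is estimated for sex itself.
strat_fit <- coxph(Surv(time, status) ~ age + ph.ecog + strata(sex), data = lung)

# Only the non-stratifying predictors appear among the coefficients
names(coef(strat_fit))
```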
Clustering (Cox Models):
The cluster parameter computes robust standard errors:
Accounts for within-cluster correlation (e.g., multiple observations per patient)
Uses sandwich variance estimator
Does not change point estimates, only standard errors and p-values
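The claim that clustering changes standard errors but not point estimates can be checked with a small sketch in plain survival code. It uses the lung dataset from the survival package and treats enrolling institution (inst) as the cluster; this is an illustrative stand-in, not the package's own implementation.

```r
library(survival)

# Drop rows with a missing cluster id so both models use identical data
d <- lung[!is.na(lung$inst), ]

naive  <- coxph(Surv(time, status) ~ age + sex, data = d)
robust <- coxph(Surv(time, status) ~ age + sex + cluster(inst), data = d)

# Point estimates agree; only the variance estimate differs
all.equal(coef(naive), coef(robust))

# Model-based and sandwich (robust) standard errors side by side
summary(robust)$coefficients[, c("se(coef)", "robust se")]
```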
Weighting:
The weights parameter enables weighted regression:
For survey data with sampling weights
Inverse probability weighting for causal inference
Frequency weights for aggregated data
Weights should be in a column of data
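To illustrate the frequency-weight case, the sketch below fits the same logistic model twice with base glm(): once on individual-level rows and once on aggregated rows with a count column passed as weights. The data are made up for the example, and the column name n is arbitrary.

```r
# Made-up individual-level data: 100 subjects, binary exposure and event
long <- data.frame(
  exposed = c(rep(0, 50), rep(1, 50)),
  event   = c(rep(1, 10), rep(0, 40), rep(1, 25), rep(0, 25))
)

# Collapse to one row per exposure/event pattern with a frequency column n
agg <- aggregate(n ~ exposed + event, data = transform(long, n = 1), FUN = sum)

fit_long <- glm(event ~ exposed, family = binomial, data = long)
fit_agg  <- glm(event ~ exposed, family = binomial, data = agg, weights = n)

# Same coefficients from four weighted rows as from 100 individual rows
all.equal(coef(fit_long), coef(fit_agg), tolerance = 1e-6)
exp(coef(fit_agg))[["exposed"]]  # odds ratio for exposure
```

Sampling weights and inverse probability weights are passed the same way, but their standard errors generally need a robust or design-based variance estimator, which this sketch does not cover.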
Mixed-Effects Models (lmer/glmer/coxme):
Mixed-effects models handle hierarchical or clustered data:
Use model_type = "lmer" for continuous/normal outcomes
Use model_type = "glmer" with appropriate family for GLM outcomes
Use model_type = "coxme" for survival outcomes with clustering
Random effects are specified in predictors using lme4 syntax:
"(1|site)" - Random intercepts by site
"(treatment|site)" - Random slopes for treatment by site
"(1 + treatment|site)" - Both random intercepts and slopes
Include random effects as part of the predictors vector
Example: predictors = c("age", "treatment", "(1|site)")
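A predictors vector that mixes fixed effects and lme4-style random terms can be separated with a simple pattern match, as sketched below in base R. This is a hypothetical illustration of how such a vector maps onto a model formula, not the package's actual parsing code.

```r
predictors <- c("age", "treatment", "(1|site)")

# Random-effects terms are the ones containing a vertical bar
is_random    <- grepl("|", predictors, fixed = TRUE)
fixed_terms  <- predictors[!is_random]
random_terms <- predictors[is_random]

# The full vector can be turned into an lme4-style formula as-is
model_formula <- reformulate(predictors, response = "los_days")
print(model_formula)
```

A formula built this way could then be passed to lme4::lmer() or lme4::glmer() along with the data.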
Effect Measures by Model Type:
Logistic (family = "binomial"/"quasibinomial"): Odds ratios (OR/aOR)
Cox (model_type = "coxph"): Hazard ratios (HR/aHR)
Poisson/Count (family = "poisson"/"quasipoisson"): Rate ratios (RR/aRR)
Negative binomial (model_type = "negbin"): Rate ratios (RR/aRR)
Gamma/Log-link: Ratios (multiplicative effects)
Linear/Gaussian: Raw coefficient estimates (additive effects)
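The distinction between ratio measures and raw coefficients comes down to whether the coefficients are exponentiated. The sketch below shows this with base R models on the built-in infert and mtcars datasets; it is a generic illustration rather than the package's formatting code.

```r
# Logistic model: coefficients are log odds ratios
logit_model <- glm(case ~ spontaneous + induced, family = binomial, data = infert)
log_or      <- coef(logit_model)
odds_ratios <- exp(log_or)  # multiplicative effects on the odds

# Linear model: coefficients are reported as-is (additive effects)
linear_model <- lm(mpg ~ wt, data = mtcars)
raw_coefs    <- coef(linear_model)

odds_ratios
raw_coefs
```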
Confidence Intervals:
Confidence interval computation is tailored to each model class using the best available method:
GLM and negative binomial: Profile likelihood intervals via
MASS::confint.glm(), which invert the profile deviance and account
for asymmetry in the likelihood surface. More accurate than the Wald
approximation when subgroup sizes are small or estimates are near boundary
values. Quasi-likelihood families (quasibinomial, quasipoisson)
fall back to Wald intervals because they lack a true likelihood function.
Linear models: Exact t-distribution intervals via
confint.lm(), based on the known sampling distribution under
normality.
Cox proportional hazards: Wald intervals (i.e., coefficient ± z × SE), the standard approach in the survival analysis literature.
Mixed-effects models (lmer, glmer, coxme): Wald intervals.
Profile likelihood is available for lme4 models via
confint(model, method = "profile") but can be prohibitively slow
for complex random-effects structures and is not used by default.
If profile likelihood computation fails for any reason (e.g., non-convergence during profiling), the function falls back silently to Wald intervals.
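The Wald formula and the profile-then-fallback behavior described above can be sketched with base R on the built-in infert dataset. The helper ci_with_fallback() is a hypothetical name used only for this illustration; it shows one way such a fallback could be written and is not taken from the package source.

```r
model <- glm(case ~ spontaneous + induced, family = binomial, data = infert)

# Wald interval by hand: coefficient +/- z * SE
conf_level <- 0.95
z   <- qnorm(1 - (1 - conf_level) / 2)
est <- coef(model)
se  <- sqrt(diag(vcov(model)))
wald_manual <- cbind(lower = est - z * se, upper = est + z * se)

# Hypothetical helper: try profile likelihood, fall back to Wald on error
ci_with_fallback <- function(model, level = 0.95) {
  tryCatch(
    suppressMessages(confint(model, level = level)),         # profile likelihood
    error = function(e) confint.default(model, level = level) # Wald
  )
}

ci <- ci_with_fallback(model)
exp(ci)  # interval on the odds-ratio scale
```

The hand-computed Wald bounds match confint.default(model), whereas the profile intervals are generally not symmetric around the estimate on the log-odds scale.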
A data.table with S3 class "fit_result" containing formatted
regression results. The table structure includes:
Character. Predictor name or custom label
Character. For factor variables: category level. For interactions: interaction term. For continuous: typically empty
Integer. Total sample size (if show_n = TRUE)
Integer. Sample size for this factor level
Integer. Total number of events (if show_events = TRUE)
Integer. Events for this factor level
Character. Formatted effect estimate with confidence interval. Column name depends on model type and scope. Univariable models use: OR, HR, RR, Coefficient. Multivariable models use adjusted notation: aOR, aHR, aRR, Adj. Coefficient
Character. Formatted p-value from Wald test
The returned object includes the following attributes accessible via attr():
The fitted model object (glm, lm, coxph, etc.). Access for diagnostics, predictions, or further analysis
data.table. Unformatted numeric results with columns for coefficients, standard errors, confidence bounds, quality statistics, etc.
Character. The outcome variable name
Character vector. The predictor variable names
Character. The complete model formula as a string
Character. "Univariable" (one predictor) or "Multivariable" (multiple predictors)
Character. The regression model type used
Character vector (if interactions specified). The interaction terms included
Character (if stratification used). The stratification variable
Character (if clustering used). The cluster variable
Character (if weighting used). The weights variable
Character vector. Names of predictors with p-value below 0.05, suitable for downstream variable selection workflows
uniscreen for univariable screening of multiple predictors,
fullfit for complete univariable-to-multivariable workflow,
compfit for comparing multiple models,
m2dt for model-to-table conversion
Other regression functions:
compfit(),
fullfit(),
multifit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result(),
uniscreen()
# Load example data
data(clintrial)
data(clintrial_labels)
library(survival)
# Example 1: Univariable logistic regression
uni_model <- fit(
data = clintrial,
outcome = "os_status",
predictors = "age"
)
print(uni_model)
# Labeled as "Univariable OR"
# Example 2: Multivariable logistic regression
multi_model <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "treatment"),
labels = clintrial_labels
)
print(multi_model)
# Example 3: Cox proportional hazards model
cox_model <- fit(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "sex", "treatment", "stage"),
model_type = "coxph",
labels = clintrial_labels
)
print(cox_model)
# Example 4: Model with interaction terms
interact_model <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "treatment", "sex"),
interactions = c("age:treatment"),
labels = clintrial_labels
)
print(interact_model)
# Example 5: Cox model with stratification
strat_model <- fit(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "sex", "treatment"),
model_type = "coxph",
strata = "site", # Separate baseline hazards by site
labels = clintrial_labels
)
print(strat_model)
# Example 6: Cox model with clustering
cluster_model <- fit(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "treatment"),
model_type = "coxph",
cluster = "site", # Robust SEs accounting for site clustering
labels = clintrial_labels
)
print(cluster_model)
# Example 7: Linear regression
linear_model <- fit(
data = clintrial,
outcome = "bmi",
predictors = c("age", "sex", "smoking"),
model_type = "lm",
labels = clintrial_labels
)
print(linear_model)
# Example 8: Poisson regression for equidispersed count data
# fu_count has variance ~= mean, appropriate for standard Poisson
poisson_model <- fit(
data = clintrial,
outcome = "fu_count",
predictors = c("age", "stage", "treatment", "surgery"),
model_type = "glm",
family = "poisson",
labels = clintrial_labels
)
print(poisson_model)
# Returns rate ratios (RR/aRR)
# Example 9: Negative binomial regression for overdispersed counts
# ae_count has variance > mean (overdispersed), use negbin or quasipoisson
if (requireNamespace("MASS", quietly = TRUE)) {
nb_result <- fit(
data = clintrial,
outcome = "ae_count",
predictors = c("age", "treatment", "diabetes", "surgery"),
model_type = "negbin",
labels = clintrial_labels
)
print(nb_result)
}
# Example 10: Gamma regression for positive continuous outcomes
gamma_model <- fit(
data = clintrial,
outcome = "los_days",
predictors = c("age", "treatment", "surgery"),
model_type = "glm",
family = Gamma(link = "log"),
labels = clintrial_labels
)
print(gamma_model)
# Example 11: Access the underlying fitted model
result <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi")
)
# Get the model object
model_obj <- attr(result, "model")
summary(model_obj)
# Model diagnostics
plot(model_obj)
# Predictions
preds <- predict(model_obj, type = "response")
# Example 12: Access raw numeric data
raw_data <- attr(result, "raw_data")
print(raw_data)
# Contains unformatted coefficients, SEs, CIs, AIC, BIC, etc.
# Example 13: Multiple interactions
complex_model <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "bmi"),
interactions = c("age:treatment", "sex:bmi"),
labels = clintrial_labels
)
print(complex_model)
# Example 14: Customize output columns
minimal <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment"),
show_n = FALSE,
show_events = FALSE,
reference_rows = FALSE
)
print(minimal)
# Example 15: Different confidence levels
ci90 <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "treatment"),
conf_level = 0.90 # 90% confidence intervals
)
print(ci90)
# Example 16: Force coefficient display instead of OR
coef_model <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "bmi"),
exponentiate = FALSE # Show log odds ratios (raw coefficients) instead of ORs
)
print(coef_model)
# Example 17: Confidence interval method
# Default: profile likelihood CIs for GLM (more accurate)
profile_result <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "treatment"),
p_digits = 4,
conf_method = "profile"
)
print(profile_result)
# Wald CIs (faster, suitable for simulation or exploratory work)
wald_result <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "treatment"),
p_digits = 4,
conf_method = "wald"
)
print(wald_result)
# Example 18: Check model quality statistics
result <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "stage"),
keep_qc_stats = TRUE
)
raw <- attr(result, "raw_data")
cat("AIC:", raw$AIC[1], "\n")
cat("BIC:", raw$BIC[1], "\n")
cat("C-statistic:", raw$c_statistic[1], "\n")
# Example 19: Interaction effects - treatment effect modified by stage
interaction_model <- fit(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "treatment", "stage"),
interactions = c("treatment:stage"),
model_type = "coxph",
labels = clintrial_labels
)
print(interaction_model)
# Shows main effects plus all treatment×stage interaction terms
# Example 20: Multiple interactions in logistic regression
multi_interaction <- fit(
data = clintrial,
outcome = "readmission_30d",
predictors = c("age", "sex", "surgery", "diabetes"),
interactions = c("surgery:diabetes", "age:sex"),
labels = clintrial_labels
)
print(multi_interaction)
# Example 21: Quasipoisson for overdispersed count data
# Alternative to negative binomial when MASS is not available
quasi_model <- fit(
data = clintrial,
outcome = "ae_count",
predictors = c("age", "treatment", "diabetes", "surgery"),
model_type = "glm",
family = "quasipoisson",
labels = clintrial_labels
)
print(quasi_model)
# Adjusts standard errors for overdispersion
# Example 22: Quasibinomial for overdispersed binary data
quasi_logistic <- fit(
data = clintrial,
outcome = "any_complication",
predictors = c("age", "bmi", "diabetes", "surgery"),
model_type = "glm",
family = "quasibinomial",
labels = clintrial_labels
)
print(quasi_logistic)
# Example 23: Gamma regression with identity link for additive effects
gamma_identity <- fit(
data = clintrial,
outcome = "los_days",
predictors = c("age", "treatment", "surgery", "any_complication"),
model_type = "glm",
family = Gamma(link = "identity"),
labels = clintrial_labels
)
print(gamma_identity)
# Shows additive effects (coefficients) instead of multiplicative (ratios)
# Example 24: Inverse Gaussian regression for highly skewed data
inverse_gaussian <- fit(
data = clintrial,
outcome = "recovery_days",
predictors = c("age", "surgery", "pain_score"),
model_type = "glm",
family = inverse.gaussian(link = "log"),
labels = clintrial_labels
)
print(inverse_gaussian)
# Example 25: Linear mixed effects with random intercepts
# Accounts for clustering of patients within sites
if (requireNamespace("lme4", quietly = TRUE)) {
lmer_model <- fit(
data = clintrial,
outcome = "los_days",
predictors = c("age", "treatment", "stage", "(1|site)"),
model_type = "lmer",
labels = clintrial_labels
)
print(lmer_model)
}
# Example 26: Generalized linear mixed effects (logistic with random effects)
if (requireNamespace("lme4", quietly = TRUE)) {
glmer_model <- fit(
data = clintrial,
outcome = "readmission_30d",
predictors = c("age", "surgery", "los_days", "(1|site)"),
model_type = "glmer",
family = "binomial",
labels = clintrial_labels
)
print(glmer_model)
}
# Example 27: Cox mixed effects for clustered survival data
if (requireNamespace("coxme", quietly = TRUE)) {
coxme_model <- fit(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "treatment", "stage", "(1|site)"),
model_type = "coxme",
labels = clintrial_labels
)
print(coxme_model)
}
# Example 28: Random slopes - treatment effect varies by site
if (requireNamespace("lme4", quietly = TRUE)) {
random_slopes <- fit(
data = clintrial,
outcome = "los_days",
predictors = c("age", "treatment", "stage", "(treatment|site)"),
model_type = "lmer",
labels = clintrial_labels
)
print(random_slopes)
}
# Example 29: Format a pre-fitted model (model-based workflow)
# Useful for models fitted outside of fit()
pre_fitted <- glm(os_status ~ age + sex + treatment,
family = binomial, data = clintrial)
result <- fit(model = pre_fitted,
data = clintrial,
labels = clintrial_labels)
print(result)