| uniscreen | R Documentation |
Performs comprehensive univariable (unadjusted) regression analyses by fitting separate models for each predictor against a single outcome. This function is designed for initial variable screening, hypothesis generation, and understanding crude associations before multivariable modeling. Returns publication-ready formatted results for all predictors and identifies those with p-values below a chosen threshold.
uniscreen(
data,
outcome,
predictors,
model_type = "glm",
family = "binomial",
random = NULL,
p_threshold = 0.05,
conf_level = 0.95,
reference_rows = TRUE,
show_n = TRUE,
show_events = TRUE,
digits = 2,
p_digits = 3,
labels = NULL,
keep_models = FALSE,
exponentiate = NULL,
conf_method = NULL,
parallel = TRUE,
n_cores = NULL,
number_format = NULL,
verbose = NULL,
...
)
data |
Data frame or data.table containing the analysis dataset. The function automatically converts data frames to data.tables for efficient processing. |
outcome |
Character string specifying the outcome variable name. For
survival analysis, use |
predictors |
Character vector of predictor variable names to screen. Each predictor is tested independently in its own univariable model. Can include continuous, categorical (factor), or binary variables. |
model_type |
Character string specifying the type of regression model to fit. Options include:
|
family |
For GLM and GLMER models, specifies the error distribution and link function. Can be a character string, a family function, or a family object. Ignored for non-GLM/GLMER models. Binary/Binomial outcomes:
Count outcomes:
Continuous outcomes:
Positive continuous outcomes:
For negative binomial regression (overdispersed counts), use
See |
random |
Character string specifying the random-effects formula for
mixed-effects models ( |
p_threshold |
Numeric value between 0 and 1 specifying the p-value threshold used to count significant predictors in the printed summary. All predictors are always included in the output table. Default is 0.05. |
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
reference_rows |
Logical. If |
show_n |
Logical. If |
show_events |
Logical. If |
digits |
Integer specifying the number of decimal places for effect estimates (OR, HR, RR, coefficients). Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
labels |
Named character vector or list providing custom display
labels for variables. Names should match predictor names, values are the
display labels. Predictors not in |
keep_models |
Logical. If |
exponentiate |
Logical. Whether to exponentiate coefficients (display
OR/HR/RR instead of log odds/log hazards). Default is |
conf_method |
Character string controlling the confidence interval method.
If
Cox and mixed-effects models use Wald intervals regardless of this setting.
Set globally with |
parallel |
Logical. If |
n_cores |
Integer specifying the number of CPU cores to use for parallel
processing. Default is |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
verbose |
Logical. If |
... |
Additional arguments passed to the underlying model fitting functions
( |
Analysis Approach:
The function implements a comprehensive univariable screening workflow:
For each predictor in predictors, fits a separate model:
outcome ~ predictor
Extracts coefficients, confidence intervals, and p-values from each model
Combines results into a single table for easy comparison
Formats output for publication with appropriate effect measures
Each predictor is tested independently; these are crude (unadjusted) associations that do not account for confounding or interaction effects.
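The workflow above can be sketched in base R. This is a minimal illustration, not the package's internal implementation: it assumes the built-in mtcars data, a binary outcome (am), Wald confidence intervals, and a hypothetical helper named screen_one().

```r
# Illustrative sketch only: one logistic model per predictor (outcome ~ predictor).
# mtcars, "am", and screen_one() are assumptions made for this example.
dat <- mtcars
outcome <- "am"
predictors <- c("mpg", "wt", "hp")

screen_one <- function(pred) {
  # Build the formula outcome ~ predictor and fit a separate model
  fit <- glm(reformulate(pred, response = outcome), data = dat, family = binomial)
  est <- coef(summary(fit))[2, ]  # row 2 is the predictor; row 1 is the intercept
  z <- qnorm(0.975)               # critical value for a 95% Wald interval
  data.frame(
    predictor = pred,
    OR    = exp(est[["Estimate"]]),
    lower = exp(est[["Estimate"]] - z * est[["Std. Error"]]),
    upper = exp(est[["Estimate"]] + z * est[["Std. Error"]]),
    p     = est[["Pr(>|z|)"]]
  )
}

# Combine the per-predictor rows into a single comparison table
results <- do.call(rbind, lapply(predictors, screen_one))
print(results)
```

Each row comes from its own model, so the estimates are crude associations and will generally differ from those of a model containing all three predictors.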
When to Use Univariable Screening:
Initial variable selection: Identify predictors associated with the outcome before building multivariable models
Hypothesis generation: Explore potential associations in exploratory analyses
Understanding crude associations: Report unadjusted effects alongside adjusted estimates
Variable reduction: Use p-value thresholds (e.g., p < 0.20) to identify candidates for multivariable modeling
Checking multicollinearity: Compare univariable and multivariable effects to identify potential collinearity
Threshold for p-values:
The p_threshold parameter controls the significance threshold used
in the printed summary to count how many predictors are significant. All
predictors are always included in the output table regardless of this setting.
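The distinction between flagging and filtering can be sketched in base R. This is an assumption-laden illustration using the built-in mtcars data and linear models, not the package's code: every predictor stays in the table, and the threshold only determines which ones are marked.

```r
# Illustrative sketch: a threshold flags predictors but does not drop rows.
dat <- mtcars
predictors <- c("wt", "hp", "qsec", "drat")

# p-value of the slope from a separate linear model for each predictor
p_values <- vapply(predictors, function(pred) {
  fit <- lm(reformulate(pred, response = "mpg"), data = dat)
  coef(summary(fit))[2, "Pr(>|t|)"]
}, numeric(1))

p_threshold <- 0.20
screen <- data.frame(
  predictor = predictors,
  p = p_values,
  flagged = p_values < p_threshold,
  row.names = NULL
)

# All predictors remain in the table; only the flagged names are extracted
significant <- screen$predictor[screen$flagged]
cat("Predictors below threshold:", length(significant), "of", nrow(screen), "\n")
```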
Effect Measures by Model Type:
Logistic regression (model_type = "glm",
family = "binomial"): Odds ratios (OR)
Cox regression (model_type = "coxph"): Hazard ratios (HR)
Poisson regression (model_type = "glm",
family = "poisson"): Rate/risk ratios (RR)
Negative binomial (model_type = "negbin"): Rate ratios (RR)
Linear regression (model_type = "lm" or GLM with
identity link): Raw coefficient estimates
Gamma regression (model_type = "glm",
family = "Gamma"): Multiplicative effects (with default log link)
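As a hedged illustration of why ratio measures are reported for log-link models, the base R sketch below exponentiates Poisson coefficients and their Wald limits. It assumes the built-in warpbreaks data and does not use the package itself.

```r
# Illustrative sketch: exponentiating log-scale coefficients yields rate ratios.
fit <- glm(breaks ~ tension, data = warpbreaks, family = poisson)

# Coefficients and Wald confidence limits on the log scale
log_scale <- cbind(estimate = coef(fit), confint.default(fit, level = 0.95))

# Exponentiate estimates and limits together to move to the ratio scale
ratio_scale <- exp(log_scale)
print(round(ratio_scale, 2))

# With a single factor predictor, the rate ratio equals the ratio of group means
mean_m <- mean(warpbreaks$breaks[warpbreaks$tension == "M"])
mean_l <- mean(warpbreaks$breaks[warpbreaks$tension == "L"])
print(mean_m / mean_l)
```

For a linear model with an identity link there is no such transformation, which is why raw coefficients are shown in that case.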
Memory Considerations:
When keep_models = FALSE (default), fitted models are discarded after
extracting results to conserve memory. Set keep_models = TRUE only when
the following are needed:
Model diagnostic plots
Predictions from individual models
Additional model statistics not extracted by default
Further analysis of specific models
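The trade-off can be illustrated in base R with the sketch below. It assumes the built-in mtcars data and plain lm fits rather than the package's own objects, so the sizes are indicative only.

```r
# Illustrative sketch: fitted model objects are larger than a table of extracted results.
dat <- mtcars
predictors <- c("wt", "hp", "qsec")

# Keep every fitted model in a named list
models <- lapply(predictors, function(pred) {
  lm(reformulate(pred, response = "mpg"), data = dat)
})
names(models) <- predictors

# Extract only the slope from each model into a compact table
summary_table <- data.frame(
  predictor = predictors,
  slope = vapply(models, function(m) coef(m)[[2]], numeric(1)),
  row.names = NULL
)

size_models <- object.size(models)
size_table <- object.size(summary_table)
print(size_models, units = "Kb")
print(size_table, units = "Kb")

# Retained models still support follow-up work such as residual checks
res <- residuals(models[["wt"]])
```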
A data.table with S3 class "uniscreen_result" containing formatted
univariable screening results. The table structure includes:
Character. Predictor name or custom label (from labels)
Character. For factor variables: category level. For continuous variables: typically empty or descriptive statistic label
Integer. Sample size used in the model (if show_n = TRUE)
Integer. Sample size for this specific factor level (factor variables only)
Integer. Total number of events in the model for survival
or logistic regression (if show_events = TRUE)
Integer. Number of events for this specific factor level (factor variables only)
Character. Formatted effect estimate with confidence interval. Column name depends on model type: "OR (95% CI)" for logistic, "HR (95% CI)" for survival, "RR (95% CI)" for counts, "Coefficient (95% CI)" for linear models
Character. Formatted p-value from the Wald test
The returned object includes the following attributes accessible via attr():
data.table. Unformatted numeric results with separate columns for coefficients, standard errors, confidence interval bounds, etc. Suitable for further statistical analysis or custom formatting
List (if keep_models = TRUE). Named list of fitted
model objects, with predictor names as list names. Access specific models
via attr(result, "models")[["predictor_name"]]
Character. The outcome variable name used
Character. The regression model type used
Character. Always "Univariable" for screening results
Character. Always "univariable" to identify the analysis type
Numeric. The p-value threshold used for significance
Character vector. Names of predictors with p-value below the screening threshold, suitable for passing directly to downstream modeling functions
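The access pattern for these attributes can be sketched with a mock object. The object below is built by hand from the built-in mtcars data purely for illustration; it is not produced by uniscreen(), its column names are assumptions, and only the attribute names "raw_data" and "significant" are taken from this page.

```r
# Illustrative mock: a formatted table carrying numeric results as attributes.
dat <- mtcars
predictors <- c("wt", "qsec")

# Unformatted numeric results
raw <- data.frame(
  predictor = predictors,
  p_value = vapply(predictors, function(pred) {
    coef(summary(lm(reformulate(pred, response = "mpg"), data = dat)))[2, "Pr(>|t|)"]
  }, numeric(1)),
  row.names = NULL
)

# Formatted display table (character p-values)
mock <- data.frame(
  Variable = raw$predictor,
  p = format.pval(raw$p_value, digits = 3, eps = 0.001)
)

# Attach the numeric results and the names of predictors below the threshold
attr(mock, "raw_data") <- raw
attr(mock, "significant") <- raw$predictor[raw$p_value < 0.05]
class(mock) <- c("uniscreen_result", class(mock))

# Retrieve them with attr(), as described above
print(attr(mock, "significant"))
str(attr(mock, "raw_data"))
```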
fit for fitting a single multivariable model,
fullfit for complete univariable-to-multivariable workflow,
compfit for comparing multiple models,
m2dt for converting individual models to tables
Other regression functions:
compfit(),
fit(),
fullfit(),
multifit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result()
# Load example data
data(clintrial)
data(clintrial_labels)
# Example 1: Basic logistic regression screening
screen1 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "smoking", "hypertension"),
model_type = "glm",
family = "binomial",
parallel = FALSE
)
print(screen1)
# Example 2: With custom variable labels
screen2 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "treatment"),
labels = clintrial_labels,
parallel = FALSE
)
print(screen2)
# Example 3: Use a liberal p-value threshold
# p_threshold = 0.20 (common for screening) sets the cutoff for counting significant predictors
screen3 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "smoking", "hypertension",
"diabetes", "stage"),
p_threshold = 0.20,
labels = clintrial_labels,
parallel = FALSE
)
print(screen3)
# All predictors are shown; the printed summary counts those with p < 0.20
# Example 4: Cox proportional hazards screening
library(survival)
cox_screen <- uniscreen(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "sex", "treatment", "stage", "grade"),
model_type = "coxph",
labels = clintrial_labels,
parallel = FALSE
)
print(cox_screen)
# Returns hazard ratios (HR) instead of odds ratios
# Example 5: Keep models for diagnostics
screen5 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "bmi", "creatinine"),
keep_models = TRUE,
parallel = FALSE
)
# Access stored models
models <- attr(screen5, "models")
summary(models[["age"]])
plot(models[["age"]]) # Diagnostic plots
# Example 6: Linear regression screening
linear_screen <- uniscreen(
data = clintrial,
outcome = "bmi",
predictors = c("age", "sex", "smoking", "creatinine", "hemoglobin"),
model_type = "lm",
labels = clintrial_labels,
parallel = FALSE
)
print(linear_screen)
# Example 7: Poisson regression for equidispersed count outcomes
# fu_count has variance ~= mean, appropriate for standard Poisson
poisson_screen <- uniscreen(
data = clintrial,
outcome = "fu_count",
predictors = c("age", "stage", "treatment", "surgery"),
model_type = "glm",
family = "poisson",
labels = clintrial_labels,
parallel = FALSE
)
print(poisson_screen)
# Returns rate ratios (RR)
# Example 8: Negative binomial for overdispersed counts
# ae_count has variance > mean (overdispersed), use negbin
if (requireNamespace("MASS", quietly = TRUE)) {
nb_screen <- uniscreen(
data = clintrial,
outcome = "ae_count",
predictors = c("age", "treatment", "diabetes", "surgery"),
model_type = "negbin",
labels = clintrial_labels,
parallel = FALSE
)
print(nb_screen)
}
# Example 9: Gamma regression for positive continuous outcomes (e.g., costs)
gamma_screen <- uniscreen(
data = clintrial,
outcome = "los_days",
predictors = c("age", "sex", "treatment", "surgery"),
model_type = "glm",
family = Gamma(link = "log"),
labels = clintrial_labels,
parallel = FALSE
)
print(gamma_screen)
# Example 10: Hide reference rows for factor variables
screen10 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("treatment", "stage", "grade"),
reference_rows = FALSE,
parallel = FALSE
)
print(screen10)
# Reference categories not shown
# Example 11: Customize decimal places
screen11 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "bmi", "creatinine"),
digits = 3, # 3 decimal places for OR
p_digits = 4 # 4 decimal places for p-values
)
print(screen11)
# Example 12: Hide sample size and event columns
screen12 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi"),
show_n = FALSE,
show_events = FALSE,
parallel = FALSE
)
print(screen12)
# Example 13: Access raw numeric data
screen13 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment"),
parallel = FALSE
)
raw_data <- attr(screen13, "raw_data")
print(raw_data)
# Contains unformatted coefficients, SEs, CIs, etc.
# Example 14: Force coefficient display instead of OR
screen14 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "bmi"),
model_type = "glm",
family = "binomial",
parallel = FALSE,
exponentiate = FALSE # Show log odds instead of OR
)
print(screen14)
# Example 15: Screening with weights
screen15 <- uniscreen(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "sex", "bmi"),
model_type = "coxph",
weights = runif(nrow(clintrial), min = 0.5, max = 2), # Random numbers for example
parallel = FALSE
)
# Example 16: Strict significance threshold (p < 0.05)
sig_only <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "smoking", "hypertension",
"diabetes", "ecog", "treatment", "stage", "grade"),
p_threshold = 0.05,
labels = clintrial_labels,
parallel = FALSE
)
# Count predictors with p-values below the threshold
# (the output table itself always contains all predictors)
n_significant <- length(attr(sig_only, "significant"))
cat("Significant predictors:", n_significant, "\n")
# Example 17: Complete workflow - screen then use in multivariable
# Step 1: Screen with liberal threshold
candidates <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "smoking", "hypertension",
"diabetes", "treatment", "stage", "grade"),
p_threshold = 0.20,
parallel = FALSE
)
# Step 2: Extract significant predictor names
sig_predictors <- attr(candidates, "significant")
# Step 3: Fit multivariable model with selected predictors
multi_model <- fit(
data = clintrial,
outcome = "os_status",
predictors = sig_predictors,
labels = clintrial_labels
)
print(multi_model)
# Example 18: Mixed-effects logistic regression (glmer)
# Accounts for clustering by site
if (requireNamespace("lme4", quietly = TRUE)) {
glmer_screen <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "stage"),
model_type = "glmer",
random = "(1|site)",
family = "binomial",
labels = clintrial_labels,
parallel = FALSE
)
print(glmer_screen)
}
# Example 19: Mixed-effects linear regression (lmer)
if (requireNamespace("lme4", quietly = TRUE)) {
lmer_screen <- uniscreen(
data = clintrial,
outcome = "biomarker_x",
predictors = c("age", "sex", "treatment", "smoking"),
model_type = "lmer",
random = "(1|site)",
labels = clintrial_labels,
parallel = FALSE
)
print(lmer_screen)
}
# Example 20: Mixed-effects Cox model (coxme)
if (requireNamespace("coxme", quietly = TRUE)) {
coxme_screen <- uniscreen(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "sex", "treatment", "stage"),
model_type = "coxme",
random = "(1|site)",
labels = clintrial_labels,
parallel = FALSE
)
print(coxme_screen)
}
# Example 21: Mixed-effects with nested random effects
# Patients nested within sites
if (requireNamespace("lme4", quietly = TRUE)) {
nested_screen <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "treatment"),
model_type = "glmer",
random = "(1|site/patient_id)",
family = "binomial",
parallel = FALSE
)
}
# Example 22: Quasipoisson for overdispersed count data
# Alternative to negative binomial when MASS not available
quasi_screen <- uniscreen(
data = clintrial,
outcome = "ae_count",
predictors = c("age", "treatment", "diabetes", "surgery", "stage"),
model_type = "glm",
family = "quasipoisson",
labels = clintrial_labels,
parallel = FALSE
)
print(quasi_screen)
# Adjusts standard errors for overdispersion
# Example 23: Quasibinomial for overdispersed binary data
quasibin_screen <- uniscreen(
data = clintrial,
outcome = "any_complication",
predictors = c("age", "bmi", "diabetes", "surgery", "ecog"),
model_type = "glm",
family = "quasibinomial",
labels = clintrial_labels,
parallel = FALSE
)
print(quasibin_screen)
# Example 24: Inverse Gaussian for highly skewed positive data
invgauss_screen <- uniscreen(
data = clintrial,
outcome = "recovery_days",
predictors = c("age", "surgery", "pain_score", "los_days"),
model_type = "glm",
family = inverse.gaussian(link = "log"),
labels = clintrial_labels,
parallel = FALSE
)
print(invgauss_screen)