View source: R/calc_risk_diff.R
calc_risk_diff | R Documentation |
Calculates risk differences (or prevalence differences for cross-sectional data) using generalized linear models with identity, log, or logit links. Version 0.2.1 includes enhanced boundary detection, robust confidence intervals, and improved data quality validation to prevent extreme confidence intervals in stratified analyses.
The function addresses common convergence issues with identity link binomial GLMs by implementing a fallback strategy across multiple link functions, similar to approaches described in Donoghoe & Marschner (2018) for relative risk regression.
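The fallback idea can be illustrated with base R alone. The sketch below is not the package's implementation: `fit_with_fallback()` is a hypothetical helper, the link order is an assumption, and the toy data are made up for the illustration. It tries each link in turn and keeps the first binomial GLM that fits without an error or warning.

```r
# Hypothetical helper (not part of the package): try links in order and
# return the first binomial GLM that converges without error or warning.
fit_with_fallback <- function(formula, data,
                              links = c("identity", "log", "logit")) {
  for (lk in links) {
    fit <- tryCatch(
      stats::glm(formula, data = data, family = stats::binomial(link = lk)),
      error = function(e) NULL,    # e.g. no valid starting values
      warning = function(w) NULL   # e.g. non-convergence
    )
    if (!is.null(fit) && fit$converged) {
      return(list(fit = fit, link = lk))
    }
  }
  stop("No link function produced a converged model")
}

# Toy data: risk 0.20 in the unexposed, 0.30 in the exposed
toy <- data.frame(
  x = rep(c(0, 1), each = 100),
  y = c(rep(c(1, 0), c(20, 80)), rep(c(1, 0), c(30, 70)))
)

res <- fit_with_fallback(y ~ x, toy)
res$link            # link that was actually used
coef(res$fit)["x"]  # under the identity link this is the risk difference
```

Only under the identity link is the exposure coefficient itself a risk difference. After a log or logit fit, the risk difference would have to be derived from predicted probabilities.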
calc_risk_diff(
data,
outcome,
exposure,
adjust_vars = NULL,
strata = NULL,
link = "auto",
alpha = 0.05,
boundary_method = "auto",
verbose = FALSE
)
data | A data frame containing all necessary variables |
outcome | Character string naming the binary outcome variable (must be 0/1 or logical) |
exposure | Character string naming the exposure variable of interest |
adjust_vars | Character vector of variables to adjust for (default: NULL) |
strata | Character vector of stratification variables (default: NULL) |
link | Character string specifying link function: "auto", "identity", "log", or "logit" (default: "auto") |
alpha | Significance level for confidence intervals (default: 0.05) |
boundary_method | Method for handling boundary cases: "auto", "profile", "bootstrap", "wald" (default: "auto") |
verbose | Logical indicating whether to print diagnostic messages (default: FALSE) |
This version adds comprehensive data quality validation to prevent the extreme confidence intervals that could occur in stratified analyses:
Pre-analysis checks for stratification feasibility
Detection of small sample sizes within strata
Identification of rare outcomes or unbalanced exposures
Warning for potential separation issues
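Checks of this kind can be approximated by tabulating each stratum before any model is fitted. The sketch below is an assumption-laden illustration, not the package's code: `check_strata()` and its thresholds `min_n` and `min_events` are hypothetical, and the toy data are invented.

```r
# Hypothetical pre-analysis check (not part of the package): flag strata
# that are small, have few events, or have an empty exposure-by-outcome cell.
check_strata <- function(data, outcome, exposure, stratum,
                         min_n = 30, min_events = 5) {
  exposure_levels <- sort(unique(data[[exposure]]))
  pieces <- split(data, data[[stratum]])
  do.call(rbind, lapply(names(pieces), function(s) {
    d <- pieces[[s]]
    # Fix the factor levels so cells with zero counts still appear
    tab <- table(
      factor(d[[exposure]], levels = exposure_levels),
      factor(d[[outcome]], levels = c(0, 1))
    )
    data.frame(
      stratum = s,
      n = nrow(d),
      events = sum(d[[outcome]] == 1),
      small_n = nrow(d) < min_n,
      rare_outcome = sum(d[[outcome]] == 1) < min_events,
      empty_cell = any(tab == 0),  # a zero cell hints at separation
      stringsAsFactors = FALSE
    )
  }))
}

# Toy data: a usable "urban" stratum and a tiny "rural" one with no events
toy <- data.frame(
  y = c(rep(c(1, 0), c(10, 30)), rep(0, 10)),
  x = c(rep(c(1, 0), 20), rep(c(1, 0), 5)),
  s = c(rep("urban", 40), rep("rural", 10))
)

checks <- check_strata(toy, "y", "x", "s")
print(checks)
```

Strata flagged here are the ones most likely to yield unstable or extreme intervals, so they are worth inspecting before interpreting stratified estimates.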
When the maximum likelihood estimate (MLE) lies on the boundary of the parameter space, standard asymptotic theory may not apply. The function detects and handles the following boundary types:
upper_bound: Fitted probabilities approaching 1
lower_bound: Fitted probabilities approaching 0
separation: Complete or quasi-complete separation
both_bounds: Mixed boundary issues
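One simple way to map fitted probabilities onto the labels above is a tolerance check. This is only a sketch: `classify_boundary()` and the tolerance `tol` are hypothetical, the mapping to "both_bounds" is an assumption about what "mixed" means, and separation detection is deliberately left out because it needs more than the fitted values.

```r
# Hypothetical classifier (not part of the package): label a fit by how
# close its fitted probabilities come to 0 or 1.
classify_boundary <- function(fitted_probs, tol = 1e-6) {
  near_one <- any(fitted_probs > 1 - tol)
  near_zero <- any(fitted_probs < tol)
  if (near_one && near_zero) {
    "both_bounds"
  } else if (near_one) {
    "upper_bound"
  } else if (near_zero) {
    "lower_bound"
  } else {
    "none"
  }
}

classify_boundary(c(0.20, 0.50))                 # interior fit
classify_boundary(c(0.20, 0.999999999))          # one value at the upper bound
classify_boundary(c(0.0000000001, 0.999999999))  # values at both bounds
```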
For boundary cases, the function implements:
Profile likelihood intervals (preferred when feasible)
Bootstrap confidence intervals (robust for complex cases)
Modified Wald intervals with boundary adjustments
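To give a feel for how two of these interval types differ, the sketch below computes a plain Wald interval and a percentile bootstrap interval for an unadjusted risk difference on invented data. It is a simplified stand-in under stated assumptions: `risk_diff()` is a hypothetical helper, no GLM is refitted, the number of resamples is arbitrary, and profile likelihood intervals are not shown.

```r
set.seed(42)

# Toy data: risk 0.20 in the unexposed, 0.30 in the exposed
toy <- data.frame(
  x = rep(c(0, 1), each = 100),
  y = c(rep(c(1, 0), c(20, 80)), rep(c(1, 0), c(30, 70)))
)

# Hypothetical helper: unadjusted risk difference (exposed minus unexposed)
risk_diff <- function(d) mean(d$y[d$x == 1]) - mean(d$y[d$x == 0])

est <- risk_diff(toy)

# Wald interval: estimate +/- z * standard error
p1 <- mean(toy$y[toy$x == 1])
p0 <- mean(toy$y[toy$x == 0])
se <- sqrt(p1 * (1 - p1) / sum(toy$x == 1) + p0 * (1 - p0) / sum(toy$x == 0))
wald_ci <- est + c(-1, 1) * qnorm(0.975) * se

# Percentile bootstrap: resample rows, recompute, take empirical quantiles
boot_est <- replicate(
  1000,
  risk_diff(toy[sample(nrow(toy), replace = TRUE), ])
)
boot_ci <- quantile(boot_est, c(0.025, 0.975), na.rm = TRUE)

est
wald_ci
boot_ci
```

With well-behaved data like this the two intervals should be similar. They tend to diverge when estimates sit near 0 or 1, which is where a symmetric Wald interval can extend outside the valid range.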
Risk differences represent absolute changes in probability. A risk difference of 0.05 means the exposed group has a 5 percentage point higher risk than the unexposed group. This is often more interpretable than relative measures (risk ratios, odds ratios) for public health decision-making.
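The arithmetic behind this interpretation can be checked directly. The risks below are invented numbers chosen to give a risk difference of 0.05; they do not come from any dataset in the package.

```r
# Illustrative risks (made-up values)
risk_exposed <- 0.15
risk_unexposed <- 0.10

rd <- risk_exposed - risk_unexposed   # absolute difference
rr <- risk_exposed / risk_unexposed   # risk ratio
or <- (risk_exposed / (1 - risk_exposed)) /
  (risk_unexposed / (1 - risk_unexposed))  # odds ratio

rd         # 0.05, i.e. 5 percentage points
rd * 1000  # 50 additional cases per 1000 exposed people
rr         # 1.5
or         # about 1.59
```

The same 5 percentage point difference at a baseline risk of 0.50 would correspond to a risk ratio of only 1.1, which is why the absolute and relative scales can give quite different impressions of the same effect.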
A tibble of class "riskdiff_result" containing the following columns:
Character. Name of exposure variable analyzed
Numeric. Risk difference estimate (proportion scale, e.g. 0.05 = 5 percentage points)
Numeric. Lower bound of confidence interval
Numeric. Upper bound of confidence interval
Numeric. P-value for test of null hypothesis (risk difference = 0)
Character. Link function successfully used ("identity", "log", "logit", or error type)
Integer. Number of observations used in analysis
on_boundary: Logical. TRUE if the MLE is on the parameter space boundary
boundary_type: Character. Type of boundary: "none", "upper_bound", "lower_bound", "separation", "both_bounds"
Character. Warning message for boundary cases (if any)
Character. Method used for confidence intervals ("wald", "profile", "bootstrap")
Additional columns for stratification variables if specified
The returned object has attributes including the original function call and alpha level used. Risk differences are on the probability scale where 0.05 represents a 5 percentage point difference.
Donoghoe MW, Marschner IC (2018). "logbin: An R Package for Relative Risk Regression Using the Log-Binomial Model." Journal of Statistical Software, 86(9), 1-22. doi:10.18637/jss.v086.i09
Marschner IC, Gillett AC (2012). "Relative Risk Regression: Reliable and Flexible Methods for Log-Binomial Models." Biostatistics, 13(1), 179-192.
Venzon DJ, Moolgavkar SH (1988). "A Method for Computing Profile-Likelihood-Based Confidence Intervals." Journal of the Royal Statistical Society: Series C (Applied Statistics), 37(1), 87-94.
Rothman KJ, Greenland S, Lash TL (2008). Modern Epidemiology, 3rd edition. Lippincott Williams & Wilkins.
# Simple risk difference
data(cachar_sample)
rd_simple <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut"
)
print(rd_simple)
# Age-adjusted risk difference
rd_adjusted <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut",
  adjust_vars = "age"
)
print(rd_adjusted)
# Stratified analysis with enhanced error checking and boundary detection
rd_stratified <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut",
  strata = "residence",
  verbose = TRUE  # See diagnostic messages and boundary detection
)
print(rd_stratified)
# Check for boundary cases
if (any(rd_stratified$on_boundary)) {
  cat("Boundary cases detected!\n")
  boundary_rows <- which(rd_stratified$on_boundary)
  for (i in boundary_rows) {
    cat("Row", i, ":", rd_stratified$boundary_type[i], "\n")
  }
}
# Force profile likelihood CIs for enhanced robustness
rd_profile <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut",
  boundary_method = "profile"
)