calc_risk_diff {riskdiff}    R Documentation

Calculate Risk Differences with Robust Model Fitting and Boundary Detection

Description

Calculates risk differences (or prevalence differences for cross-sectional data) using generalized linear models with identity, log, or logit links. Version 0.2.1 adds boundary detection, robust confidence intervals, and data quality validation intended to prevent the extreme confidence intervals that can arise in stratified analyses.

The function addresses common convergence issues with identity link binomial GLMs by implementing a fallback strategy across multiple link functions, similar to approaches described in Donoghoe & Marschner (2018) for relative risk regression.
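The fallback idea can be illustrated with base R alone. The sketch below is not the package's internal code: the helper name fit_rd_with_fallback is hypothetical, it assumes a numeric 0/1 exposure, and for log and logit fits it recovers the risk difference by marginal standardization (average predicted risk with everyone exposed minus everyone unexposed), which is one common approach and an assumption here.

```r
# Hypothetical helper (not part of riskdiff): try links in order and return
# the risk difference from the first converged fit.
# Assumes `exposure` is a numeric 0/1 column.
fit_rd_with_fallback <- function(df, outcome, exposure, adjust_vars = NULL,
                                 links = c("identity", "log", "logit")) {
  rhs <- paste(c(exposure, adjust_vars), collapse = " + ")
  f <- stats::as.formula(paste(outcome, "~", rhs))
  for (lk in links) {
    # Identity/log fits often stop with invalid fitted values; treat errors
    # the same as non-convergence and move on to the next link
    fit <- tryCatch(
      suppressWarnings(
        stats::glm(f, data = df, family = stats::binomial(link = lk))
      ),
      error = function(e) NULL
    )
    if (is.null(fit) || !isTRUE(fit$converged)) next
    if (lk == "identity") {
      # On the identity scale the exposure coefficient is the risk difference
      rd <- unname(stats::coef(fit)[exposure])
    } else {
      # Log/logit: marginal standardization over the observed covariates
      d1 <- df; d1[[exposure]] <- 1
      d0 <- df; d0[[exposure]] <- 0
      rd <- mean(stats::predict(fit, newdata = d1, type = "response")) -
        mean(stats::predict(fit, newdata = d0, type = "response"))
    }
    return(list(link = lk, rd = rd))
  }
  list(link = "failed", rd = NA_real_)
}

# Simulated data: true risks 0.15 (exposed) and 0.10 (unexposed)
set.seed(1)
n <- 2000
sim <- data.frame(x = rbinom(n, 1, 0.5))
sim$y <- rbinom(n, 1, ifelse(sim$x == 1, 0.15, 0.10))
fit_rd_with_fallback(sim, "y", "x")
```

For a crude two-group comparison every link reproduces the observed group proportions, so all three give the same risk difference; the links diverge only once covariates are added, which is where identity-link convergence problems tend to appear.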

Usage

calc_risk_diff(
  data,
  outcome,
  exposure,
  adjust_vars = NULL,
  strata = NULL,
  link = "auto",
  alpha = 0.05,
  boundary_method = "auto",
  verbose = FALSE
)

Arguments

data

A data frame containing all necessary variables

outcome

Character string naming the binary outcome variable (must be 0/1 or logical)

exposure

Character string naming the exposure variable of interest

adjust_vars

Character vector of variables to adjust for (default: NULL)

strata

Character vector of stratification variables (default: NULL)

link

Character string specifying the link function: "auto", "identity", "log", or "logit" (default: "auto")

alpha

Significance level; confidence intervals are computed at the 1 - alpha level (default: 0.05, giving 95% intervals)

boundary_method

Method for handling boundary cases: "auto", "profile", "bootstrap", or "wald" (default: "auto")

verbose

Logical indicating whether to print diagnostic messages (default: FALSE)

Details

New in Version 0.2.1: Enhanced Stability and Quality Validation

This version adds comprehensive data quality validation to prevent the extreme confidence intervals that could occur in stratified analyses:

Enhanced Data Validation:

  • Pre-analysis checks for stratification feasibility

  • Detection of small sample sizes within strata

  • Identification of rare outcomes or unbalanced exposures

  • Warnings about potential separation
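Checks of this kind can be sketched in base R by tabulating exposure against outcome within each stratum. The helper check_strata and its thresholds (min_n = 20, min_events = 5) are hypothetical and illustrative; they are not the package's documented defaults. A zero cell in the exposure-by-outcome table is flagged as possible separation.

```r
# Hypothetical per-stratum feasibility checks; thresholds are illustrative
# assumptions, not riskdiff defaults. Assumes 0/1 exposure and outcome.
check_strata <- function(df, outcome, exposure, strata,
                         min_n = 20, min_events = 5) {
  groups <- split(df, df[[strata]], drop = TRUE)
  rows <- lapply(names(groups), function(s) {
    g <- groups[[s]]
    # 2 x 2 exposure-by-outcome table, keeping empty levels
    tab <- table(factor(g[[exposure]], levels = c(0, 1)),
                 factor(g[[outcome]], levels = c(0, 1)))
    data.frame(
      stratum = s,
      n = nrow(g),
      events = sum(g[[outcome]]),
      small_n = nrow(g) < min_n,
      rare_outcome = sum(g[[outcome]]) < min_events,
      single_exposure_level = any(rowSums(tab) == 0),
      possible_separation = any(tab == 0),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Stratum A is well populated; stratum B is small, all exposed, no events
df <- data.frame(
  region = rep(c("A", "B"), times = c(30, 10)),
  x = c(rep(0:1, 15), rep(1, 10)),
  y = c(rep(c(0, 1, 1, 0), length.out = 30), rep(0, 10))
)
chk <- check_strata(df, "y", "x", "region")
chk
```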

Boundary Detection and Robust Inference:

When the maximum likelihood estimate (MLE) lies on the boundary of the parameter space, standard asymptotic theory may not apply. The function detects and handles:

  • upper_bound: Fitted probabilities approaching 1

  • lower_bound: Fitted probabilities approaching 0

  • separation: Complete or quasi-complete separation

  • both_bounds: Fitted probabilities approaching both 0 and 1
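A minimal sketch of how the probability-based categories could be assigned from a vector of fitted probabilities, assuming a fixed tolerance. The helper classify_boundary and the value tol = 1e-6 are hypothetical; the package's actual criteria are not documented here, and this sketch does not test for separation, which needs a separate check on the data.

```r
# Hypothetical helper: label boundary status from fitted probabilities p.
# `tol` is an illustrative tolerance, not a riskdiff setting.
classify_boundary <- function(p, tol = 1e-6) {
  near_one <- any(p >= 1 - tol)
  near_zero <- any(p <= tol)
  if (near_one && near_zero) {
    "both_bounds"
  } else if (near_one) {
    "upper_bound"
  } else if (near_zero) {
    "lower_bound"
  } else {
    "none"
  }
}

classify_boundary(c(0.2, 0.5))        # interior fit
classify_boundary(c(0.3, 1 - 1e-9))   # one fitted probability at ~1
```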

Robust Confidence Intervals:

For boundary cases, the function provides:

  • Profile likelihood intervals (preferred when feasible)

  • Bootstrap confidence intervals (robust for complex cases)

  • Modified Wald intervals with boundary adjustments
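To illustrate the bootstrap option, the sketch below computes a percentile bootstrap interval for a crude (unadjusted) risk difference by resampling individuals with replacement. It is a generic technique, not the package's implementation: the helper boot_rd_ci and the default B = 2000 are assumptions, and resamples that lose an exposure group are simply dropped.

```r
# Hypothetical helper: percentile bootstrap CI for a crude risk difference.
# y and x are 0/1 vectors of equal length.
boot_rd_ci <- function(y, x, B = 2000, alpha = 0.05) {
  rd_hat <- function(yy, xx) mean(yy[xx == 1]) - mean(yy[xx == 0])
  n <- length(y)
  reps <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)
    rd_hat(y[idx], x[idx])
  })
  # An empty exposure group gives NaN; discard those resamples
  reps <- reps[is.finite(reps)]
  ci <- stats::quantile(reps, probs = c(alpha / 2, 1 - alpha / 2),
                        names = FALSE)
  c(estimate = rd_hat(y, x), lower = ci[1], upper = ci[2])
}

# Low baseline risk (0.02), where Wald intervals tend to behave poorly
set.seed(42)
x <- rbinom(500, 1, 0.5)
y <- rbinom(500, 1, ifelse(x == 1, 0.08, 0.02))
boot_out <- boot_rd_ci(y, x)
boot_out
```

Because percentiles are taken from the resampled estimates themselves, the interval need not be symmetric around the estimate, which is the main advantage over a Wald interval near 0 or 1.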

Risk Difference Interpretation

Risk differences represent absolute changes in probability. A risk difference of 0.05 means the exposed group has a risk 5 percentage points higher than the unexposed group. This is often more interpretable than relative measures (risk ratios, odds ratios) for public health decision-making.
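As a worked example with illustrative counts (not taken from cachar_sample): if 30 of 200 exposed and 20 of 200 unexposed participants have the outcome, the risks are 0.15 and 0.10, so the risk difference is 0.05, or 5 percentage points. The hypothetical helper rd_wald below computes this with a standard Wald interval at the 1 - alpha level, using z = qnorm(1 - alpha / 2); it is a textbook formula, not the package's internal code.

```r
# Hypothetical helper: crude risk difference with a Wald interval.
rd_wald <- function(events1, n1, events0, n0, alpha = 0.05) {
  p1 <- events1 / n1                        # risk in exposed
  p0 <- events0 / n0                        # risk in unexposed
  rd <- p1 - p0
  se <- sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
  z <- stats::qnorm(1 - alpha / 2)          # 1.96 for alpha = 0.05
  c(rd = rd, lower = rd - z * se, upper = rd + z * se)
}

rd_wald(30, 200, 20, 200)
```

Here the estimate is 0.05 and the 95% interval runs from roughly -0.015 to 0.115, so with these sample sizes a 5 percentage point difference is not distinguishable from zero.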

Value

A tibble of class "riskdiff_result" containing the following columns:

exposure_var

Character. Name of exposure variable analyzed

rd

Numeric. Risk difference estimate (proportion scale, e.g. 0.05 = 5 percentage points)

ci_lower

Numeric. Lower bound of confidence interval

ci_upper

Numeric. Upper bound of confidence interval

p_value

Numeric. P-value for test of null hypothesis (risk difference = 0)

model_type

Character. Link function successfully used ("identity", "log", or "logit"), or an error type if fitting failed

n_obs

Integer. Number of observations used in analysis

on_boundary

Logical. TRUE if the MLE lies on the boundary of the parameter space

boundary_type

Character. Type of boundary: "none", "upper_bound", "lower_bound", "separation", "both_bounds"

boundary_warning

Character. Warning message for boundary cases (if any)

ci_method

Character. Method used for confidence intervals ("wald", "profile", "bootstrap")

...

Additional columns for stratification variables if specified

The returned object has attributes including the original function call and alpha level used. Risk differences are on the probability scale where 0.05 represents a 5 percentage point difference.

References

Donoghoe MW, Marschner IC (2018). "logbin: An R Package for Relative Risk Regression Using the Log-Binomial Model." Journal of Statistical Software, 86(9), 1-22. doi:10.18637/jss.v086.i09

Marschner IC, Gillett AC (2012). "Relative Risk Regression: Reliable and Flexible Methods for Log-Binomial Models." Biostatistics, 13(1), 179-192.

Venzon DJ, Moolgavkar SH (1988). "A Method for Computing Profile-Likelihood-Based Confidence Intervals." Journal of the Royal Statistical Society, Series C (Applied Statistics), 37(1), 87-94.

Rothman KJ, Greenland S, Lash TL (2008). Modern Epidemiology, 3rd edition. Lippincott Williams & Wilkins.

Examples

# Simple risk difference
data(cachar_sample)
rd_simple <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut"
)
print(rd_simple)

# Age-adjusted risk difference
rd_adjusted <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut",
  adjust_vars = "age"
)
print(rd_adjusted)

# Stratified analysis with enhanced error checking and boundary detection
rd_stratified <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut",
  strata = "residence",
  verbose = TRUE  # See diagnostic messages and boundary detection
)
print(rd_stratified)

# Check for boundary cases
if (any(rd_stratified$on_boundary)) {
  cat("Boundary cases detected!\n")
  boundary_rows <- which(rd_stratified$on_boundary)
  for (i in boundary_rows) {
    cat("Row", i, ":", rd_stratified$boundary_type[i], "\n")
  }
}

# Force profile likelihood CIs for enhanced robustness
rd_profile <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut",
  boundary_method = "profile"
)


riskdiff documentation built on June 30, 2025, 9:07 a.m.