View source: R/get_simpsons_paradox_d.R
| get_simpsons_paradox_d | R Documentation | 
This function modifies contingency tables associated with different levels of a categorical variable to create or highlight Simpson's Paradox using simulated annealing. The paradox occurs when aggregated data trends differ from subgroup trends.
get_simpsons_paradox_d(
  x,
  y,
  z,
  manual_vec,
  target_overall,
  margin,
  margin_overall,
  max_n = 1000,
  temp = 10,
  log_odds_general = log_odds_dc
)
| x | A vector of categorical values for the first variable. | 
| y | A vector of categorical values for the second variable. | 
| z | A vector indicating levels of a third variable that segments the data. | 
| manual_vec | A numeric vector specifying target log-odds trends for each level of  | 
| target_overall | A numeric value representing the target log-odds for the aggregated data. | 
| margin | A numeric value for allowed deviation in log-odds within each subgroup. | 
| margin_overall | A numeric value for allowed deviation in aggregated log-odds. | 
| max_n | An integer specifying the maximum number of iterations for the annealing process. | 
| temp | A numeric value for the initial temperature in the annealing process. | 
| log_odds_general | A function to compute the log-odds for a given contingency table (default:  | 
This function works by iteratively modifying individual matrices (contingency tables) corresponding
to levels of z while respecting log-odds constraints. The overall log-odds of the aggregated table
are also adjusted to achieve the specified target_overall. Simulated annealing ensures that the
modifications balance between achieving the targets and avoiding overfitting.
A list containing:
final_df: A data frame representing the modified dataset.
final_table: A list of modified contingency tables.
history: A data frame tracking the overall log-odds over iterations.
# Example with predefined contingency tables
set.seed(42)
matrices <- list(
  ta = matrix(c(512, 89, 313, 19), ncol = 2, byrow = TRUE),
  tb = matrix(c(353, 17, 207, 8), ncol = 2, byrow = TRUE),
  tc = matrix(c(120, 202, 205, 391), ncol = 2, byrow = TRUE)
)
df_list <- lapply(seq_along(matrices), function(i) {
  mat <- matrices[[i]]
  z_level <- names(matrices)[i]
  df <- as.data.frame(as.table(mat))
  colnames(df) <- c("x", "y", "Freq")
  df$z <- z_level
  return(df)
})
final_df <- do.call(rbind, df_list)
expanded_df <- final_df[rep(1:nrow(final_df), final_df$Freq), c("x", "y", "z")]
result <- get_simpsons_paradox_d(
  expanded_df$x, expanded_df$y, expanded_df$z,
  manual_vec = c(-1, -1, -1),
  target_overall = +1,
  margin = 0.2, margin_overall = 0.2, max_n = 200
)
table(expanded_df$x) - table(result$final_df$x)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.