get_simpsons_paradox_d: Introduce Simpson's Paradox in Discrete Data
In covalchemy: Constructing Joint Distributions with Control Over Statistical Properties

get_simpsons_paradox_d

R Documentation

Introduce Simpson's Paradox in Discrete Data

Description

This function modifies the contingency tables associated with each level of a categorical variable to create or accentuate Simpson's Paradox, using simulated annealing. The paradox occurs when the trend in the aggregated data differs from (typically reverses) the trend within the subgroups.
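To make the reversal concrete, the following base R sketch compares the log odds ratio within two strata against the log odds ratio of the pooled table. The counts are illustrative, and `log_or` is a hypothetical helper defined here; it is not necessarily the same measure as the package's `log_odds_dc`.

```r
# Log odds ratio of a 2x2 table: log((a * d) / (b * c)).
# Hypothetical helper for illustration only.
log_or <- function(tab) log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))

# Rows: treatment A, treatment B; columns: success, failure (illustrative counts).
strata <- list(
  small = matrix(c(81, 6, 234, 36), ncol = 2, byrow = TRUE),
  large = matrix(c(192, 71, 55, 25), ncol = 2, byrow = TRUE)
)

# Aggregated table: cell-wise sum over the strata.
pooled <- Reduce(`+`, strata)

within_groups <- sapply(strata, log_or)  # positive in both strata
overall <- log_or(pooled)                # negative after pooling
```

Treatment A looks better inside each stratum, yet worse once the strata are pooled, which is the pattern this function is designed to induce.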

Usage

get_simpsons_paradox_d(
  x,
  y,
  z,
  manual_vec,
  target_overall,
  margin,
  margin_overall,
  max_n = 1000,
  temp = 10,
  log_odds_general = log_odds_dc
)

Arguments

`x`	A vector of categorical values for the first variable.
`y`	A vector of categorical values for the second variable.
`z`	A vector indicating levels of a third variable that segments the data.
`manual_vec`	A numeric vector specifying target log-odds trends for each level of `z`.
`target_overall`	A numeric value representing the target log-odds for the aggregated data.
`margin`	A numeric value for allowed deviation in log-odds within each subgroup.
`margin_overall`	A numeric value for allowed deviation in aggregated log-odds.
`max_n`	An integer specifying the maximum number of iterations for the annealing process.
`temp`	A numeric value for the initial temperature in the annealing process.
`log_odds_general`	A function to compute the log-odds for a given contingency table (default: `log_odds_dc`).
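As a rough illustration of the `log_odds_general` argument, the sketch below defines a hypothetical alternative that adds 0.5 to every cell before taking the log odds ratio, so that tables containing a zero still give a finite value. It assumes the supplied function is called with a single 2x2 contingency table and returns one number; that calling convention is inferred from the argument description and should be checked against the package source.

```r
# Hypothetical custom log-odds function (not part of covalchemy).
# Adds 0.5 to each cell (a continuity correction) so zero cells stay finite.
log_odds_corrected <- function(tab) {
  tab <- as.matrix(tab) + 0.5
  log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
}

# A table with an empty cell still yields a finite value.
value <- log_odds_corrected(matrix(c(12, 0, 5, 7), ncol = 2, byrow = TRUE))
```

Under the assumption above, such a function could be supplied as `log_odds_general = log_odds_corrected`.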

Details

The function iteratively modifies the individual contingency tables (one matrix per level of `z`), keeping each subgroup's log-odds within `margin` of its target in `manual_vec`. At the same time, the log-odds of the aggregated table are steered toward `target_overall`, within `margin_overall`. Simulated annealing drives the search, balancing progress toward the targets against overfitting.
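The general idea of annealing a contingency table toward a target log-odds can be sketched in base R as follows. This is a toy version for a single 2x2 table under stated assumptions: the move set (shifting one count along the diagonal), the linear cooling schedule, and the helper names `log_or` and `anneal_table` are all hypothetical and are not taken from the package's implementation.

```r
# Log odds ratio of a 2x2 table (assumed metric for this sketch).
log_or <- function(tab) log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))

# Toy simulated annealing on one table; returns the best table visited.
anneal_table <- function(tab, target, max_n = 1000, temp = 10) {
  cur <- tab
  cur_err <- abs(log_or(cur) - target)
  best <- cur
  best_err <- cur_err
  for (i in seq_len(max_n)) {
    # Propose moving one count along the diagonal; row and column totals stay fixed.
    step <- sample(c(-1, 1), 1)
    cand <- cur + step * matrix(c(1, -1, -1, 1), nrow = 2)
    if (any(cand < 1)) next  # keep all cells positive so the log is finite
    cand_err <- abs(log_or(cand) - target)
    t_i <- temp * (1 - (i - 1) / max_n)  # linear cooling, always > 0
    # Accept improvements always, worse moves with a temperature-dependent probability.
    if (cand_err < cur_err || runif(1) < exp(-(cand_err - cur_err) / t_i)) {
      cur <- cand
      cur_err <- cand_err
      if (cur_err < best_err) {
        best <- cur
        best_err <- cur_err
      }
    }
  }
  list(table = best, error = best_err)
}

set.seed(1)
start <- matrix(c(512, 89, 313, 19), ncol = 2, byrow = TRUE)
fit <- anneal_table(start, target = 1, max_n = 500)
```

Because every proposal adds and removes the same amount in each row and column, the marginal totals of `fit$table` match those of `start`; only the association between the two variables changes.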

Value

A list containing:

`final_df`: A data frame representing the modified dataset.
`final_table`: A list of modified contingency tables.
`history`: A data frame tracking the overall log-odds over iterations.

Examples

# Example with predefined contingency tables
library(covalchemy)
set.seed(42)

# One 2x2 contingency table per level of z
matrices <- list(
  ta = matrix(c(512, 89, 313, 19), ncol = 2, byrow = TRUE),
  tb = matrix(c(353, 17, 207, 8), ncol = 2, byrow = TRUE),
  tc = matrix(c(120, 202, 205, 391), ncol = 2, byrow = TRUE)
)

# Convert each table to long format (x, y, Freq) and tag it with its z level
df_list <- lapply(seq_along(matrices), function(i) {
  mat <- matrices[[i]]
  z_level <- names(matrices)[i]
  df <- as.data.frame(as.table(mat))
  colnames(df) <- c("x", "y", "Freq")
  df$z <- z_level
  df
})
final_df <- do.call(rbind, df_list)

# Expand to one row per observation
expanded_df <- final_df[rep(seq_len(nrow(final_df)), final_df$Freq), c("x", "y", "z")]

# Target negative log-odds in every subgroup but positive log-odds overall
result <- get_simpsons_paradox_d(
  expanded_df$x, expanded_df$y, expanded_df$z,
  manual_vec = c(-1, -1, -1),
  target_overall = +1,
  margin = 0.2, margin_overall = 0.2, max_n = 200
)

# Compare the marginal counts of x before and after the modification
table(expanded_df$x) - table(result$final_df$x)
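
One way to check whether a dataset shows the paradox is to compute the association within each level of `z` and for the pooled data. The base R sketch below does this for the input tables of the example above. `log_or_by_group` is a hypothetical helper, and the log odds ratio it uses is an assumed measure that may differ from `log_odds_dc`. The same helper could be applied to `result$final_df`, assuming that data frame carries `x`, `y` and `z` columns.

```r
# Hypothetical helper: log odds ratio of x vs y within each level of z, plus pooled.
log_or_by_group <- function(x, y, z) {
  log_or <- function(tab) log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
  # Factors keep both levels in every subgroup, so each table stays 2x2.
  d <- data.frame(x = factor(x), y = factor(y))
  per_group <- sapply(split(d, z), function(g) log_or(table(g$x, g$y)))
  c(per_group, overall = log_or(table(d$x, d$y)))
}

# Cell counts of the three example tables, in (x, y) order A/A, B/A, A/B, B/B.
cells <- expand.grid(x = c("A", "B"), y = c("A", "B"), z = c("ta", "tb", "tc"))
cells$Freq <- c(512, 313, 89, 19, 353, 207, 17, 8, 120, 205, 202, 391)

# One row per observation, then the per-group and pooled log odds ratios.
long <- cells[rep(seq_len(nrow(cells)), cells$Freq), ]
res <- log_or_by_group(long$x, long$y, long$z)
```

Under this measure the input data give negative values for `ta` and `tb` and positive values for `tc` and the pooled table, which is the starting point that the annealing targets (`manual_vec` and `target_overall`) then reshape.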

covalchemy documentation built on April 12, 2025, 2:15 a.m.
