get_simpsons_paradox_c: Simpson's Paradox Transformation with Copula and Simulated...

View source: R/get_simpsons_paradox_c.R

get_simpsons_paradox_c    R Documentation

Simpson's Paradox Transformation with Copula and Simulated Annealing

Description

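This function simulates Simpson's Paradox in a dataset. It transforms x and y within each category of z using Gaussian copulas so that each group approaches the correlation given in corr_vector, refines the transformed data with simulated annealing, and returns the original, transformed, and annealed data so the results can be compared.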
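The Gaussian copula step can be pictured with a minimal base R sketch. It is not the package's implementation: copula_transform is a hypothetical helper, and the mapping of "quantile_7" to quantile(type = 7) is an assumption made for illustration. The idea is to draw correlated normals, turn them into uniforms, and push the uniforms through each variable's empirical inverse CDF, so the marginals stay close to the original data while the dependence changes.

```r
# Hypothetical helper (not part of covalchemy): impose a target correlation
# on one group while keeping its empirical marginals.
copula_transform <- function(x, y, rho, type = 7) {
  n <- length(x)
  # Draw a bivariate standard normal sample with correlation rho.
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
  # Map to uniforms (the Gaussian copula), then through each empirical
  # inverse CDF; `type` selects the quantile() interpolation rule.
  u1 <- pnorm(z1)
  u2 <- pnorm(z2)
  data.frame(
    x = unname(quantile(x, probs = u1, type = type)),
    y = unname(quantile(y, probs = u2, type = type))
  )
}

set.seed(1)
x <- rnorm(200, mean = 10, sd = 5)
y <- 2 * x + rnorm(200, sd = 4)
out <- copula_transform(x, y, rho = -0.8)
cor(x, y)          # strongly positive in the original data
cor(out$x, out$y)  # negative, near the requested target
```

In this sketch the transformed values are resampled from the marginals rather than matched to the original points. The documented function adds a simulated annealing stage on top of the copula step.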

Usage

get_simpsons_paradox_c(
  x,
  y,
  z,
  corr_vector,
  inv_cdf_type = "quantile_7",
  sd_x = 0.05,
  sd_y = 0.05,
  lambda1 = 1,
  lambda2 = 1,
  lambda3 = 1,
  lambda4 = 1,
  max_iter = 1000,
  initial_temp = 1,
  cooling_rate = 0.99,
  order_vec = NA,
  degree = 5
)

Arguments

x

A numeric vector of data points for variable X.

y

A numeric vector of data points for variable Y.

z

A categorical variable representing groups (e.g., factor or character vector).

corr_vector

A numeric vector of target correlations, one for each category of z.

inv_cdf_type

Type of inverse CDF transformation ("quantile_1", "quantile_4", "quantile_7", "quantile_8", "linear", "akima", "poly"). Default is "quantile_7".

sd_x

Standard deviation for perturbations on X (default is 0.05).

sd_y

Standard deviation for perturbations on Y (default is 0.05).

lambda1

Regularization parameter for simulated annealing (default is 1).

lambda2

Regularization parameter for simulated annealing (default is 1).

lambda3

Regularization parameter for simulated annealing (default is 1).

lambda4

Regularization parameter for simulated annealing (default is 1).

max_iter

Maximum iterations for simulated annealing (default is 1000).

initial_temp

Initial temperature for simulated annealing (default is 1.0).

cooling_rate

Cooling rate for simulated annealing (default is 0.99).

order_vec

Manual ordering of grids (default is NA, in which case the ordering is calculated automatically).

degree

Degree of polynomial used for polynomial inverse CDF (default is 5).
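How max_iter, initial_temp, and cooling_rate interact can be seen in a short sketch. It assumes a geometric schedule (the temperature is multiplied by cooling_rate once per iteration) and the usual Metropolis acceptance rule. The page does not state which schedule or rule the function uses, so accept_prob and temps are illustrative names only.

```r
# Illustrative only: geometric cooling and Metropolis acceptance, assuming
# the temperature is multiplied by cooling_rate once per iteration.
initial_temp <- 1
cooling_rate <- 0.99
max_iter <- 1000

# Temperature at each iteration, starting from initial_temp.
temps <- initial_temp * cooling_rate^(seq_len(max_iter) - 1)

# Probability of accepting a move that changes the loss by `delta`:
# improvements are always accepted, worse moves with probability exp(-delta / temp).
accept_prob <- function(delta, temp) ifelse(delta <= 0, 1, exp(-delta / temp))

accept_prob(0.1, temps[1])         # early: worse moves are often accepted
accept_prob(0.1, temps[max_iter])  # late: worse moves are almost never accepted
```

Under these assumptions, a cooling_rate closer to 1 keeps the search exploratory for longer, while a smaller value makes it greedy sooner.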

Value

A list containing:

df_all

The final dataset with original, transformed, and annealed data.

df_res

A simplified version with only the optimized data.

Examples

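set.seed(123)
n <- 300
# Group labels drawn with unequal probabilities
z <- sample(c("A", "B", "C"), prob = c(0.3, 0.4, 0.3), size = n, replace = TRUE)
# x and y are positively correlated overall
x <- rnorm(n, 10, sd = 5) + 5 * rbeta(n, 5, 3)
y <- 2 * x + rnorm(n, 5, sd = 4)
# One correlation per category of z (named to avoid masking base::t)
corr_vector <- c(-0.8, 0.8, -0.8)
res <- get_simpsons_paradox_c(x, y, z, corr_vector, sd_x = 0.07, sd_y = 0.07, lambda4 = 5)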
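To see what the requested correlations change, it helps to look at the inputs before the call. The following base R check regenerates the same data and does not call covalchemy or assume any column names in the returned list. The overall and within-group correlations of the raw inputs are all positive, so negative entries in the correlation vector ask for the within-group trend to be reversed in those groups. Which entry applies to which category is presumably determined by the order of the categories of z, which this page does not spell out.

```r
set.seed(123)
n <- 300
z <- sample(c("A", "B", "C"), prob = c(0.3, 0.4, 0.3), size = n, replace = TRUE)
x <- rnorm(n, 10, sd = 5) + 5 * rbeta(n, 5, 3)
y <- 2 * x + rnorm(n, 5, sd = 4)

# Overall correlation of the untransformed inputs.
overall <- cor(x, y)

# Correlation within each level of z.
by_group <- sapply(split(seq_len(n), z), function(idx) cor(x[idx], y[idx]))

overall
by_group
```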


covalchemy documentation built on April 12, 2025, 2:15 a.m.