sim_dgp_ewald: Simulate data as in Ewald et al. (2024)

View source: R/data-sim.R

sim_dgp_ewald    R Documentation

Simulate data as in Ewald et al. (2024)

Description

Reproduces the data-generating process from Ewald et al. (2024) for benchmarking feature importance methods. The simulated data contain two pairs of correlated features (X1 with X2, X3 with X4) and an interaction effect between X4 and X5.

Usage

sim_dgp_ewald(n = 500)

Arguments

n

(integer(1)) Number of samples to create.

Details

Mathematical Model:

X_1, X_3, X_5 \sim \text{Uniform}(0,1)

X_2 = X_1 + \varepsilon_2, \quad \varepsilon_2 \sim N(0, \sqrt{0.001})

X_4 = X_3 + \varepsilon_4, \quad \varepsilon_4 \sim N(0, \sqrt{0.1})

Y = X_4 + X_5 + X_4 \cdot X_5 + \varepsilon, \quad \varepsilon \sim N(0, \sqrt{0.1})

Feature Properties:

  • X1, X3, X5: Independent draws from Uniform(0,1)

  • X2: Nearly perfect copy of X1 (correlation approximately 0.99)

  • X4: Noisy copy of X3 (correlation approximately 0.94)

  • Y depends on X4, X5, and their interaction; X1, X2, and X3 do not appear in the equation for Y
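
The model above can be sketched in base R as follows. This is an illustration, not the package's implementation: the helper name sketch_dgp_ewald and the column names are hypothetical, and the second argument of N(., .) is read as a standard deviation, which is an assumption. The sketch returns a plain data.frame rather than an mlr3 task.

```r
# Hypothetical helper (not part of xplainfi): base-R sketch of the model above.
# Assumption: the second argument of N(., .) is a standard deviation,
# matching the `sd` argument of rnorm().
sketch_dgp_ewald <- function(n = 500) {
  # Independent Uniform(0, 1) features
  x1 <- runif(n)
  x3 <- runif(n)
  x5 <- runif(n)

  # Noisy copies of x1 and x3
  x2 <- x1 + rnorm(n, mean = 0, sd = sqrt(0.001))
  x4 <- x3 + rnorm(n, mean = 0, sd = sqrt(0.1))

  # Target: main effects of x4 and x5 plus their interaction, with noise
  y <- x4 + x5 + x4 * x5 + rnorm(n, mean = 0, sd = sqrt(0.1))

  data.frame(y = y, x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5)
}

set.seed(1)                      # fixed seed so the draw is reproducible
dat <- sketch_dgp_ewald(200)
str(dat)                         # one target column and five feature columns
```

Calling cor(dat) on the result is a quick way to inspect the dependence between the feature pairs in a given draw.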

Value

A regression task (mlr3::TaskRegr) with a data.table backend.

References

Ewald F, Bothmann L, Wright M, Bischl B, Casalicchio G, König G (2024). “A Guide to Feature Importance Methods for Scientific Inference.” In Longo L, Lapuschkin S, Seifert C (eds.), Explainable Artificial Intelligence, 440–464. ISBN 978-3-031-63797-1, doi:10.1007/978-3-031-63797-1_22.

See Also

Other simulation: sim_dgp_scenarios

Examples

sim_dgp_ewald(100)
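
A slightly longer variant of the example, inspecting the returned task. This is a sketch that assumes xplainfi is installed and that the returned object exposes the usual mlr3 Task fields (nrow, feature_names, target_names) and the head() method. The variable n_rows is introduced only for illustration.

```r
# Guarded so the snippet is a no-op when xplainfi is not installed.
n_rows <- NA_integer_
if (requireNamespace("xplainfi", quietly = TRUE)) {
  set.seed(42)                               # reproducible draw
  task <- xplainfi::sim_dgp_ewald(n = 100)

  n_rows <- task$nrow                        # number of simulated samples
  print(task$target_names)                   # name of the target column
  print(task$feature_names)                  # names of the feature columns
  print(task$head())                         # first rows as a data.table
}
```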


xplainfi documentation built on Feb. 27, 2026, 1:08 a.m.