simulate_data_step: Simulate data with linear confounding and causal effect...

View source: R/simData.R

simulate_data_step    R Documentation

Simulate data with linear confounding and causal effect following a step-function

Description

Simulates data from a confounded non-linear model in which the non-linear function is a random regression tree. The data generating process is given by:

Y = f(X) + \delta^T H + \nu

X = \Gamma^T H + E

where f(X) is a random regression tree with m random splits of the data, resulting in a random step function with K = m + 1 levels, i.e. leaf levels:

f(x_i) = \sum_{k = 1}^K 1_{\{x_i \in R_k\}} c_k

E and \nu are random error terms, and H \in \mathbb{R}^{n \times q} is a matrix of random confounding covariates. \Gamma \in \mathbb{R}^{q \times p} is a random coefficient matrix and \delta \in \mathbb{R}^{q} is a random coefficient vector. For the simulation, the entries of H, \Gamma, E and \nu are drawn from a standard normal distribution, while the entries of \delta are drawn from a normal distribution with standard deviation 10. The leaf levels c_k, one for each region R_k, are drawn from a uniform distribution between -50 and 50.
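
The data generating process described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the variable names, the matrix form X = H \Gamma + E and Y = f(X) + H \delta + \nu (with H stored as an n x q matrix), and the split rule (a random region, a random covariate, and a random midpoint between two adjacent observed values) are all choices made here for illustration.

```r
set.seed(1)
n <- 100; p <- 5; q <- 2; m <- 3

# Confounders, coefficients and errors: standard normal, except delta (sd = 10)
H     <- matrix(rnorm(n * q), nrow = n)
Gamma <- matrix(rnorm(q * p), nrow = q)
delta <- rnorm(q, sd = 10)
E     <- matrix(rnorm(n * p), nrow = n)
nu    <- rnorm(n)

# Covariates with linear confounding
X <- H %*% Gamma + E

# Assumed split rule: m times, pick a region with at least two observations,
# a random covariate, and a split point between two adjacent observed values
regions <- list(seq_len(n))
for (i in seq_len(m)) {
  candidates <- which(lengths(regions) >= 2)
  r   <- candidates[sample.int(length(candidates), 1)]
  idx <- regions[[r]]
  j   <- sample.int(p, 1)
  x_sorted <- sort(X[idx, j])
  k   <- sample.int(length(x_sorted) - 1, 1)
  s   <- (x_sorted[k] + x_sorted[k + 1]) / 2
  regions[[r]] <- idx[X[idx, j] < s]                       # left child
  regions[[length(regions) + 1]] <- idx[X[idx, j] >= s]    # right child
}

# One uniform leaf level c_k per region R_k gives the step function f(X)
c_k <- runif(length(regions), min = -50, max = 50)
f_X <- numeric(n)
for (k in seq_along(regions)) f_X[regions[[k]]] <- c_k[k]

# Response: step function plus linear confounding plus noise
Y <- f_X + as.vector(H %*% delta) + nu
```

After m splits the list `regions` holds m + 1 disjoint sets of observation indices, so `f_X` takes m + 1 distinct values, matching the m + 1 leaf levels in the description.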

Usage

simulate_data_step(q, p, n, m, make_tree = FALSE)

Arguments

q

number of confounding covariates in H

p

number of covariates in X

n

number of observations

m

number of random splits in the regression tree, so that f(X) has m + 1 leaf levels

make_tree

Whether the random regression tree should be returned.

Value

a list containing the simulated data:

X

a matrix of covariates

Y

a vector of responses

f_X

a vector of the true function f(X)

j

the indices of the causal covariates in X

tree

If make_tree is TRUE, the random regression tree, an object of class Node from the data.tree package (Glur, 2023)
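
As a hedged sketch of how the returned list might be inspected, the snippet below calls the function with make_tree = TRUE and looks at each documented element. It assumes the SDModels package (and data.tree, which provides the Node class) is installed; the guard with requireNamespace and the variable name `sim` are additions made here for illustration.

```r
# Skipped silently when SDModels is not installed
if (requireNamespace("SDModels", quietly = TRUE)) {
  set.seed(42)
  sim <- SDModels::simulate_data_step(q = 2, p = 15, n = 100, m = 2,
                                      make_tree = TRUE)

  print(dim(sim$X))        # dimensions of the covariate matrix
  print(length(sim$Y))     # number of responses
  print(unique(sim$f_X))   # distinct levels of the true step function
  print(sim$j)             # indices of the causal covariates in X
  print(class(sim$tree))   # class of the returned regression tree
}
```

Comparing `sim$Y` with `sim$f_X` gives a quick impression of how strongly the confounding term and the noise distort the true step function.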

Author(s)

Markus Ulmer

References

Glur C (2023). data.tree: General Purpose Hierarchical Data Structure. R package.

See Also

simulate_data_nonlinear

Examples

set.seed(42)
# simulation of confounded data
sim_data <- simulate_data_step(q = 2, p = 15, n = 100, m = 2)
X <- sim_data$X
Y <- sim_data$Y

SDModels documentation built on April 11, 2025, 5:50 p.m.