simulate_data_step | R Documentation |
Simulates data from a confounded non-linear model in which the non-linear function is a random regression tree. The data generating process is given by:
Y = f(X) + \delta^T H + \nu
X = \Gamma^T H + E
where f(X) is a random regression tree with m random splits of the data, resulting in a random step function with K = m+1 levels, i.e. leaf levels:
f(x_i) = \sum_{k = 1}^K 1_{\{x_i \in R_k\}} c_k
E and \nu are random error terms, and H \in \mathbb{R}^{n \times q} is a matrix of random confounding covariates. \Gamma \in \mathbb{R}^{q \times p} is a random coefficient matrix and \delta \in \mathbb{R}^{q} is a random coefficient vector.
For the simulation, all of the above parameters are drawn from a standard normal distribution, except for \delta, which is drawn from a normal distribution with standard deviation 10. The leaf levels c_k are drawn from a uniform distribution between -50 and 50.
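The data generating process described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `simulate_step_sketch` is hypothetical, and the way a leaf, a covariate and a split point are picked at each step is an assumption made only to obtain a step function with m+1 levels. With H stored as an n x q matrix, the confounding terms are computed as `H %*% Gamma` and `H %*% delta` so that the dimensions conform.

```r
# Hypothetical helper: a sketch of the described process, not the package code.
simulate_step_sketch <- function(q, p, n, m) {
  H     <- matrix(rnorm(n * q), nrow = n)   # confounders, n x q
  Gamma <- matrix(rnorm(q * p), nrow = q)   # coefficient matrix, q x p
  delta <- rnorm(q, mean = 0, sd = 10)      # coefficient vector, sd 10
  E     <- matrix(rnorm(n * p), nrow = n)   # error term in X
  nu    <- rnorm(n)                         # error term in Y

  X <- H %*% Gamma + E                      # confounded covariates, n x p

  # Grow a random partition: every leaf is a vector of row indices.
  leaves <- list(seq_len(n))
  js <- integer(0)
  for (s in seq_len(m)) {
    # Assumption: only leaves with at least two observations are split.
    candidates <- which(lengths(leaves) >= 2)
    if (length(candidates) == 0) break
    k   <- candidates[sample.int(length(candidates), 1)]
    idx <- leaves[[k]]
    j   <- sample.int(p, 1)                 # random splitting covariate
    x   <- sort(X[idx, j])
    pos <- sample.int(length(x) - 1, 1)     # random position between neighbours
    thr <- (x[pos] + x[pos + 1]) / 2        # midpoint split keeps both sides non-empty
    left <- idx[X[idx, j] <= thr]
    right <- idx[X[idx, j] > thr]
    leaves[[k]] <- left
    leaves[[length(leaves) + 1]] <- right
    js <- c(js, j)
  }

  # Leaf levels c_k drawn uniformly between -50 and 50.
  c_k <- runif(length(leaves), min = -50, max = 50)
  f_X <- numeric(n)
  for (k in seq_along(leaves)) f_X[leaves[[k]]] <- c_k[k]

  Y <- f_X + as.vector(H %*% delta) + nu
  list(X = X, Y = Y, f_X = f_X, j = unique(js))
}

set.seed(1)
sim <- simulate_step_sketch(q = 2, p = 5, n = 50, m = 3)
```

Comparing `sim$Y` with `sim$f_X` shows how strongly the confounding term `H %*% delta` distorts the step function when \delta has standard deviation 10.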
simulate_data_step(q, p, n, m, make_tree = FALSE)
q |
number of confounding covariates in H |
p |
number of covariates in X |
n |
number of observations |
m |
number of covariates with a causal effect on Y |
make_tree |
Whether the random regression tree should be returned. |
a list containing the simulated data:
X |
a |
Y |
a |
f_X |
a |
j |
the indices of the causal covariates in X |
tree |
If |
Markus Ulmer
simulate_data_nonlinear
set.seed(42)
# simulation of confounded data
sim_data <- simulate_data_step(q = 2, p = 15, n = 100, m = 2)
X <- sim_data$X
Y <- sim_data$Y