simulate_data_nonlinear | R Documentation
Simulates data from a confounded non-linear model. The data-generating process is given by:
Y = f(X) + \delta^T H + \nu
X = \Gamma^T H + E
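where f(X) is a random function in the Fourier basis in which only a subset of m covariates X_j has a causal effect on Y:
f(x_i) = \sum_{j = 1}^p 1_{j \in js} \sum_{k = 1}^K \left( \beta_{j, k}^{(1)} \cos(0.2 k x_{ij}) + \beta_{j, k}^{(2)} \sin(0.2 k x_{ij}) \right)
Here x_{ij} is the j-th covariate of observation i, and js \subset \{1, \dots, p\} with |js| = m is the set of causal covariates.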
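To make the formula above concrete, here is a minimal base-R sketch that evaluates f(x_i) for a single observation. It is an illustration only, not the package's `f_four`: the name `fourier_f` is hypothetical, and the layout of `beta` as a p x K x 2 array (slice 1 holding the cosine coefficients, slice 2 the sine coefficients) is an assumption made here.

```r
# Hypothetical helper: evaluate f(x_i) from the Fourier-basis formula for one observation.
# x:    numeric vector of length p (one row of X)
# beta: assumed p x K x 2 array; beta[j, k, 1] multiplies cos, beta[j, k, 2] multiplies sin
# js:   indices of the causal covariates
fourier_f <- function(x, beta, js) {
  K <- dim(beta)[2]
  k <- seq_len(K)
  total <- 0
  # the indicator 1_{j in js} restricts the outer sum to the causal covariates
  for (j in js) {
    total <- total + sum(beta[j, k, 1] * cos(0.2 * k * x[j]) +
                         beta[j, k, 2] * sin(0.2 * k * x[j]))
  }
  total
}

# Small example: p = 3, K = 2, only covariate 2 is causal
beta <- array(0, dim = c(3, 2, 2))
beta[2, , 1] <- c(1, 0.5)    # cosine coefficients
beta[2, , 2] <- c(-1, 0.25)  # sine coefficients
fourier_f(c(0, 0, 0), beta, js = 2)
```

At x = 0 every cosine equals 1 and every sine equals 0, so the result is the sum of the cosine coefficients of the causal covariate; changing the non-causal entries of `x` leaves the value unchanged.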
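E and \nu are random error terms, and H \in \mathbb{R}^{n \times q} is a matrix of random confounding covariates; in the model equations above, H stands for the q-dimensional confounder vector of a single observation, i.e. one row of this matrix. \Gamma \in \mathbb{R}^{q \times p} and \delta \in \mathbb{R}^{q} are a random coefficient matrix and a random coefficient vector, respectively.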
For the simulation, the entries of H, E, \Gamma, and \delta are drawn from a standard normal distribution, while \nu is drawn from a normal distribution with standard deviation 0.1. The coefficients \beta are drawn from a uniform distribution between -1 and 1.
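As a rough illustration, the full data-generating process for the default case `eff = NULL` (every covariate confounded) can be written in a few lines of base R. This is a minimal sketch under stated assumptions, not the package's implementation: the function name `sim_sketch` is hypothetical, `beta` is assumed to be a p x K x 2 array, and the n observations are stacked row-wise so that the per-observation equation x_i = \Gamma^T h_i + e_i becomes the matrix product X = H\Gamma + E.

```r
# Hypothetical sketch of the confounded non-linear data-generating process.
# Assumes all covariates are confounded (eff = NULL) and beta is a p x K x 2 array
# whose rows outside the causal set are zero.
sim_sketch <- function(q, p, n, m, K = 2) {
  H     <- matrix(rnorm(n * q), nrow = n, ncol = q)  # confounders, n x q
  Gamma <- matrix(rnorm(q * p), nrow = q, ncol = p)  # confounder loadings, q x p
  delta <- rnorm(q)                                   # confounder effect on Y
  E     <- matrix(rnorm(n * p), nrow = n, ncol = p)  # covariate noise
  X     <- H %*% Gamma + E  # row i equals Gamma^T h_i + e_i

  js   <- sample(seq_len(p), m)                        # causal covariates
  beta <- array(0, dim = c(p, K, 2))
  beta[js, , ] <- runif(m * K * 2, min = -1, max = 1)  # Fourier coefficients

  k   <- seq_len(K)
  f_X <- apply(X, 1, function(x) {
    # sum the cosine/sine expansion over the causal covariates only
    sum(vapply(js, function(j) {
      sum(beta[j, k, 1] * cos(0.2 * k * x[j]) + beta[j, k, 2] * sin(0.2 * k * x[j]))
    }, numeric(1)))
  })

  nu <- rnorm(n, sd = 0.1)                    # response noise with sd 0.1
  Y  <- f_X + as.vector(H %*% delta) + nu
  # element names mirror the documented return value
  list(X = X, Y = Y, f_X = f_X, j = js, beta = beta, H = H)
}

set.seed(42)
sim <- sim_sketch(q = 2, p = 150, n = 100, m = 2)
```

Because `delta` is not returned, the confounding term \delta^T H cannot be recovered from the output alone, which mirrors the documented return value.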
simulate_data_nonlinear(q, p, n, m, K = 2, eff = NULL, fixEff = FALSE)
Argument | Description
q | number of confounding covariates in H
p | number of covariates in X
n | number of observations
m | number of covariates with a causal effect on Y
K | number of Fourier frequencies K (each contributes a cosine and a sine term)
eff | number of covariates in X affected by the confounding; if NULL, all covariates are affected
fixEff | only relevant if eff is smaller than p: if fixEff = TRUE, the causal parents are always among the affected covariates; if fixEff = FALSE, the affected covariates are chosen completely at random
a list containing the simulated data:
Element | Description
X | a matrix of covariates
Y | a vector of responses
f_X | a vector of the true function values f(X)
j | the indices of the causal covariates in X
beta | the parameter vector for the function f(X), see f_four
H | the matrix of confounding covariates
Author: Markus Ulmer
See also: f_four
set.seed(42)
# simulation of confounded data
sim_data <- simulate_data_nonlinear(q = 2, p = 150, n = 100, m = 2)
X <- sim_data$X
Y <- sim_data$Y