simulate_data_nonlinear: Simulate data with linear confounding and non-linear causal...

View source: R/simData.R

simulate_data_nonlinear    R Documentation

Simulate data with linear confounding and non-linear causal effect

Description

Simulation of data from a confounded non-linear model. The data generating process is given by:

Y = f(X) + \delta^T H + \nu

X = \Gamma^T H + E

where f(X) is a random function in the Fourier basis, and only a subset js of m covariates X_j has a causal effect on Y. For observation i,

f(x_i) = \sum_{j = 1}^p 1_{j \in js} \sum_{k = 1}^K (\beta_{j, k}^{(1)} \cos(0.2 k x_{i, j}) + \beta_{j, k}^{(2)} \sin(0.2 k x_{i, j}))

E and \nu are random error terms and H \in \mathbb{R}^{n \times q} is a matrix of random confounding covariates. \Gamma \in \mathbb{R}^{q \times p} is a random coefficient matrix and \delta \in \mathbb{R}^{q} is a random coefficient vector. For the simulation, all of the above quantities are drawn from a standard normal distribution, except for \nu, which is drawn from a normal distribution with standard deviation 0.1. The parameters \beta are drawn from a uniform distribution between -1 and 1.
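The data generating process described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the variable names, the layout of the beta array, and the row-wise matrix form X = H \Gamma + E (one observation per row, which matches the stated dimensions of H and \Gamma) are choices made here for the example.

```r
set.seed(1)
n <- 100; p <- 5; q <- 2; m <- 2; K <- 2

# Confounders, coefficients and noise, drawn as described in the text
H     <- matrix(rnorm(n * q), nrow = n)   # n x q confounding covariates
Gamma <- matrix(rnorm(q * p), nrow = q)   # q x p effect of H on X
delta <- rnorm(q)                         # effect of H on Y
E     <- matrix(rnorm(n * p), nrow = n)   # n x p noise in X
nu    <- rnorm(n, sd = 0.1)               # noise in Y

# Confounded covariates, one observation per row
X <- H %*% Gamma + E

# Causal covariates and Fourier coefficients
# (assumed layout: causal covariate x frequency k x {cos, sin})
js   <- sample.int(p, m)
beta <- array(runif(m * K * 2, min = -1, max = 1), dim = c(m, K, 2))

# f(x_i): sum of cosine and sine terms over the causal covariates only
f_X <- numeric(n)
for (a in seq_len(m)) {
  for (k in seq_len(K)) {
    x_j <- X[, js[a]]
    f_X <- f_X + beta[a, k, 1] * cos(0.2 * k * x_j) +
                 beta[a, k, 2] * sin(0.2 * k * x_j)
  }
}

# Response: causal signal plus confounding plus noise
Y <- f_X + as.vector(H %*% delta) + nu
```

Because H enters both X and Y, a regression of Y on X alone would pick up the confounding term in addition to f(X).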

Usage

simulate_data_nonlinear(q, p, n, m, K = 2, eff = NULL, fixEff = FALSE)

Arguments

q

number of confounding covariates in H

p

number of covariates in X

n

number of observations

m

number of covariates with a causal effect on Y

K

number of Fourier basis functions, K \in \mathbb{N}, i.e. the complexity of the causal function

eff

the number of covariates in X affected by the confounding; if NULL, all covariates are affected

fixEff

only relevant if eff is smaller than p: if fixEff = TRUE, the causal parents are always affected by confounding; if fixEff = FALSE, the affected covariates are chosen completely at random.
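The interplay of eff and fixEff can be illustrated with a small base R sketch. The helper choose_affected is hypothetical and does not exist in the package. Two further assumptions are made here: that "not affected" means the corresponding column of \Gamma is zero, and that eff must be at least m when fixEff = TRUE.

```r
# Hypothetical helper: pick which columns of X receive confounding.
choose_affected <- function(p, js, eff = NULL, fixEff = FALSE) {
  if (is.null(eff) || eff >= p) return(seq_len(p))  # all covariates affected
  if (fixEff) {
    stopifnot(eff >= length(js))  # assumption: room for all causal parents
    others <- setdiff(seq_len(p), js)
    # Index via sample.int to avoid sample()'s behaviour on length-1 vectors
    extra <- others[sample.int(length(others), eff - length(js))]
    return(sort(c(js, extra)))
  }
  sort(sample.int(p, eff))  # completely at random
}

set.seed(1)
p <- 10; q <- 2
js <- c(3, 7)  # indices of the causal covariates

affected <- choose_affected(p, js, eff = 4, fixEff = TRUE)

# Assumed mechanism: unaffected covariates get a zero column in Gamma
Gamma <- matrix(rnorm(q * p), nrow = q)
Gamma[, setdiff(seq_len(p), affected)] <- 0
```

With fixEff = TRUE the causal covariates 3 and 7 are always among the affected columns; with fixEff = FALSE they may or may not be.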

Value

a list containing the simulated data:

X

a matrix of covariates

Y

a vector of responses

f_X

a vector of the true function values f(X)

j

the indices of the causal covariates in X

beta

the parameter vector for the function f(X), see f_four

H

the matrix of confounding covariates

Author(s)

Markus Ulmer

See Also

f_four

Examples

set.seed(42)
# simulation of confounded data
sim_data <- simulate_data_nonlinear(q = 2, p = 150, n = 100, m = 2)
X <- sim_data$X
Y <- sim_data$Y


SDModels documentation built on April 11, 2025, 5:50 p.m.