Simulate data from a causal (possibly cyclic) model under interventions.
simulateInterventions(
  n,
  p,
  df,
  rhoNoise,
  snrPar,
  sparse,
  doInterv,
  numberInt,
  strengthInt,
  cyclic,
  strengthCycle,
  modelMis = FALSE,
  modelMisPar = 1,
  seed = 1
)
| n | Number of observations. | 
| p | Number of variables. | 
| df | Degrees of freedom in t-distribution of noise and interventions. | 
| rhoNoise | Correlation between noise terms to model hidden variables. Set to 0 for independent noise. | 
| snrPar | Signal-to-noise parameter: steers what proportion of the variance stems from the signal and what proportion from the noise. The SNR is given by SNR = (1-snrPar)/snrPar. | 
| sparse | Probability that an entry i,j in adjacency matrix is 1. | 
| doInterv | Set to TRUE if interventions should be do-interventions; otherwise noise interventions (also called shift interventions) are generated. | 
| numberInt | Total number of settings. | 
| strengthInt | Regulates the strength of the interventions, see details. | 
| cyclic | Set to TRUE if the resulting graph should contain a cycle. | 
| strengthCycle | Steers strength of feedback, see details. | 
| modelMis | Set to TRUE to add a model misspecification that applies a marginal tanh transformation to all variables, see details. | 
| modelMisPar | Parameter steering the strength of the model misspecification. | 
| seed | Random seed. | 
The adjacency matrix A is generated as follows. Assume the variables 
with indices {1, …, p} are causally ordered. For each pair of nodes 
i and j where i precedes j in the causal ordering, 
we draw a sample from Bernoulli(sparse) to determine whether to add an edge 
from node i to node j. After having sampled the non-zero entries 
of A in this fashion, we sample the coefficients from Unif(-1,1). 
As described below, the edge weights are later rescaled to achieve a specified 
signal-to-noise ratio. We exclude the possibility of A = 0, 
i.e. we resample until A contains at least one non-zero entry.
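The sampling of A described above can be sketched in base R as follows. This is an illustration only, not the package's implementation: the helper name `sample_adjacency` is hypothetical, and the sketch assumes the convention that `A[j, i]` holds the weight of the edge from node i to node j, so that a causally ordered graph gives a strictly lower-triangular matrix. The later rescaling of the weights is not included.

```r
# Hypothetical helper, not part of the package: sample a weighted
# adjacency matrix for p causally ordered variables.
sample_adjacency <- function(p, sparse) {
  stopifnot(p >= 2, sparse > 0, sparse <= 1)
  repeat {
    A <- matrix(0, p, p)
    # Potential edges i -> j with i < j sit at A[j, i] (strictly lower triangle).
    candidates <- which(lower.tri(A))
    # One Bernoulli(sparse) draw per potential edge.
    keep <- rbinom(length(candidates), size = 1, prob = sparse) == 1
    # Draw a weight from Unif(-1, 1) for every edge that was kept.
    A[candidates[keep]] <- runif(sum(keep), min = -1, max = 1)
    # Resample until A contains at least one non-zero entry.
    if (any(A != 0)) return(A)
  }
}

set.seed(1)
A <- sample_adjacency(p = 5, sparse = 0.3)
```

Restricting the draws to the lower triangle is what guarantees acyclicity at this stage; a cycle is only introduced later if requested.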
Second, the interventions are generated as follows. numberInt denotes the total 
number of (interventional and observational) settings that are generated. 
For each variable, we sample uniformly at random with replacement one setting 
in which this variable is intervened on. In other words, each variable is 
intervened on in exactly one setting. Hence it is possible that there are 
settings where no interventions take place which then correspond to the 
observational case. Similarly, there may be settings where interventions 
are performed on multiple variables at once. After defining the settings, 
we sample (uniformly at random with replacement) what setting each data point 
belongs to. So for each setting we generate approximately the same number of 
samples. In one generated data set, the interventions are all of the same 
type, i.e. they are either all shift interventions (when doInterv = FALSE) 
or do-interventions (when doInterv = TRUE). In both cases, an intervention 
on X_j is modelled by generating Z_j as Z_j ~ strengthInt * t(df). 
If strengthInt = 0, all interventional settings correspond to purely 
observational data.
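A minimal sketch of this assignment step in base R is given below. The variable names `settingOfVariable`, `envIndex` and `Z` are chosen for illustration and are not taken from the package; `whereInt` mirrors the element of the same name in the returned list, but its exact structure in the package is an assumption here.

```r
set.seed(1)
n <- 200; p <- 5; numberInt <- 3; strengthInt <- 2; df <- 4

# For each variable, pick the single setting in which it is intervened on.
settingOfVariable <- sample(seq_len(numberInt), size = p, replace = TRUE)
# whereInt[[s]] lists the variables intervened on in setting s (possibly none).
whereInt <- lapply(seq_len(numberInt), function(s) which(settingOfVariable == s))

# Assign every observation to a setting uniformly at random.
envIndex <- sample(seq_len(numberInt), size = n, replace = TRUE)

# Intervention values: strengthInt * t(df) on intervened coordinates, 0 elsewhere.
Z <- matrix(0, n, p)
for (s in seq_len(numberInt)) {
  rows <- which(envIndex == s)
  cols <- whereInt[[s]]
  if (length(rows) > 0 && length(cols) > 0) {
    Z[rows, cols] <- strengthInt * rt(length(rows) * length(cols), df = df)
  }
}
```

A setting whose entry in `whereInt` is empty receives no interventions, so its rows of `Z` stay at zero and the corresponding observations are purely observational.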
Third, the noise terms ε are generated by first sampling from 
N(0,Σ) where Σ_{i,i} = 1 and 
Σ_{i,j} = rhoNoise for i ≠ j. To steer the signal-to-noise ratio, 
we set the variance of the noise terms of all nodes except source nodes 
to snrPar where 0 < snrPar ≤ 1. Stepping through the 
variables in causal order, for each variable X_j that has parents, we 
uniformly rescale the edge weights β_{j,k} for k = 1, …, p 
in the structural equation of variable X_j such that the variance of 
the sum ∑_{k=1}^p  β_{j,k} X_k + ε_j is approximately 
1 in the observational setting. In other words, the parameter snrPar
steers what proportion of the variance stems from the signal given by  
∑_{k=1}^p  β_{j,k} X_k and what proportion stems from the 
noise ε_j. The signal-to-noise ratio can then be computed 
as SNR = (1-snrPar)/snrPar.
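The rescaling for a single node can be illustrated as follows. This is a sketch under simplifying assumptions that are not from the source: the node has two hypothetical parents with unit variance, the noise is Gaussian and independent of the parents, and the signal variance is estimated empirically with `var`.

```r
set.seed(1)
n <- 5000; snrPar <- 0.2

# Two hypothetical parent variables and raw edge weights from Unif(-1, 1).
parents <- matrix(rnorm(n * 2), n, 2)
beta <- runif(2, min = -1, max = 1)

# Noise of a non-source node has variance snrPar.
eps <- rnorm(n, sd = sqrt(snrPar))

# Uniformly rescale the weights so that the signal variance is 1 - snrPar.
signal <- drop(parents %*% beta)
beta <- beta * sqrt((1 - snrPar) / var(signal))

# Signal plus noise now has variance close to 1.
x <- drop(parents %*% beta) + eps
snr <- (1 - snrPar) / snrPar
```

With `snrPar = 0.2` the signal carries about 80 percent of the variance of `x` and the noise about 20 percent, which corresponds to a signal-to-noise ratio of 4.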
Fourth, a cycle is added to the causal graph if cyclic = TRUE. If the 
causal graph shall contain a cycle, we sample two nodes i and j 
such that adding an edge between them creates a cycle in the causal graph. 
We then compute the largest possible coefficient for this edge such that the 
cycle product is smaller than 1. Subsequently, we sample the sign of the 
coefficient and set the magnitude by scaling the largest possible coefficient 
by strengthCycle where 0 < strengthCycle < 1.
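To make the cycle step concrete, the sketch below closes a single cycle in a hypothetical three-node chain. It is not the package's code: the chain, its weights and the choice of the back edge are made up for illustration, and `A[j, i]` is assumed to hold the weight of the edge from node i to node j.

```r
set.seed(1)
strengthCycle <- 0.5

# Hypothetical chain 1 -> 2 -> 3.
A <- matrix(0, 3, 3)
A[2, 1] <- 0.8
A[3, 2] <- -0.5

# Adding the back edge 3 -> 1 closes the cycle 1 -> 2 -> 3 -> 1.
pathProduct <- A[2, 1] * A[3, 2]
# At this magnitude the absolute cycle product would reach 1.
maxCoef <- 1 / abs(pathProduct)

# Random sign, magnitude scaled by strengthCycle in (0, 1).
A[1, 3] <- sample(c(-1, 1), 1) * strengthCycle * maxCoef

cycleProduct <- A[2, 1] * A[3, 2] * A[1, 3]
spectralRadius <- max(Mod(eigen(A)$values))
```

In this single-cycle example the absolute cycle product equals `strengthCycle`, and the spectral radius of `A` stays below 1, so `I - A` remains invertible.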
Fifth, we rescale the noise variables to obtain a t-distribution with 
df degrees of freedom. X is then generated as 
X  = (I-A)^{-1}ε in the observational case; under shift 
interventions, X is generated as X  = (I-A)^{-1}(ε + Z) 
where the coordinates of Z are only non-zero for the variables 
that are intervened on. Under a do-intervention on X_j, β_{j,k}
for k = 1, …, p are set to 0 to yield A' and ε_j
is set to Z_j to yield ε_j'. We then obtain X as 
X  = (I-A')^{-1}ε'.
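The three cases can be compared on a single observation with the following base R sketch. The graph, its weights and the intervention value are hypothetical, `A[j, i]` is assumed to hold the weight of the edge from node i to node j, and the noise is drawn directly with `rt` rather than by rescaling Gaussian noise as described above.

```r
set.seed(1)
p <- 3; df <- 4

# Hypothetical chain 1 -> 2 -> 3.
A <- matrix(0, p, p)
A[2, 1] <- 0.8
A[3, 2] <- -0.5

eps <- rt(p, df = df)   # noise for one observation
Z <- c(0, 2, 0)         # intervention on X_2 only
I <- diag(p)

# Observational case: X = (I - A)^{-1} eps
xObs <- solve(I - A, eps)

# Shift intervention: X = (I - A)^{-1} (eps + Z)
xShift <- solve(I - A, eps + Z)

# Do-intervention on X_2: remove incoming edges and replace eps_2 by Z_2.
j <- 2
Ado <- A
Ado[j, ] <- 0
epsDo <- eps
epsDo[j] <- Z[j]
xDo <- solve(I - Ado, epsDo)
```

Under the shift intervention X_2 still depends on its parent X_1, whereas under the do-intervention X_2 equals Z_2 regardless of X_1. In both cases the change propagates to the descendant X_3.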
Lastly, if modelMis = TRUE a model misspecification is added to the 
data by marginally transforming all variables as tanh(modelMisPar*x)/modelMisPar.
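The effect of this transformation is easy to inspect on a small made-up matrix; the values below are for illustration only.

```r
modelMisPar <- 1
X <- matrix(c(-3, -0.1, 0, 0.1, 3, 10), ncol = 2)

# Marginal transformation applied elementwise to every variable.
Xmis <- tanh(modelMisPar * X) / modelMisPar
```

Entries close to zero are left almost unchanged, while large entries are squashed towards ±1/modelMisPar, so the linear model holds approximately only near the origin.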
A list with the following elements:
| X | n x p-dimensional data matrix. | 
| environment | Indicator of the experiment or the intervention type an observation belongs to. A numeric vector of length n. | 
| interventions | A list of length n. Indicates location of interventions for each data point. | 
| whereInt | A list of length numberInt. Indicates location of interventions in each setting. | 
| noise | | 
| configs | A list with the generated adjacency matrix (trueA) as well as all input arguments. | 