#' Simulate data of a causal (possibly cyclic model) under interventions.
#'
#' @description Simulate data of a causal (possibly cyclic model) under interventions.
#'
#' @param n Number of observations.
#' @param p Number of variables.
#' @param df Degrees of freedom in t-distribution of noise and interventions.
#' @param rhoNoise Correlation between noise terms to model hidden variabkes.
#' Set to 0 for independent noise.
#' @param snrPar Signal-to-noise parameter: steers what proportion of the variance stems from
#' the signal resp.\ from the noise: The SNR is given by $SNR = (1-\code{snrPar})/\code{snrPar}$),
#' see details. Only holds when \code{cyclic = FALSE}.
#' @param sparse Probability that an entry \eqn{i,j} in adjacency matrix is 1.
#' @param doInterv Set to TRUE if interventions should be do-interventions; otherwise
#' noise interventions (also called shift interventions) are generated.
#' @param numberInt Total number of settings.
#' @param strengthInt Regulates the strength of the interventions, see details.
#' @param cyclic Set to TRUE is resulting graph should contain a cycle.
#' @param strengthCycle Steers strength of feedback, see details.
#' @param modelMis Add a model misspecification that applies \code{tanh(modelMisPar*x)/modelMisPar)}
#' morginally to each variable after having generated X from the causal DAG.
#' @param modelMisPar Parameter steering the strength of the model misspecification.
#' @param seed Random seed.
#' @return A list with the following elements:
#' \itemize{
#' \item \code{X} \eqn{n x p}-dimensional data matrix
#' \item \code{environment} Indicator of the experiment or the intervention type an
#' observation belongs to. A numeric vector of length \eqn{n}.
#' \item \code{interventions} A list of length \eqn{n}. Indicates location of interventions
#' for each data point.
#' \item \code{whereInt} A list of length \code{numberInt}. Indicates location of interventions
#' in each setting.
#' \item \code{noise}
#' \item \code{configs} A list with the generated adjacency matrix (\code{trueA})
#' as well as all input arguments.
#' }
#'
#' @details The adjacency matrix \eqn{A} is generated as follows. Assume the variables
#' with indices \eqn{{1, \ldots, p}} are causally ordered. For each edge from node
#' \eqn{i} to node \eqn{j} where \eqn{i} precedes \eqn{j} in the causal ordering,
#' we draw a sample from Bin(\code{sparse}) to determine whether to add an edge
#' from node \eqn{i} to node \eqn{j}. After having sampled the non-zero entries
#' of \eqn{A} in this fashion, we sample the coefficients from Unif(-1,1).
#' As described below, the edge weights are later rescaled to achieve a specified
#' signal-to-noise ratio. We exclude the possibility of \eqn{A = 0},
#' i.e. we resample until \eqn{A} contains at least one non-zero entry.
#'
#' Second, the interventions are generated as follows. \code{numberInt} denotes the total
#' number of (interventional and observational) settings that are generated.
#' For each variable, we sample uniformly at random with replacement one setting
#' in which this variable is intervened on. In other words, each variable is
#' intervened on in exactly one setting. Hence it is possible that there are
#' settings where no interventions take place which then correspond to the
#' observational case. Similarly, there may be settings where interventions
#' are performed on multiple variables at once. After defining the settings,
#' we sample (uniformly at random with replacement) what setting each data point
#' belongs to. So for each setting we generate approximately the same number of
#' samples. In one generated data set, the interventions are all of the same
#' type, i.e. they are either all shift interventions (when \code{doInterv = FALSE})
#' or do-interventions (when \code{doInterv = TRUE}). In both cases, an intervention
#' on \eqn{X_j} is modelled by generating \eqn{Z_j} as \eqn{Z_j ~} \code{strengthInt} \eqn{* t}(\code{dfNoise}).
#' If \code{strengthInt} = 0, all interventional settings correspond to purely
#' observational data.
#'
#' Third, the noise terms \eqn{\epsilon} are generated by first sampling from
#' \eqn{N(0,\Sigma)} where \eqn{\Sigma_{i,i} = 1} and
#' \eqn{\Sigma_{i,j} =} \code{rhoNoise}. To steer the signal-to-noise ratio,
#' we set the variance of the noise terms of all nodes except source nodes
#' to \code{snrPar} where \eqn{0 < }\code{snrPar}\eqn{ \le 1}. Stepping through the
#' variables in causal order, for each variable \eqn{X_j} that has parents, we
#' uniformly rescale the edge weights \eqn{\beta_{j,k}} for \eqn{k = 1, \ldots, p}
#' in the structural equation of variable \eqn{X_j} such that the variance of
#' the sum \eqn{\sum_{k=1}^p \beta_{j,k} X_k + \epsilon_j} is approximately
#' 1 in the observational setting. In other words, the parameter \code{snrPar}
#' steers what proportion of the variance stems from the signal given by
#' \eqn{\sum_{k=1}^p \beta_{j,k} X_k} and what proportion stems from the
#' noise \eqn{\epsilon_j}. The signal-to-noise ratio can then be computed
#' as SNR = (1-\code{snrPar})/\code{snrPar}.
#'
#' Forth, a cycle is added to the causal graph if \code{cyclic = TRUE}. If the
#' causal graph shall contain a cycle, we sample two nodes \eqn{i} and \eqn{j}
#' such that adding an edge between them creates a cycle in the causal graph.
#' We then compute the largest possible coefficient for this edge such that the
#' cycle product is smaller than 1. Subsequently, we sample the sign of the
#' coefficient and set the magnitude by scaling the largest possible coefficient
#' by \code{strengthCycle} where \eqn{0 < }\code{strengthCycle}\eqn{< 1}.
#'
#' Fifth, we rescale the noise variables to obtain a \eqn{t}-distribution with
#' \code{dfNoise} degrees of freedom. \eqn{X} is then generated as
#' \eqn{X = (I-A)^{-1}\epsilon} in the observational case; under a shift
#' interventions \eqn{X} can be generated as \eqn{X = (I-A)^{-1}(\epsilon + Z)}
#' where the coordinates of \eqn{Z} are only non-zero for the variables
#' that are intervened on. Under a do-intervention on \eqn{X_j}, \eqn{\beta_{j,k}}
#' for \eqn{k = 1, \ldots, p} are set to 0 to yield \eqn{A'} and \eqn{\epsilon_j}
#' is set to \eqn{Z_j} to yield \eqn{\epsilon_j'}. We then obtain \eqn{X} as
#' \eqn{X = (I-A')^{-1}\epsilon'}.
#'
#' Lastly, if \code{modelMis = TRUE} a model misspecification is added to the
#' data by marginally transforming all variables as \code{tanh(modelMisPar*x)/modelMisPar)}.
#'
simulateInterventions <- function(n, p, df, rhoNoise, snrPar,
sparse, doInterv,
numberInt, strengthInt,
cyclic, strengthCycle,
modelMis = FALSE, modelMisPar = 1,
seed =1){
###### set seed
set.seed(seed)
if(snrPar == 0 | snrPar > 1){
stop("snrPar needs to be larger than 0 and smaller or equal to 1.")
}
if(cyclic & strengthCycle >= 1){
stop("strengthCycle needs to be smaller than 1.")
}
if(numberInt <= 0){
stop("numberInt needs to be at least 1.")
}
# generate A
## skeleton
cont <- TRUE
while(cont){
AS <- 0*diag(p)
for (k in 1:p){
for (k2 in 1:p){
if(k2>k) AS[k,k2] <- rbinom(1,1,sparse)
}
}
if(sum(AS)!=0) cont <- FALSE
}
A <- AS * matrix( runif(p^2,-1,1),nrow=p)
####### initialize
Perturb <- matrix(0,nrow=n,ncol=p)
### simulate environments
whereInt <- list()
if(numberInt == 1){
whereInt[[1]] <- integer(0)
environment <- rep(1, n)
interventions <- lapply(1:n, function(i) integer(0))
}else{
environment_var <- sample(1:numberInt, p, replace=TRUE)
for(nic in 1:numberInt){
whereInt[[nic]] <- which(environment_var == nic)
}
environment <- sample(1:numberInt, n, replace=TRUE)
interventions <- list()
for (i in 1:n){
interventions[[i]] <- whereInt[[environment[i]]]
}
}
###### simulate shift interventions
for(intervEnv in 1:numberInt){
rowsSelect <- environment == intervEnv
colsSelect <- whereInt[[intervEnv]]
Perturb[rowsSelect, whereInt[[intervEnv]]] <-
strengthInt*rt( length(colsSelect) * sum(rowsSelect) ,df=df)
}
###### simulate noise
noise <- matrix(rnorm(n*p),nrow=n)
Sigma <- matrix(rhoNoise,nrow=p,ncol=p)
diag(Sigma) <- 1
CC <- chol(Sigma)
noise <- noise %*% CC
noise[,which(colSums(A) != 0)] <- sqrt(snrPar)*noise[,which(colSums(A) != 0)]
# change A to achieve given SNR in observational case
A <- setSNR(noise = noise, A)
if(cyclic){
cont <- TRUE
Atmp <- A
while(cont){
# add cycle to A
# sample uniformly at random one connection backwards
backConnect <- sample(1:p, 2, replace = FALSE)
Atmp[max(backConnect), min(backConnect)] <- 1
if(sum(A) == 0){
Atmp[min(backConnect), max(backConnect)] <- A[min(backConnect), max(backConnect)] <- runif(1,-1,1)
}
# find cyclic path from max(backConnect) back to max(backConnect)
# and get largest coefficient path can have (if it exists)
cpmat <- getLargestWeightForCycle(Atmp)$cpMat
if(cpmat[max(backConnect), max(backConnect)] != 0){
cont <- FALSE
}else{
Atmp <- A
}
}
# sample sign of weight of backward connection
signEntry <- sample(c(-1,1),1)
# adjust edge weight by strengthCycle parameter
A[max(backConnect), min(backConnect)] <- signEntry*strengthCycle/
cpmat[max(backConnect), max(backConnect)]
}
# apply t-distribution for noise
noise <- apply(noise, 2, function(x) qt(pnorm(x), df = df))
if(doInterv & strengthInt > 0){
X <- matrix(0, nrow = n, ncol = p)
# for do interventions cycle through environments and adjust A
# as well as noise + Perturb accordingly
for(env in 1:numberInt){
# data points from setting env
inds <- which(environment == env)
# modify A: intervention on var j means to cut all incoming connections
Amod <- A
Amod[,whereInt[[env]]] <- 0
# get inverse
inv <- solve(diag(p) - Amod)
# replace columns of noise where intervention was applied
noiseMat <- noise[inds,,drop = FALSE]
noiseMat[,whereInt[[env]]] <- Perturb[inds,whereInt[[env]], drop = FALSE]
X[inds,] <- noiseMat%*%inv
}
# noise <- noiseMat
}else{
inv <- solve(diag(p) - A)
X <- (noise + Perturb)%*%inv
}
if(modelMis){
X <- apply(X,2,function(x) tanh(modelMisPar*x)/modelMisPar)
}
config.options <- list(trueA = A,
n = n,
p = p,
df = df,
rhoNoise = rhoNoise,
snrPar = snrPar,
sparse = sparse,
doInterv = doInterv,
numberInt = numberInt,
strengthInt = strengthInt,
cyclic = cyclic,
strengthCycle = strengthCycle,
modelMis = modelMis,
modelMisPar = modelMisPar,
seed = seed)
list(X = X,
environment = environment,
interventions = interventions,
whereInt = whereInt,
noise = noise,
configs = config.options)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.