backShift: Estimate connectivity matrix of a directed graph with linear...
In backShift: Learning Causal Cyclic Graphs from Unknown Shift Interventions


This function estimates the connectivity matrix of a directed (possibly cyclic) graph with hidden variables. The underlying system is required to be linear, and observations under different shift interventions are assumed to be available. More precisely, the function takes as input an (n x p) data matrix, where n is the sample size and p is the number of variables. In each environment j (j in {1, …, J}) we have observed n_j samples generated from

X_j = X_j * A + c_j + e_j

(in case of cycles this should be understood as an equilibrium distribution). The c_j is a p-dimensional random vector that is assumed to have a diagonal covariance matrix. The noise vector e_j is assumed to have the same distribution in all environments j but is allowed to have an arbitrary covariance matrix. The different intervention settings are provided to the method with the help of the vector ExpInd of length n = (n_1 + ... + n_j + ... + n_J). The goal is to estimate the connectivity matrix A.
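
To see why shifts in the covariance carry information about A, the following base R sketch works at the population level under the model above. It is an illustration only, not the package's estimator: the noise covariance `Sigma_e` and the shift covariances `Sigma_c1` and `Sigma_c2` are made-up values, and the helper `pop_cov` is hypothetical. With rows as observations, the equilibrium solution is X_j = (c_j + e_j)(I - A)^(-1), so the difference of the covariance matrices of two environments becomes diagonal once it is multiplied by (I - A) from both sides.

```r
# Population-level sketch (assumed values, base R only)
p <- 3
A <- matrix(0, p, p)
A[1, 2] <- 0.8
A[2, 3] <- -0.8
A[3, 1] <- 0.8
I_minus_A <- diag(p) - A

# Arbitrary (non-diagonal) noise covariance, identical in all environments
Sigma_e <- matrix(c(1.0, 0.3, 0.2,
                    0.3, 1.0, 0.1,
                    0.2, 0.1, 1.0), nrow = p, byrow = TRUE)

# Diagonal covariances of the shift vectors c_j in two environments
Sigma_c1 <- diag(c(0, 0, 0))
Sigma_c2 <- diag(c(2, 0.5, 1))

# Covariance of X_j = (c_j + e_j) (I - A)^(-1)
pop_cov <- function(Sigma_c) {
  B <- solve(I_minus_A)
  t(B) %*% (Sigma_c + Sigma_e) %*% B
}

# The noise covariance cancels in the difference between environments
Delta <- pop_cov(Sigma_c2) - pop_cov(Sigma_c1)

# Transforming with (I - A) leaves Sigma_c2 - Sigma_c1, which is diagonal
D <- t(I_minus_A) %*% Delta %*% I_minus_A
print(round(D, 10))
```

The arbitrary noise covariance drops out of `Delta`, which is why the noise vector may have any covariance as long as it is the same in every environment.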

backShift(X, ExpInd, covariance=TRUE, ev=0, threshold =0.75, nsim=100,
          sampleSettings=1/sqrt(2), sampleObservations=1/sqrt(2),
          nodewise=TRUE, tolerance = 10^(-4), baseSettingEnv = 1,
          verbose = FALSE)

`X`	A (nxp)-dimensional matrix (or data frame) with n observations of p variables.
`ExpInd`	Indicator of the experiment or the intervention type an observation belongs to. A numeric vector of length n. Has to contain at least three different unique values.
`covariance`	A boolean variable. If `TRUE`, use only shift in covariance matrix; otherwise use shift in Gram matrix. Set to `FALSE` only if at most one variable has a non-zero shift in mean in the same setting (default is `TRUE`).
`ev`	The expected number of false selections for stability selection. No stability selection computed if `ev=0`. Defaults to `ev=0`.
`threshold`	The selection threshold for stability selection (has to be between 0.5 and 1). Edges which are selected with empirical proportion higher than `threshold` will be retained.
`nsim`	Number of resamples taken (if using stability selection).
`sampleSettings`	The proportion of unique settings to resample for each resample; has to be in [0,1].
`sampleObservations`	The fraction of all samples to retain when subsampling (no replacement); has to be in [0,1].
`nodewise`	If `FALSE`, stability selection retains for each subsample the largest overall entries in the connectivity matrix. If `TRUE`, values are ordered row- and node-wise first and then the largest entries in each row and column are retained. Error control is valid (under exchangeability assumption) in both cases. The latter setting `TRUE` is perhaps more robust and is the default.
`tolerance`	Precision parameter for `ffdiag`: the algorithm stops when the difference in the criterion between two iterations is less than `tolerance`. Default is 10^(-4).
`baseSettingEnv`	Index for baseline environment against which the intervention variances are measured. Defaults to 1.
`verbose`	If `FALSE`, most messages are suppressed.
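
As a small illustration of the `ExpInd` argument described above, the sketch below builds an environment indicator from per-environment sample sizes and checks the two stated requirements (length n and at least three unique values). The sizes in `n_j` are made-up values chosen only for this example.

```r
# Assumed per-environment sample sizes n_1, n_2, n_3
n_j <- c(400, 300, 300)

# One integer label per observation, in the same order as the rows of X
ExpInd <- rep(seq_along(n_j), times = n_j)

# Requirements stated in the argument description
n <- sum(n_j)
stopifnot(length(ExpInd) == n)
stopifnot(length(unique(ExpInd)) >= 3)

# Number of observations in each environment
print(table(ExpInd))
```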

A list with elements

`Ahat`	The connectivity matrix where entry (i,j) is the effect pointing from variable i to variable j.
`AhatAdjacency`	If `ev`>0, the connectivity matrix retained by stability selection. Entries give the rounded percentage of times the edge has been retained (and 0 if below the critical threshold).
`varianceEnv`	The estimated intervention variances up to an offset. `varianceEnv` is a (G x p)-dimensional matrix where G is the number of unique environments. The ij-th entry contains the difference between the estimated intervention variance of variable j in environment i and the estimated intervention variance of variable j in the base setting (given by input parameter `baseSettingEnv`).
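
The "up to an offset" convention for `varianceEnv` can be pictured with a few lines of base R. This is a sketch with hypothetical numbers, not output from the package: `interv_var` stands in for per-environment intervention variances, and the row of the base setting is subtracted from every row.

```r
# Hypothetical intervention variances: rows = environments, columns = variables
interv_var <- matrix(c(0.5, 0.5, 0.5,
                       2.0, 0.5, 1.0,
                       1.0, 3.0, 0.5), nrow = 3, byrow = TRUE)

# Index of the baseline environment (same role as the baseSettingEnv argument)
baseSettingEnv <- 1

# Subtract the baseline row from every row (sweep() subtracts by default)
offset_var <- sweep(interv_var, 2, interv_var[baseSettingEnv, ])
print(offset_var)
```

Under this convention the row of the base setting is zero by construction, and a negative entry would mean a smaller intervention variance than in the base setting.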

Christina Heinze-Deml <heinzedeml@stat.math.ethz.ch>

Dominik Rothenhaeusler, Christina Heinze, Jonas Peters, Nicolai Meinshausen: backShift: Learning causal cyclic graphs from unknown shift interventions. Advances in Neural Information Processing Systems (NIPS) 28, 2015. arXiv: http://arxiv.org/abs/1506.02494

ICP and hiddenICP for reconstructing the parents of a variable under interventions on all other variables. getParents and getParentsStable from the package CompareCausalNetworks to estimate the connectivity matrix of a directed causal graph, using various possible methods (including backShift).

## Simulate data with connectivity matrix A

# load the package that provides simulateInterventions() and backShift()
library(backShift)

seed <- 1
# sample size n
n <- 10000
# 3 predictor variables
p  <- 3
A <- matrix(0, p, p)
A[1,2] <- 0.8
A[2,3] <- -0.8
A[3,1] <- 0.8

# divide data into 10 different environments
G <- 10

# simulate
simulation.res <- simulateInterventions(
                    n, p, A, G, intervMultiplier = 2,
                    noiseMult = 1, nonGauss = FALSE,
                    fracVarInt = 0.5, hidden = TRUE,
                    knownInterventions = FALSE,
                    simulateObs = TRUE, seed = seed)

environment <- simulation.res$environment
X <- simulation.res$X

## Compute feedback estimator with stability selection

network <- backShift(X, environment, ev = 1)

## Print point estimates and stable edges

# true connectivity matrix
print(A)
# point estimate
print(network$Ahat)
# shows empirical selection probability for stable edges
print(network$AhatAdjacency)

backShift: Percentage of runs in stability selection that converged: 100%
     [,1] [,2] [,3]
[1,]  0.0  0.8  0.0
[2,]  0.0  0.0 -0.8
[3,]  0.8  0.0  0.0
             [,1]         [,2]        [,3]
[1,]  0.000000000  0.794790497 -0.05313948
[2,] -0.003423618  0.000000000 -0.83211232
[3,]  0.797009670 -0.004563331  0.00000000
     [,1] [,2] [,3]
[1,]    0  100    0
[2,]    0    0  100
[3,]  100    0    0
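
The printed results can be summarised with a little base R. The sketch below copies the values shown in the output above into plain matrices (so it runs without the package), lists the retained edges using the convention that entry (i,j) is the effect from variable i to variable j, and computes the largest absolute deviation of the point estimate from the true matrix.

```r
# Values copied from the printed output above
A <- matrix(c(0.0, 0.8,  0.0,
              0.0, 0.0, -0.8,
              0.8, 0.0,  0.0), nrow = 3, byrow = TRUE)
Ahat <- matrix(c( 0.000000000,  0.794790497, -0.05313948,
                 -0.003423618,  0.000000000, -0.83211232,
                  0.797009670, -0.004563331,  0.00000000),
               nrow = 3, byrow = TRUE)
AhatAdjacency <- matrix(c(  0, 100,   0,
                            0,   0, 100,
                          100,   0,   0), nrow = 3, byrow = TRUE)

# Edge list of the retained edges: one row per edge, from variable i to variable j
edges <- which(AhatAdjacency > 0, arr.ind = TRUE)
edges <- edges[order(edges[, "row"]), , drop = FALSE]
colnames(edges) <- c("from", "to")
print(edges)

# Largest absolute deviation of the point estimate from the true matrix
max_err <- max(abs(Ahat - A))
print(max_err)
```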