knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
The goal of causalXtreme is to provide an interface for causal discovery in linear structural equation models (SEMs) with heavy-tailed noise. For more details, see @gne2019 [https://arxiv.org/abs/1908.05097].
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("nicolagnecco/causalXtreme")
Let us generate 500 observations from a SEM of two Student-t variables, $X_1$ and $X_2$, with 1.5 degrees of freedom (i.e., heavy-tailed).
When the function `simulate_data` is called with the default values, it returns a list containing:

- `dataset`, represented as a matrix of size $n \times p$. Here, $n = 500$ is the number of observations and $p = 2$ is the number of variables,
- `dag`, the adjacency matrix of the simulated directed acyclic graph (DAG), of size $p \times p$.

library(causalXtreme)
# basic example code
set.seed(1)
sem <- simulate_data(n = 500, p = 2, prob_connect = 0.5,
                     distr = "student_t", tail_index = 1.5)
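To make the setup concrete, here is a minimal base-R sketch of what a two-variable linear SEM with Student-t noise looks like when written out by hand. It is not how `simulate_data` works internally; the coefficient `beta` and the object names are assumptions chosen for illustration only.

```r
# Hand-written two-variable linear SEM with heavy-tailed noise.
# NOTE: illustrative sketch only; `beta` is a hypothetical causal coefficient.
set.seed(42)
n <- 500
beta <- 0.8

eps1 <- rt(n, df = 1.5)  # heavy-tailed noise for X1
eps2 <- rt(n, df = 1.5)  # heavy-tailed noise for X2

X1 <- eps1               # X1 has no parents
X2 <- beta * X1 + eps2   # X2 is caused by X1

dataset <- cbind(X1, X2)  # n x p matrix of observations

# Adjacency matrix of the DAG: entry (1, 2) = 1 encodes X1 -> X2
dag <- matrix(c(0, 1,
                0, 0), nrow = 2, byrow = TRUE)
```

The resulting `dataset` and `dag` have the same shapes as the two list elements described above.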
At this point, we can look at the randomly simulated DAG.
sem$dag
We interpret the adjacency matrix as follows. Loosely speaking, we say that variable $X_i$ causes variable $X_j$ if the entry $(i, j)$ of the adjacency matrix is equal to 1.
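As a small illustration of this convention, the following base-R sketch lists the causal edges encoded in an adjacency matrix. The three-variable matrix `adj` is a made-up example, not output from the package.

```r
# Hypothetical 3-variable adjacency matrix (made up for illustration):
# X1 -> X2, X1 -> X3, X2 -> X3
vars <- paste0("X", 1:3)
adj <- matrix(c(0, 1, 1,
                0, 0, 1,
                0, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(vars, vars))

# Entry (i, j) equal to 1 means "X_i causes X_j"
edges <- which(adj == 1, arr.ind = TRUE)
edge_labels <- paste(rownames(adj)[edges[, "row"]],
                     "->",
                     colnames(adj)[edges[, "col"]])
print(edge_labels)
```

Rows index causes and columns index effects, so an all-zero column marks a variable with no parents.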
We see that the first variable $X_1$ causes the second variable $X_2$, since the entry $(1, 2)$ of the matrix `sem$dag` is equal to 1.

We can plot the simulated dataset.
plot(sem$dataset, pch = 20, xlab = "X1", ylab = "X2")
At this point, we can estimate the causal direction between $X_1$ and $X_2$ by computing the causal tail coefficients $\Gamma_{12}$ and $\Gamma_{21}$ [see @gne2019, Definition 1].
X1 <- sem$dataset[, 1]
X2 <- sem$dataset[, 2]

# gamma_12
causal_tail_coeff(X1, X2)

# gamma_21
causal_tail_coeff(X2, X1)
We see that the coefficient $\Gamma_{12} \approx 1$ (the first value returned above) and $\Gamma_{21} < 1$ (the second value). This is evidence for a causal relationship from $X_1$ to $X_2$.
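For intuition, the coefficient $\Gamma_{12}$ can be read as the expected value of $F_2(X_2)$ given that $X_1$ is extremely large. The sketch below is a rough base-R estimator built on that reading: it averages the empirical ranks of the second variable over the `k` largest observations of the first. It is not the package's `causal_tail_coeff`; the function name `tail_coeff_sketch` and the default `k = floor(n^0.4)` are assumptions made for illustration.

```r
# Rough empirical sketch of a causal tail coefficient (upper tail only).
# NOTE: hypothetical helper, not the implementation used by causalXtreme.
tail_coeff_sketch <- function(v1, v2, k = floor(length(v1)^0.4)) {
  n <- length(v1)
  stopifnot(length(v2) == n, k >= 1, k < n)

  u2 <- rank(v2) / n             # empirical CDF of v2 at each observation
  threshold <- sort(v1)[n - k]   # exactly k values of v1 exceed this (no ties)
  mean(u2[v1 > threshold])       # average rank of v2 when v1 is extreme
}

# Toy data: x1 causes x2, both with heavy-tailed Student-t noise
set.seed(1)
n <- 500
x1 <- rt(n, df = 1.5)
x2 <- x1 + rt(n, df = 1.5)

g12 <- tail_coeff_sketch(x1, x2)  # cause -> effect direction
g21 <- tail_coeff_sketch(x2, x1)  # effect -> cause direction
print(c(gamma_12 = g12, gamma_21 = g21))
```

Both estimates lie in $(0, 1]$ by construction. A value close to 1 means that extremes of the first argument are accompanied by extremes of the second.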
We can also run the extremal ancestral search (EASE) algorithm, based on the causal tail coefficients [see @gne2019, Section 3.1]. The algorithm estimates a causal order of the DAG from the data.
ease(dat = sem$dataset)
In this case, we can see that the estimated causal order is correct, since $X_1$ (the cause) is placed before $X_2$ (the effect).
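The idea of ordering variables by their causal tail coefficients can be sketched as a greedy search: a variable with no causes should have no other variable whose coefficient pointing into it is close to 1, so at each step we pick the remaining variable whose largest incoming coefficient is smallest. The code below is a simplified reading of that idea and not the package's `ease` implementation; the function name `greedy_causal_order` and the coefficient values in `gamma` are made up for illustration.

```r
# Greedy causal ordering from a matrix of tail coefficients.
# gamma[j, i] plays the role of Gamma_{ji}: evidence that X_j causes X_i.
# NOTE: hypothetical helper, a simplified sketch rather than the package code.
greedy_causal_order <- function(gamma) {
  remaining <- seq_len(ncol(gamma))
  ord <- integer(0)
  while (length(remaining) > 1) {
    sub <- gamma[remaining, remaining, drop = FALSE]
    # Largest incoming coefficient for each remaining variable
    scores <- apply(sub, 2, max, na.rm = TRUE)
    pick <- remaining[which.min(scores)]
    ord <- c(ord, pick)
    remaining <- setdiff(remaining, pick)
  }
  c(ord, remaining)
}

# Made-up coefficients for a chain X1 -> X2 -> X3 (diagonal unused)
gamma <- matrix(c(  NA, 0.98, 0.97,
                  0.70,   NA, 0.96,
                  0.65, 0.72,   NA),
                nrow = 3, byrow = TRUE)

causal_order <- greedy_causal_order(gamma)
print(causal_order)
```

In this toy matrix, $X_1$ has the smallest maximal incoming coefficient and is therefore placed first, followed by $X_2$ and then $X_3$.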