knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The R package R6causal
implements an R6 class called SCM
. The class aims to simplify working with structural causal models. The missing data mechanism can be defined as a part of the structural model.
The class contains methods for
igraph
or qgraph
causaleffect
and dosearch
dosearch
cfid
In addition, there are functions for
The class ParallelWorld
inherits SCM
and defines a structural causal model that describes parallel worlds for counterfactual inference.
The class LinearGaussianSCM
inherits SCM
and defines a structural causal model where all functions are linear and all background variables follow Gaussian distribution.
library(R6causal) library(data.table) library(stats) data.table::setDTthreads(2)
Structural causal model (SCM) for a backdoor situation can be defined as follows
backdoor <- SCM$new("backdoor", uflist = list( uz = function(n) {return(runif(n))}, ux = function(n) {return(runif(n))}, uy = function(n) {return(runif(n))} ), vflist = list( z = function(uz) { return(as.numeric(uz < 0.4))}, x = function(ux, z) { return(as.numeric(ux < 0.2 + 0.5*z))}, y = function(uy, z, x) { return(as.numeric(uy < 0.1 + 0.4*z + 0.4*x))} ) )
A shortcut notation for this is
backdoor_text <- SCM$new("backdoor", uflist = list( uz = "n : runif(n)", ux = "n : runif(n)", uy = "n : runif(n)" ), vflist = list( z = "uz : as.numeric(uz < 0.4)", x = "ux, z : as.numeric(ux < 0.2 + 0.5*z)", y = "uy, z, x : as.numeric(uy < 0.1 + 0.4*z + 0.4*x)" ) )
Alternatively the functions of SCM can be specified via conditional probability tables
backdoor_condprob <- SCM$new("backdoor", uflist = list( uz = function(n) {return(runif(n))}, ux = function(n) {return(runif(n))}, uy = function(n) {return(runif(n))} ), vflist = list( z = function(uz) { return( generate_condprob( ycondx = data.table(z = c(0,1), prob = c(0.6,0.4)), x = data.table(uz = uz), Umerge_expr = "uz"))}, x = function(ux, z) { return( generate_condprob( ycondx = data.table(x = c(0,1,0,1), z = c(0,0,1,1), prob = c(0.8,0.2,0.3,0.7)), x = data.table(z = z, ux = ux), Umerge_expr = "ux"))}, y = function(uy, z, x) { return( generate_condprob( ycondx = data.table(y= rep(c(0,1), 4), z = c(0,0,1,1,0,0,1,1), x = c(0,0,0,0,1,1,1,1), prob = c(0.9,0.1,0.5,0.5, 0.5,0.5,0.1,0.9)), x = data.table(z = z, x = x, uy = uy), Umerge_expr = "uy"))} ) )
It is possible to mix the styles and define some elements of a function list as functions, some as text and some as conditional probability tables.
A linear Gaussian SCM can be defined giving the coefficients for the structural equations:
lgbackdoor <- LinearGaussianSCM$new("Linear Gaussian Backdoor", linear_gaussian = list( uflist = list(ux = function(n) {rnorm(n)}, uy = function(n) {rnorm(n)}, uz = function(n) {rnorm(n)}), vnames = c("x","y","z"), vcoefmatrix = matrix(c(0,0.4,0,0,0,0,0.6,0.8,0),3,3), ucoefvector = c(1,1,1), ccoefvector = c(0,0,0))) print(lgbackdoor)
It is also possible to generate the underlying DAG and the coefficients randomly:
randomlg <- LinearGaussianSCM$new("Random Linear Gaussian", random_linear_gaussian = list( nv = 6, edgeprob=0.5, vcoefdistr = function(n) {rnorm(n)}, ccoefdistr = function(n) {rnorm(n)}, ucoefdistr = function(n) {rnorm(n)})) print(randomlg)
\newpage
The print method presents the basic information on the model
backdoor
\newpage
The plotting method of the package igraph
is used by default. If qgraph
is available, its plotting method can be used as well. The argument subset
controls which variables are plotted. Plotting parameters are passed to the plotting method.
backdoor$plot(vertex.size = 25) # with package 'igraph' backdoor$plot(subset = "v") # only observed variables if (requireNamespace("qgraph", quietly = TRUE)) backdoor$plot(method = "qgraph") # alternative look with package 'qgraph'
Calling method simulate()
creates or updates data table simdata
.
backdoor$simulate(10) backdoor$simdata backdoor$simulate(8) backdoor$simdata backdoor_text$simulate(20) backdoor_condprob$simulate(30)
In an intervention, the structural equation of the target variable is changed.
backdoor_x1 <- backdoor$clone() # making a copy backdoor_x1$intervene("x",1) # applying the intervention backdoor_x1$plot(method = "qgraph") # to see that arrows incoming to x are cut backdoor_x1$simulate(10) # simulating from the intervened model backdoor_x1$simdata
backdoor_yz <- backdoor$clone() # making a copy backdoor_yz$intervene("y", function(uy, z) {return(as.numeric(uy < 0.1 + 0.8*z ))}) # making y a function of z only backdoor_yz$plot(method = "qgraph") # to see that arrow x -> y is cut
The function run_experiment
applies a set of interventions, simulates data and collects the results.
backdoor_experiment <- run_experiment(backdoor, intervene = list(x = c(0,1)), response = "y", n = 10000) str(backdoor_experiment) colMeans(backdoor_experiment$response_list$y)
There are direct plugins to R packages causaleffect
, dosearch
and cfid
that can be used to solve identifiability problems.
backdoor$causal.effect(y = "y", x = "x") backdoor$dosearch(data = "p(x,y,z)", query = "p(y|do(x))") backdoor$cfid(gamma = cfid::conj(cfid::cf("Y",0), cfid::cf("X",0, c(Z=1))) )
Let us assume that intervention do(X=0) was applied and the response Y = 0 was recorded. What is the probability that in this situation the intervention do(X=1) would have led to the response Y = 1? We estimate this probability by means of simulation.
cfdata <- counterfactual(backdoor, situation = list(do = list(target = "x", ifunction = 0), condition = data.table( x = 0, y = 0)), target = "x", ifunction = 1, n = 100000, method = "rejection") mean(cfdata$y)
The result differs from P(Y = 1 | do(X = 1))
backdoor_x1$simulate(100000) mean(backdoor_x1$simdata$y)
Parallel world graphs (a generalization of a twin graph) are used for counterfactual inference with several counterfactual interventions . The package implements class ParallelWorld
which heritates class SCM
. A ParallelWorld
object is created from an SCM
object by specifying the interventions for each world. By default the variables of the parallel worlds are named with suffixes "_1", "_2", ...
In the example below, we have the original world (variables x
, z
, y
) and its two variants. In the variant 1 (variables x_1
, z_1
, y_1
), the value of x
(variable x_1
in the object) is set to be 0. In the variant 2 (variables x_2
, z_2
, y_2
), the value of x
(variable x_2
in the object) is set to be 0 and the value of z
(variable z_2
in the object) is set to be 1.
backdoor_parallel <- ParallelWorld$new( backdoor, dolist=list( list(target = "x", ifunction = 0), list(target = list("z","x"), ifunction = list(1,0)) ) ) backdoor_parallel if (requireNamespace("qgraph", quietly = TRUE)) backdoor_parallel$plot(method = "qgraph")
Counterfactual data can be simulated with function counterfactual
. In the example below, we know that variable y
obtained value 0 in the original world as well as variants 1 and 2. We are interested in the counterfactual distribution of y
if x
had been set to 1.
cfdata <- counterfactual(backdoor_parallel, situation = list( do = NULL, condition = data.table::data.table( y = 0, y_1 = 0, y_2 = 0)), target = "x", ifunction = 1, n = 100000, method = "rejection") mean(cfdata$y)
The printed value is a simulation based estimate for the counterfactual probability $P(Y=1)$.
An alternative way for answering the same question defines the case of interest as one of the parallel worlds (here variant 3).
backdoor_parallel2 <- ParallelWorld$new( backdoor, dolist=list( list(target = "x", ifunction = 0), list(target = list("z","x"), ifunction = list(1,0)), list(target = "x", ifunction = 1) ) ) cfdata <- counterfactual(backdoor_parallel2, situation = list( do = NULL, condition = data.table::data.table( y = 0, y_1 = 0, y_2 = 0)), n = 100000, method = "rejection") mean(cfdata$y_3)
The printed value is a simulation based estimate for the counterfactual probability $P(Y=1)$.
The missing data mechanism is defined in similar manner as the other variables.
backdoor_md <- SCM$new("backdoor_md", uflist = list( uz = "n : runif(n)", ux = "n : runif(n)", uy = "n : runif(n)", urz = "n : runif(n)", urx = "n : runif(n)", ury = "n : runif(n)" ), vflist = list( z = "uz : as.numeric(uz < 0.4)", x = "ux, z : as.numeric(ux < 0.2 + 0.5*z)", y = "uy, z, x : as.numeric(uy < 0.1 + 0.4*z + 0.4*x)" ), rflist = list( z = "urz : as.numeric( urz < 0.9)", x = "urx, z : as.numeric( (urx + z)/2 < 0.9)", y = "ury, z : as.numeric( (ury + z)/2 < 0.9)" ), rprefix = "r_" )
backdoor_md$plot(vertex.size = 25, edge.arrow.size=0.5) # with package 'igraph' backdoor_md$plot(subset = "v") # only observed variables a if (!requireNamespace("qgraph", quietly = TRUE)) backdoor_md$plot(method = "qgraph") # alternative look with package 'qgraph'
By default both complete data and incomplete data are simulated. The incomplete dataset is named as $simdata_obs
.
backdoor_md$simulate(100) summary(backdoor_md$simdata) summary(backdoor_md$simdata_obs)
By using the argument fixedvars
one can keep the complete data unchanged and re-simulate the missing data mechanism.
backdoor_md$simulate(100, fixedvars = c("x","y","z","ux","uy","uz")) summary(backdoor_md$simdata) summary(backdoor_md$simdata_obs)
backdoor_md$dosearch(data = "p(x*,y*,z*,r_x,r_y,r_z)", query = "p(y|do(x))")
It is automatically recognized that the problem is a missing data problem when rflist != NULL
.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.