Using R6causal"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Overview

The R package R6causal implements an R6 class called SCM. The class aims to simplify working with structural causal models. The missing data mechanism can be defined as a part of the structural model.

The class contains methods for

In addition, there are functions for

The class ParallelWorld inherits SCM and defines a structural causal model that describes parallel worlds for counterfactual inference.

The class LinearGaussianSCM inherits SCM and defines a structural causal model where all functions are linear and all background variables follow Gaussian distribution.

Setup

library(R6causal)
library(data.table)
library(stats)
data.table::setDTthreads(2)

Defining the model

Structural causal model (SCM) for a backdoor situation can be defined as follows

backdoor <- SCM$new("backdoor",
  uflist = list(
    uz = function(n) {return(runif(n))},
    ux = function(n) {return(runif(n))},
    uy = function(n) {return(runif(n))}
  ),
  vflist = list(
    z = function(uz) {
      return(as.numeric(uz < 0.4))},
    x = function(ux, z) {
      return(as.numeric(ux < 0.2 + 0.5*z))},
    y = function(uy, z, x) {
      return(as.numeric(uy < 0.1 + 0.4*z + 0.4*x))}
  )
)

A shortcut notation for this is

backdoor_text <- SCM$new("backdoor",
  uflist = list(
    uz = "n : runif(n)", 
    ux = "n : runif(n)", 
    uy = "n : runif(n)" 
  ),
  vflist = list(
    z = "uz : as.numeric(uz < 0.4)", 
    x = "ux, z : as.numeric(ux < 0.2 + 0.5*z)",
    y = "uy, z, x : as.numeric(uy < 0.1 + 0.4*z + 0.4*x)" 
  )
)

Alternatively the functions of SCM can be specified via conditional probability tables

backdoor_condprob <- SCM$new("backdoor",
  uflist = list(
    uz = function(n) {return(runif(n))},
    ux = function(n) {return(runif(n))},
    uy = function(n) {return(runif(n))}
  ),
  vflist = list(
    z = function(uz) {
      return( generate_condprob( ycondx = data.table(z = c(0,1), 
                                                     prob = c(0.6,0.4)), 
                               x = data.table(uz = uz), 
                               Umerge_expr = "uz"))},
    x = function(ux, z) {
      return( generate_condprob( ycondx = data.table(x = c(0,1,0,1), 
                                                     z = c(0,0,1,1), 
                                                     prob = c(0.8,0.2,0.3,0.7)),
                                             x = data.table(z = z, ux = ux), 
                                             Umerge_expr = "ux"))},
    y = function(uy, z, x) {
      return( generate_condprob( ycondx = data.table(y= rep(c(0,1), 4), 
                                                     z = c(0,0,1,1,0,0,1,1), 
                                                     x = c(0,0,0,0,1,1,1,1), 
                                                     prob = c(0.9,0.1,0.5,0.5,
                                                              0.5,0.5,0.1,0.9)),
                                             x = data.table(z = z, x = x, uy = uy), 
                                             Umerge_expr = "uy"))}
  )
)

It is possible to mix the styles and define some elements of a function list as functions, some as text and some as conditional probability tables.

Defining a linear Gaussian SCM

A linear Gaussian SCM can be defined giving the coefficients for the structural equations:

lgbackdoor <- LinearGaussianSCM$new("Linear Gaussian Backdoor",
                                    linear_gaussian = list(
                                      uflist = list(ux = function(n) {rnorm(n)},
                                                    uy = function(n) {rnorm(n)},
                                                    uz = function(n) {rnorm(n)}),
                                      vnames = c("x","y","z"),
                                      vcoefmatrix = matrix(c(0,0.4,0,0,0,0,0.6,0.8,0),3,3),
                                      ucoefvector = c(1,1,1),
                                      ccoefvector = c(0,0,0)))
print(lgbackdoor)

It is also possible to generate the underlying DAG and the coefficients randomly:

randomlg <- LinearGaussianSCM$new("Random Linear Gaussian",
                                  random_linear_gaussian = list(
                                  nv = 6, 
                                  edgeprob=0.5, 
                                  vcoefdistr = function(n) {rnorm(n)}, 
                                  ccoefdistr = function(n) {rnorm(n)}, 
                                  ucoefdistr = function(n) {rnorm(n)}))
print(randomlg)

\newpage

Printing the model

The print method presents the basic information on the model

backdoor

\newpage

Plotting the graph

The plotting method of the package igraph is used by default. If qgraph is available, its plotting method can be used as well. The argument subset controls which variables are plotted. Plotting parameters are passed to the plotting method.

backdoor$plot(vertex.size = 25) # with package 'igraph'
backdoor$plot(subset = "v") # only observed variables
if (requireNamespace("qgraph", quietly = TRUE)) backdoor$plot(method = "qgraph") 
# alternative look with package 'qgraph'

Simulating data

Calling method simulate() creates or updates data table simdata.

backdoor$simulate(10)
backdoor$simdata
backdoor$simulate(8)
backdoor$simdata
backdoor_text$simulate(20)
backdoor_condprob$simulate(30)

Applying an intervention

In an intervention, the structural equation of the target variable is changed.

backdoor_x1 <- backdoor$clone()  # making a copy
backdoor_x1$intervene("x",1) # applying the intervention
backdoor_x1$plot(method = "qgraph") # to see that arrows incoming to x are cut
backdoor_x1$simulate(10) # simulating from the intervened model
backdoor_x1$simdata

An intervention can redefine a structural equation

backdoor_yz <- backdoor$clone()  # making a copy
backdoor_yz$intervene("y", 
  function(uy, z) {return(as.numeric(uy < 0.1 + 0.8*z ))}) # making y a function of z only
backdoor_yz$plot(method = "qgraph") # to see that arrow x -> y is cut

Running an experiment (set of interventions)

The function run_experiment applies a set of interventions, simulates data and collects the results.

backdoor_experiment <- run_experiment(backdoor, 
                                      intervene = list(x = c(0,1)), 
                                      response = "y", 
                                      n = 10000)
str(backdoor_experiment)
colMeans(backdoor_experiment$response_list$y)

Applying the ID algorithm, Do-search and cfid

There are direct plugins to R packages causaleffect, dosearch and cfid that can be used to solve identifiability problems.

backdoor$causal.effect(y = "y", x = "x")
backdoor$dosearch(data = "p(x,y,z)", query = "p(y|do(x))")
backdoor$cfid(gamma = cfid::conj(cfid::cf("Y",0), cfid::cf("X",0, c(Z=1))) ) 

Counterfactual inference (a simple case)

Let us assume that intervention do(X=0) was applied and the response Y = 0 was recorded. What is the probability that in this situation the intervention do(X=1) would have led to the response Y = 1? We estimate this probability by means of simulation.

cfdata <- counterfactual(backdoor, situation = list(do = list(target = "x", ifunction = 0), 
                                                    condition = data.table( x = 0, y = 0)), 
                         target = "x", ifunction = 1, n = 100000, 
                         method = "rejection")
mean(cfdata$y)

The result differs from P(Y = 1 | do(X = 1))

backdoor_x1$simulate(100000)
mean(backdoor_x1$simdata$y)

Counterfactual inference (parallel worlds)

Parallel world graphs (a generalization of a twin graph) are used for counterfactual inference with several counterfactual interventions . The package implements class ParallelWorld which heritates class SCM. A ParallelWorld object is created from an SCM object by specifying the interventions for each world. By default the variables of the parallel worlds are named with suffixes "_1", "_2", ...

In the example below, we have the original world (variables x, z, y) and its two variants. In the variant 1 (variables x_1, z_1, y_1), the value of x (variable x_1 in the object) is set to be 0. In the variant 2 (variables x_2, z_2, y_2), the value of x (variable x_2 in the object) is set to be 0 and the value of z (variable z_2 in the object) is set to be 1.

backdoor_parallel <- ParallelWorld$new(
                         backdoor,
                        dolist=list(
                             list(target = "x", 
                                 ifunction = 0),
                             list(target = list("z","x"), 
                                  ifunction = list(1,0))
                         )
 )
backdoor_parallel
if (requireNamespace("qgraph", quietly = TRUE)) backdoor_parallel$plot(method = "qgraph") 

Counterfactual data can be simulated with function counterfactual. In the example below, we know that variable y obtained value 0 in the original world as well as variants 1 and 2. We are interested in the counterfactual distribution of y if x had been set to 1.

cfdata <- counterfactual(backdoor_parallel,
                         situation = list(
                            do = NULL,
                            condition = data.table::data.table( y = 0, y_1 = 0, y_2 = 0)),
                         target = "x",
                         ifunction = 1,
                         n = 100000,
                         method = "rejection")
mean(cfdata$y)

The printed value is a simulation based estimate for the counterfactual probability $P(Y=1)$.

An alternative way for answering the same question defines the case of interest as one of the parallel worlds (here variant 3).

backdoor_parallel2 <- ParallelWorld$new(
                         backdoor,
                        dolist=list(
                             list(target = "x", 
                                 ifunction = 0),
                             list(target = list("z","x"), 
                                  ifunction = list(1,0)),
                              list(target = "x", 
                                 ifunction = 1)
                         )
 )
cfdata <- counterfactual(backdoor_parallel2,
                         situation = list(
                            do = NULL,
                            condition = data.table::data.table( y = 0, y_1 = 0, y_2 = 0)),
                         n = 100000, 
                         method = "rejection")
mean(cfdata$y_3)

The printed value is a simulation based estimate for the counterfactual probability $P(Y=1)$.

A model with a missing data mechanism

The missing data mechanism is defined in similar manner as the other variables.

backdoor_md <- SCM$new("backdoor_md",
                       uflist = list(
                         uz = "n : runif(n)",
                         ux = "n : runif(n)",
                         uy = "n : runif(n)",
                         urz = "n : runif(n)",
                         urx = "n : runif(n)",
                         ury = "n : runif(n)"
                       ),
                       vflist = list(
                         z = "uz : as.numeric(uz < 0.4)",
                         x = "ux, z : as.numeric(ux < 0.2 + 0.5*z)",
                         y = "uy, z, x : as.numeric(uy < 0.1 + 0.4*z + 0.4*x)"
                       ),
                       rflist = list(
                         z = "urz : as.numeric( urz < 0.9)",
                         x = "urx, z : as.numeric( (urx + z)/2 < 0.9)",
                         y = "ury, z : as.numeric( (ury + z)/2 < 0.9)"
                       ),
                       rprefix = "r_"
)

Plotting the graph for a model with missing data mechanism

backdoor_md$plot(vertex.size = 25, edge.arrow.size=0.5) # with package 'igraph'
backdoor_md$plot(subset = "v") # only observed variables a
if (!requireNamespace("qgraph", quietly = TRUE)) backdoor_md$plot(method = "qgraph") 
# alternative look with package 'qgraph'

Simulating incomplete data

By default both complete data and incomplete data are simulated. The incomplete dataset is named as $simdata_obs.

backdoor_md$simulate(100)
summary(backdoor_md$simdata)
summary(backdoor_md$simdata_obs)

By using the argument fixedvars one can keep the complete data unchanged and re-simulate the missing data mechanism.

backdoor_md$simulate(100, fixedvars = c("x","y","z","ux","uy","uz"))
summary(backdoor_md$simdata)
summary(backdoor_md$simdata_obs)

Applying Do-search to a missing data problem

backdoor_md$dosearch(data = "p(x*,y*,z*,r_x,r_y,r_z)", query = "p(y|do(x))")

It is automatically recognized that the problem is a missing data problem when rflist != NULL.



Try the R6causal package in your browser

Any scripts or data that you put into this service are public.

R6causal documentation built on May 29, 2024, 4:46 a.m.