Using R6causal"
In R6causal: R6 Class for Structural Causal Models

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Overview

The R package R6causal implements an R6 class called SCM. The class aims to simplify working with structural causal models. The missing data mechanism can be defined as a part of the structural model.

The class contains methods for

defining a structural causal model via functions, text or conditional probability tables
printing basic information on the model
plotting the graph for the model using packages igraph or qgraph
simulating data from the model
applying an intervention
checking the identifiability of a query using the R packages causaleffect and dosearch
defining the missing data mechanism
simulating incomplete data from the model according to the specified missing data mechanism
checking the identifiability in a missing data problem using the R package dosearch
checking the identifiability of a counterfactual query using the R package cfid

In addition, there are functions for

running experiments
counterfactual inference using simulation
evaluating fairness of a prediction model

The class ParallelWorld inherits SCM and defines a structural causal model that describes parallel worlds for counterfactual inference.

The class LinearGaussianSCM inherits SCM and defines a structural causal model where all functions are linear and all background variables follow Gaussian distribution.

Setup

library(R6causal)
library(data.table)
library(stats)
data.table::setDTthreads(2)

Defining the model

Structural causal model (SCM) for a backdoor situation can be defined as follows

backdoor <- SCM$new("backdoor",
  uflist = list(
    uz = function(n) {return(runif(n))},
    ux = function(n) {return(runif(n))},
    uy = function(n) {return(runif(n))}
  ),
  vflist = list(
    z = function(uz) {
      return(as.numeric(uz < 0.4))},
    x = function(ux, z) {
      return(as.numeric(ux < 0.2 + 0.5*z))},
    y = function(uy, z, x) {
      return(as.numeric(uy < 0.1 + 0.4*z + 0.4*x))}
  )
)

A shortcut notation for this is

backdoor_text <- SCM$new("backdoor",
  uflist = list(
    uz = "n : runif(n)", 
    ux = "n : runif(n)", 
    uy = "n : runif(n)" 
  ),
  vflist = list(
    z = "uz : as.numeric(uz < 0.4)", 
    x = "ux, z : as.numeric(ux < 0.2 + 0.5*z)",
    y = "uy, z, x : as.numeric(uy < 0.1 + 0.4*z + 0.4*x)" 
  )
)

Alternatively the functions of SCM can be specified via conditional probability tables

backdoor_condprob <- SCM$new("backdoor",
  uflist = list(
    uz = function(n) {return(runif(n))},
    ux = function(n) {return(runif(n))},
    uy = function(n) {return(runif(n))}
  ),
  vflist = list(
    z = function(uz) {
      return( generate_condprob( ycondx = data.table(z = c(0,1), 
                                                     prob = c(0.6,0.4)), 
                               x = data.table(uz = uz), 
                               Umerge_expr = "uz"))},
    x = function(ux, z) {
      return( generate_condprob( ycondx = data.table(x = c(0,1,0,1), 
                                                     z = c(0,0,1,1), 
                                                     prob = c(0.8,0.2,0.3,0.7)),
                                             x = data.table(z = z, ux = ux), 
                                             Umerge_expr = "ux"))},
    y = function(uy, z, x) {
      return( generate_condprob( ycondx = data.table(y= rep(c(0,1), 4), 
                                                     z = c(0,0,1,1,0,0,1,1), 
                                                     x = c(0,0,0,0,1,1,1,1), 
                                                     prob = c(0.9,0.1,0.5,0.5,
                                                              0.5,0.5,0.1,0.9)),
                                             x = data.table(z = z, x = x, uy = uy), 
                                             Umerge_expr = "uy"))}
  )
)

It is possible to mix the styles and define some elements of a function list as functions, some as text and some as conditional probability tables.

Defining a linear Gaussian SCM

A linear Gaussian SCM can be defined giving the coefficients for the structural equations:

lgbackdoor <- LinearGaussianSCM$new("Linear Gaussian Backdoor",
                                    linear_gaussian = list(
                                      uflist = list(ux = function(n) {rnorm(n)},
                                                    uy = function(n) {rnorm(n)},
                                                    uz = function(n) {rnorm(n)}),
                                      vnames = c("x","y","z"),
                                      vcoefmatrix = matrix(c(0,0.4,0,0,0,0,0.6,0.8,0),3,3),
                                      ucoefvector = c(1,1,1),
                                      ccoefvector = c(0,0,0)))
print(lgbackdoor)

It is also possible to generate the underlying DAG and the coefficients randomly:

randomlg <- LinearGaussianSCM$new("Random Linear Gaussian",
                                  random_linear_gaussian = list(
                                  nv = 6, 
                                  edgeprob=0.5, 
                                  vcoefdistr = function(n) {rnorm(n)}, 
                                  ccoefdistr = function(n) {rnorm(n)}, 
                                  ucoefdistr = function(n) {rnorm(n)}))
print(randomlg)

\newpage

Printing the model

The print method presents the basic information on the model

backdoor

\newpage

Plotting the graph

The plotting method of the package igraph is used by default. If qgraph is available, its plotting method can be used as well. The argument subset controls which variables are plotted. Plotting parameters are passed to the plotting method.

backdoor$plot(vertex.size = 25) # with package 'igraph'
backdoor$plot(subset = "v") # only observed variables
if (requireNamespace("qgraph", quietly = TRUE)) backdoor$plot(method = "qgraph") 
# alternative look with package 'qgraph'

Simulating data

Calling method simulate() creates or updates data table simdata.

backdoor$simulate(10)
backdoor$simdata
backdoor$simulate(8)
backdoor$simdata
backdoor_text$simulate(20)
backdoor_condprob$simulate(30)

Applying an intervention

In an intervention, the structural equation of the target variable is changed.

backdoor_x1 <- backdoor$clone()  # making a copy
backdoor_x1$intervene("x",1) # applying the intervention
backdoor_x1$plot(method = "qgraph") # to see that arrows incoming to x are cut
backdoor_x1$simulate(10) # simulating from the intervened model
backdoor_x1$simdata

An intervention can redefine a structural equation

backdoor_yz <- backdoor$clone()  # making a copy
backdoor_yz$intervene("y", 
  function(uy, z) {return(as.numeric(uy < 0.1 + 0.8*z ))}) # making y a function of z only
backdoor_yz$plot(method = "qgraph") # to see that arrow x -> y is cut

Running an experiment (set of interventions)

The function run_experiment applies a set of interventions, simulates data and collects the results.

backdoor_experiment <- run_experiment(backdoor, 
                                      intervene = list(x = c(0,1)), 
                                      response = "y", 
                                      n = 10000)
str(backdoor_experiment)
colMeans(backdoor_experiment$response_list$y)

Applying the ID algorithm, Do-search and cfid

There are direct plugins to R packages causaleffect, dosearch and cfid that can be used to solve identifiability problems.

backdoor$causal.effect(y = "y", x = "x")
backdoor$dosearch(data = "p(x,y,z)", query = "p(y|do(x))")
backdoor$cfid(gamma = cfid::conj(cfid::cf("Y",0), cfid::cf("X",0, c(Z=1))) )

Counterfactual inference (a simple case)

Let us assume that intervention do(X=0) was applied and the response Y = 0 was recorded. What is the probability that in this situation the intervention do(X=1) would have led to the response Y = 1? We estimate this probability by means of simulation.

cfdata <- counterfactual(backdoor, situation = list(do = list(target = "x", ifunction = 0), 
                                                    condition = data.table( x = 0, y = 0)), 
                         target = "x", ifunction = 1, n = 100000, 
                         method = "rejection")
mean(cfdata$y)

The result differs from P(Y = 1 | do(X = 1))

backdoor_x1$simulate(100000)
mean(backdoor_x1$simdata$y)

Counterfactual inference (parallel worlds)

Parallel world graphs (a generalization of a twin graph) are used for counterfactual inference with several counterfactual interventions . The package implements class ParallelWorld which heritates class SCM. A ParallelWorld object is created from an SCM object by specifying the interventions for each world. By default the variables of the parallel worlds are named with suffixes "_1", "_2", ...

In the example below, we have the original world (variables x, z, y) and its two variants. In the variant 1 (variables x_1, z_1, y_1), the value of x (variable x_1 in the object) is set to be 0. In the variant 2 (variables x_2, z_2, y_2), the value of x (variable x_2 in the object) is set to be 0 and the value of z (variable z_2 in the object) is set to be 1.

backdoor_parallel <- ParallelWorld$new(
                         backdoor,
                        dolist=list(
                             list(target = "x", 
                                 ifunction = 0),
                             list(target = list("z","x"), 
                                  ifunction = list(1,0))
                         )
 )
backdoor_parallel
if (requireNamespace("qgraph", quietly = TRUE)) backdoor_parallel$plot(method = "qgraph")

Counterfactual data can be simulated with function counterfactual. In the example below, we know that variable y obtained value 0 in the original world as well as variants 1 and 2. We are interested in the counterfactual distribution of y if x had been set to 1.

cfdata <- counterfactual(backdoor_parallel,
                         situation = list(
                            do = NULL,
                            condition = data.table::data.table( y = 0, y_1 = 0, y_2 = 0)),
                         target = "x",
                         ifunction = 1,
                         n = 100000,
                         method = "rejection")
mean(cfdata$y)

The printed value is a simulation based estimate for the counterfactual probability $P(Y=1)$.

An alternative way for answering the same question defines the case of interest as one of the parallel worlds (here variant 3).

backdoor_parallel2 <- ParallelWorld$new(
                         backdoor,
                        dolist=list(
                             list(target = "x", 
                                 ifunction = 0),
                             list(target = list("z","x"), 
                                  ifunction = list(1,0)),
                              list(target = "x", 
                                 ifunction = 1)
                         )
 )
cfdata <- counterfactual(backdoor_parallel2,
                         situation = list(
                            do = NULL,
                            condition = data.table::data.table( y = 0, y_1 = 0, y_2 = 0)),
                         n = 100000, 
                         method = "rejection")
mean(cfdata$y_3)

The printed value is a simulation based estimate for the counterfactual probability $P(Y=1)$.

A model with a missing data mechanism

The missing data mechanism is defined in similar manner as the other variables.

backdoor_md <- SCM$new("backdoor_md",
                       uflist = list(
                         uz = "n : runif(n)",
                         ux = "n : runif(n)",
                         uy = "n : runif(n)",
                         urz = "n : runif(n)",
                         urx = "n : runif(n)",
                         ury = "n : runif(n)"
                       ),
                       vflist = list(
                         z = "uz : as.numeric(uz < 0.4)",
                         x = "ux, z : as.numeric(ux < 0.2 + 0.5*z)",
                         y = "uy, z, x : as.numeric(uy < 0.1 + 0.4*z + 0.4*x)"
                       ),
                       rflist = list(
                         z = "urz : as.numeric( urz < 0.9)",
                         x = "urx, z : as.numeric( (urx + z)/2 < 0.9)",
                         y = "ury, z : as.numeric( (ury + z)/2 < 0.9)"
                       ),
                       rprefix = "r_"
)

Plotting the graph for a model with missing data mechanism

backdoor_md$plot(vertex.size = 25, edge.arrow.size=0.5) # with package 'igraph'
backdoor_md$plot(subset = "v") # only observed variables a
if (!requireNamespace("qgraph", quietly = TRUE)) backdoor_md$plot(method = "qgraph") 
# alternative look with package 'qgraph'

Simulating incomplete data

By default both complete data and incomplete data are simulated. The incomplete dataset is named as $simdata_obs.

backdoor_md$simulate(100)
summary(backdoor_md$simdata)
summary(backdoor_md$simdata_obs)

By using the argument fixedvars one can keep the complete data unchanged and re-simulate the missing data mechanism.

backdoor_md$simulate(100, fixedvars = c("x","y","z","ux","uy","uz"))
summary(backdoor_md$simdata)
summary(backdoor_md$simdata_obs)

Applying Do-search to a missing data problem

backdoor_md$dosearch(data = "p(x*,y*,z*,r_x,r_y,r_z)", query = "p(y|do(x))")

It is automatically recognized that the problem is a missing data problem when rflist != NULL.

Any scripts or data that you put into this service are public.

R6causal documentation built on May 29, 2024, 4:46 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

R6causal
R6 Class for Structural Causal Models

Using R6causal"
In R6causal: R6 Class for Structural Causal Models

Overview

Setup

Defining the model

Defining a linear Gaussian SCM

Printing the model

Plotting the graph

Simulating data

Applying an intervention

An intervention can redefine a structural equation

Running an experiment (set of interventions)

Applying the ID algorithm, Do-search and cfid

Counterfactual inference (a simple case)

Counterfactual inference (parallel worlds)

A model with a missing data mechanism

Plotting the graph for a model with missing data mechanism

Simulating incomplete data

Applying Do-search to a missing data problem

Try the R6causal package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

R6causal R6 Class for Structural Causal Models

Using R6causal" In R6causal: R6 Class for Structural Causal Models

Overview

Setup

Defining the model

Defining a linear Gaussian SCM

Printing the model

Plotting the graph

Simulating data

Applying an intervention

An intervention can redefine a structural equation

Running an experiment (set of interventions)

Applying the ID algorithm, Do-search and cfid

Counterfactual inference (a simple case)

Counterfactual inference (parallel worlds)

A model with a missing data mechanism

Plotting the graph for a model with missing data mechanism

Simulating incomplete data

Applying Do-search to a missing data problem

Try the R6causal package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

R6causal
R6 Class for Structural Causal Models

Using R6causal"
In R6causal: R6 Class for Structural Causal Models