gmonette/causalsim: Covariance Matrix of a Causal Graph

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/",
  out.width = "100%"
)

options(digits = 4)

causalsim

The causalsim package uses a matrix containing the coefficients and standard deviations of the unique independent components of a linear causal DAG to generate the marginal covariance matrix and to calculate the value of coefficients of linear models applied to a population generated by the causal DAG.

Installation

You can install the development version of causalsim like so:

remotes::install_github("gmonette/causalsim")

Example

This example creates a DAG consisting of a collection of paths among the following variables in a causal analysis:

x the focal predictor, e.g., a treatment variable
y the outcome
m a mediator variable
i an instrumental variable, i.e., a predictor of only x
c a covariate, i.e., a predictor of y one might also want to take into account
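zc a central confounder, providing a back-door path from x through zl to y through zr
zl a variable on the back-door path, between zc and x
zr a variable on the back-door path, between zc and y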

The DAG to be studied here is:

In causalsim, this DAG is set up as a square matrix, mat, whose rows and columns are the 8 variables shown in the figure. The entries are:

mat[i, j] = coefficient on the path from variable j to i
mat[i, i] = error variance associated with variable i

library(causalsim)
library(dplyr)

nams <- c('zc','zl','zr','c','x','y','m','i')
mat <- matrix(0, length(nams), length(nams))
rownames(mat) <- nams
colnames(mat) <- nams

# set up paths: each value is the regression coefficient on the path

# direct effect, x -> y
mat['y','x'] <- 3

# indirect effect, x -> m -> y
mat['m','x'] <- 1
mat['y','m'] <- 1

# Instrumental variable 
mat['x','i'] <- 2

# 'Covariate'
mat['y','c'] <- 1

# confounding back-door path
mat['zl','zc'] <- 2 
mat['zr','zc'] <- 2
mat['x','zl'] <- 1
mat['y','zr'] <- 2

# independent error
diag(mat) <- 2

mat # not in lower diagonal form

This matrix represents a DAG only if it has no cycles, which means its rows and columns can be permuted to lower-diagonal (that is, lower-triangular) form.

dag <- to_dag(mat) # can be permuted to lower-diagonal form
dag
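The link between "no cycles" and "can be permuted to lower-diagonal form" can be sketched in base R: repeatedly pick the variables that have no remaining causes, and the order in which they are picked is a valid permutation. The helper topo_order() below is hypothetical and is not part of causalsim; it only illustrates the idea and makes no claim about how to_dag() is implemented.

```r
# Hypothetical helper (not part of causalsim): returns an ordering of the
# variables in which every cause precedes its effects, or stops on a cycle.
topo_order <- function(mat) {
  adj <- mat != 0          # adj[i, j] is TRUE when there is a path j -> i
  diag(adj) <- FALSE       # the diagonal holds error terms, not paths
  remaining <- rownames(mat)
  ordered <- character(0)
  while (length(remaining) > 0) {
    # variables with no parents among those still remaining can come next
    sub <- adj[remaining, remaining, drop = FALSE]
    roots <- remaining[rowSums(sub) == 0]
    if (length(roots) == 0) stop("the matrix contains a cycle, so it is not a DAG")
    ordered <- c(ordered, roots)
    remaining <- setdiff(remaining, roots)
  }
  ordered
}

# a small made-up example: a -> b -> c, entered in scrambled order
m <- matrix(0, 3, 3, dimnames = list(c("c", "a", "b"), c("c", "a", "b")))
m["b", "a"] <- 1
m["c", "b"] <- 1
diag(m) <- 1

ord <- topo_order(m)
ord           # "a" "b" "c"
m[ord, ord]   # all entries above the diagonal are zero
```

If a cycle is added (for example m["a", "c"] <- 1), no variable is free of remaining causes and the sketch stops with an error.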

covld() computes the overall covariance matrix generated by the coefficients of a linear DAG.

covld(dag)
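As a rough illustration of what such a computation involves: for a linear DAG written as $v = Bv + e$, where $B$ holds the path coefficients and the errors $e$ are independent with variances on the diagonal of $D$, the implied covariance matrix is $\Sigma = (I - B)^{-1} D (I - B)^{-\top}$. The base R sketch below uses a hypothetical helper, implied_cov(), that is not part of causalsim. It takes the error variances as a separate argument, so it makes no assumption about how covld() reads the diagonal of dag.

```r
# Hypothetical helper (not part of causalsim)
implied_cov <- function(B, error_var) {
  # B[i, j]: coefficient on the path j -> i (zero diagonal)
  # error_var: variances of the independent error terms
  A <- solve(diag(nrow(B)) - B)      # (I - B)^{-1}, the matrix of total effects
  S <- A %*% diag(error_var, nrow = nrow(B)) %*% t(A)
  dimnames(S) <- dimnames(B)
  S
}

# made-up example: a -> b with coefficient 2 and unit error variances
B <- matrix(0, 2, 2, dimnames = list(c("a", "b"), c("a", "b")))
B["b", "a"] <- 2
S <- implied_cov(B, error_var = c(1, 1))
S   # Var(a) = 1, Cov(a, b) = 2, Var(b) = 2^2 * 1 + 1 = 5
```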

Given a linear DAG, coefx() finds the population regression coefficients using data with the marginal covariance structure implied by the DAG.

The model lm(y ~ x) gives a biased estimate of $\beta_x$, whose true value is mat['y','x'] = 3. coefx() returns a list, but it can be coerced to a data frame.

coefx(y ~ x, dag)                 # with confounding

# print it nicely
as.data.frame(coefx(y ~ x, dag))  # with confounding
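The idea of a population regression coefficient can be illustrated without the package: for zero-mean variables with covariance matrix $\Sigma$, the coefficients of y on a set of predictors X solve the normal equations $\Sigma_{XX}\,\beta = \Sigma_{Xy}$. The sketch below uses a hypothetical helper, pop_coef(), and a small made-up DAG with one confounder z and unit error variances. It is not the DAG above, and it is not a description of how coefx() is implemented.

```r
# Hypothetical helper (not part of causalsim)
pop_coef <- function(S, y, x) {
  # solve the population normal equations  S[x, x] %*% beta = S[x, y]
  beta <- solve(S[x, x, drop = FALSE], S[x, y, drop = FALSE])
  setNames(drop(beta), x)
}

# made-up DAG: z -> x, z -> y, x -> y (direct effect 3), unit error variances
nams <- c("z", "x", "y")
B <- matrix(0, 3, 3, dimnames = list(nams, nams))
B["x", "z"] <- 1
B["y", "z"] <- 1
B["y", "x"] <- 3
A <- solve(diag(3) - B)
S <- A %*% t(A)                      # implied covariance with unit error variances
dimnames(S) <- list(nams, nams)

b_confounded <- pop_coef(S, "y", "x")           # 3.5: biased by the open back-door path
b_adjusted   <- pop_coef(S, "y", c("x", "z"))   # x = 3, z = 1: back-door path blocked
b_confounded
b_adjusted
```

Here Cov(x, y) = 7 and Var(x) = 2, so regressing y on x alone gives 3.5 rather than the direct effect of 3. Adding z recovers 3.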

We can examine the coefficients for any model including other variables in addition to x.

as.data.frame(coefx(y ~ x + zc, dag))              # blocking back-door path
as.data.frame(coefx(y ~ x + zr, dag))              # blocking with lower SE
as.data.frame(coefx(y ~ x + zl, dag))              # blocking with worse SE
as.data.frame(coefx(y ~ x + zr + c, dag))          # adding a 'covariate'
as.data.frame(coefx(y ~ x + zr + m, dag))          # including a mediator
as.data.frame(coefx(y ~ x + zl + i, dag))          # including an instrument
as.data.frame(coefx(y ~ x + zl + i + c, dag))      # I and C
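The comments about lower and worse SE are consistent with a standard large-sample result for least squares with homoskedastic errors: the standard error of the coefficient on x is proportional to the residual standard deviation of y divided by the standard deviation of x left unexplained by the other predictors (and by $\sqrt{n}$). The sketch below computes that ratio from a covariance matrix. The helper se_factor() is hypothetical, the DAG is a reduced version of the one above with unit error variances (an assumption made for easy arithmetic), and the sketch does not assume that coefx() reports exactly this quantity.

```r
# Hypothetical helper (not part of causalsim)
se_factor <- function(S, y, x, others = character(0)) {
  preds <- c(x, others)
  # residual variance of y given all predictors
  Sxy <- S[preds, y, drop = FALSE]
  res_var <- S[y, y] - drop(t(Sxy) %*% solve(S[preds, preds, drop = FALSE], Sxy))
  # variance of x not explained by the other predictors
  x_var <- S[x, x]
  if (length(others) > 0) {
    Sox <- S[others, x, drop = FALSE]
    x_var <- x_var - drop(t(Sox) %*% solve(S[others, others, drop = FALSE], Sox))
  }
  sqrt(res_var / x_var)
}

# reduced DAG: zc -> zl -> x -> y and zc -> zr -> y, unit error variances (assumed)
nams <- c("zc", "zl", "zr", "x", "y")
B <- matrix(0, 5, 5, dimnames = list(nams, nams))
B["zl", "zc"] <- 2
B["zr", "zc"] <- 2
B["x", "zl"] <- 1
B["y", "zr"] <- 2
B["y", "x"] <- 3
A <- solve(diag(5) - B)
S <- A %*% t(A)
dimnames(S) <- list(nams, nams)

f_zr <- se_factor(S, "y", "x", "zr")   # about 0.60
f_zl <- se_factor(S, "y", "x", "zl")   # about 2.86
c(zr = f_zr, zl = f_zl)
```

Both adjustments block the back-door path, but zr also explains variation in y, which shrinks the residual variance, while zl explains variation in x, which leaves less usable variation in the focal predictor.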

It is more convenient to set up a collection of formulas as a list and then run coefx() on each, giving a data frame containing all results.

fmlas <- list(
  y ~ x, 
  y ~ x + zc, 
  y ~ x + zr, 
  y ~ x + zl,
  y ~ x + zr + c, 
  y ~ x + zr + m, 
  y ~ x + zl + i,
  y ~ x + zl + i + c
)

df <- fmlas %>%
  lapply(coefx, dag) %>%
  lapply(as.data.frame) %>%
  do.call(rbind.data.frame, .)

df