coefx: Multiple regression coefficient from a linear DAG
In gmonette/causalsim: Covariance Matrix of a Causal Graph

coefx

R Documentation

Description

Given a linear DAG, find the population regression coefficients that would be obtained from data with the marginal covariance structure implied by the linear DAG.
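
As a rough illustration of what "population regression coefficients" means here, the sketch below solves the normal equations directly on a covariance matrix rather than on data. It uses base R only; `pop_coef` and the toy matrix `sigma` are hypothetical names introduced for this example and are not part of causalsim, and the package may compute its result differently.

```r
# Hypothetical helper (not a causalsim function): population regression
# coefficients of `y` on the predictors `xs`, computed from a covariance matrix.
pop_coef <- function(sigma, y, xs) {
  # Solve Sigma_xx %*% beta = Sigma_xy for beta
  beta <- solve(sigma[xs, xs, drop = FALSE], sigma[xs, y, drop = FALSE])
  setNames(as.vector(beta), xs)
}

# Toy covariance matrix: Var(x) = 4 and y = 3 * x + e with Var(e) = 1,
# so Cov(x, y) = 12 and Var(y) = 9 * 4 + 1 = 37.
sigma <- matrix(c(4, 12,
                  12, 37), 2, 2,
                dimnames = list(c("x", "y"), c("x", "y")))

pop_coef(sigma, "y", "x")  # x = 3
```

No sample is involved: the coefficient is the value a regression on data from this covariance structure would converge to as n grows.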

Usage

coefx(fmla, dag, var = covld(to_dag(dag)), iv = NULL)

Arguments

`fmla`	a linear model formula. The variables in the formula must be column names of `dag`.
`dag`	a square matrix defining a linear DAG. The column names and row names of `dag` must be identical. The off-diagonal entries of `dag` contain the causal coefficients of arrows pointing from the column variable to the row variable. The diagonal entries are the standard deviations of the normally distributed independent components generating each row variable. A matrix defines a linear DAG if applying the same permutation to its rows and columns can transform it into a lower triangular matrix.
`var`	covariance matrix of the variables, for use when it is supplied directly rather than derived from a DAG. Defaults to `covld(to_dag(dag))`.
`iv`	a one-sided formula naming a single variable (only one is supported at present) to be used as an instrumental variable.
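
To make the `dag` and `iv` arguments more concrete, here is a minimal base R sketch. It assumes the structural equations x = B x + D e, where B holds the off-diagonal path coefficients and D the diagonal standard deviations, which gives the covariance (I - B)^{-1} D^2 (I - B)^{-T}. `implied_cov` is a hypothetical helper written for this example; it is not the package's `covld()`, and the instrumental-variable ratio shown is the textbook single-instrument formula, not necessarily how `coefx` handles `iv` internally.

```r
# Hypothetical helper (not causalsim's covld): covariance implied by a
# linear DAG matrix, assuming x = B x + D e with independent unit-variance e.
implied_cov <- function(A) {
  B <- A
  diag(B) <- 0                           # path coefficients, column -> row
  D <- diag(diag(A), nrow = nrow(A))     # sds of the independent components
  M <- solve(diag(nrow(A)) - B)          # (I - B)^{-1}
  sigma <- M %*% D^2 %*% t(M)            # D^2 squares the diagonal sds
  dimnames(sigma) <- dimnames(A)
  sigma
}

# Toy DAG: instrument i -> x -> y, with u confounding x and y.
nams <- c("i", "u", "x", "y")
A <- matrix(0, 4, 4, dimnames = list(nams, nams))
A["x", "i"] <- 2
A["x", "u"] <- 1
A["y", "x"] <- 3   # causal effect of x on y
A["y", "u"] <- 1
diag(A) <- 1

sigma <- implied_cov(A)

# Regressing y on x alone is biased by the confounder u:
sigma["x", "y"] / sigma["x", "x"]   # 19/6, not 3

# The single-instrument ratio Cov(i, y) / Cov(i, x) recovers the effect:
sigma["i", "y"] / sigma["i", "x"]   # 3
```

The toy matrix is already lower triangular, so no permutation of rows and columns is needed to see that it defines a DAG.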

Value

A list with class 'coefx' containing the population coefficient for the first predictor variable in `fmla`, the residual standard error of the regression, the conditional standard deviation of the residual of the first predictor, and the ratio of the last two quantities. This ratio is the 'standard error factor': multiplied by 1/sqrt(n), it gives an estimate of the standard error of the estimated regression coefficient for the first predictor variable.

The elements are:

beta: the population coefficient for the first predictor variable in `fmla`
sd_e: the residual standard error of the regression
sd_x_avp: the conditional standard deviation of the residual of the first predictor
sd_betax_factor: the 'standard error factor', the ratio sd_e / sd_x_avp; multiplied by 1/sqrt(n), it gives an estimate of the standard error of the estimated regression coefficient for the first predictor variable
fmla: the formula
label: a character string of the formula
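
The quantities above can be sketched from a covariance matrix with base R. This reading assumes that `sd_x_avp` is the standard deviation of the first predictor after regressing it on the remaining predictors (the horizontal axis of an added-variable plot). `cond_var` and `se_factor` are hypothetical helpers introduced for illustration and are not causalsim functions.

```r
# Hypothetical helper: variance of `target` conditional on the `given` variables.
cond_var <- function(sigma, target, given) {
  if (length(given) == 0) return(sigma[target, target])
  s_tg <- sigma[target, given, drop = FALSE]
  as.numeric(sigma[target, target] -
               s_tg %*% solve(sigma[given, given, drop = FALSE], t(s_tg)))
}

# Hypothetical helper: the three standard-deviation quantities for y ~ xs,
# where xs[1] is the predictor of interest.
se_factor <- function(sigma, y, xs) {
  sd_e     <- sqrt(cond_var(sigma, y, xs))           # residual sd of y
  sd_x_avp <- sqrt(cond_var(sigma, xs[1], xs[-1]))   # sd of x given the rest
  c(sd_e = sd_e, sd_x_avp = sd_x_avp, sd_betax_factor = sd_e / sd_x_avp)
}

# Toy model: Var(z) = 1, x = z + e_x with Var(e_x) = 1,
# y = 3 * x + 2 * z + e_y with Var(e_y) = 4.
nams <- c("x", "z", "y")
sigma <- matrix(c(2, 1,  8,
                  1, 1,  5,
                  8, 5, 38), 3, 3, dimnames = list(nams, nams))

res <- se_factor(sigma, "y", c("x", "z"))
res                                    # sd_e = 2, sd_x_avp = 1, factor = 2
res[["sd_betax_factor"]] / sqrt(100)   # approximate SE with n = 100: 0.2
```

A larger `sd_x_avp` means more variation in the predictor is left after adjusting for the other predictors, which is why it lowers the standard error factor.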

Examples

library(causalsim)  # provides coefx(), to_dag(), covld() and sim()
library(dagitty)
nams <- c('zc','zl','zr','c','x','y','m','i')
mat <- matrix(0, length(nams), length(nams))
rownames(mat) <- nams
colnames(mat) <- nams

# confounding back-door path
mat['zl','zc'] <- 2 
mat['zr','zc'] <- 2
mat['x','zl'] <- 1
mat['y','zr'] <- 2

# direct effect
mat['y','x'] <- 3

# indirect effect
mat['m','x'] <- 1
mat['y','m'] <- 1

# Instrumental variable 
mat['x','i'] <- 2

# 'Covariate'
mat['y','c'] <- 1

# independent error
diag(mat) <- 2

mat # not in lower triangular form
dag <- to_dag(mat) # can be permuted to lower triangular form
dag

coefx(y ~ x, dag)  # with confounding
coefx(y ~ x + zc, dag)  # blocking back-door path
coefx(y ~ x + zr, dag) # blocking with lower SE
coefx(y ~ x + zl, dag) # blocking with worse SE
coefx(y ~ x + zr + c, dag)  # adding a 'covariate'
coefx(y ~ x + zr + m, dag)  # including a mediator
coefx(y ~ x + zl + i, dag)  # including an instrument
coefx(y ~ x + zl + i + c, dag) # I and C

# plotting added-variable plot ellipse 
lines(
    coefx(y ~ x + zr, mat),
    lwd = 2, xv = 5, xlim = c(-5, 10), ylim = c(-25, 50))
lines(
    coefx(y ~ x + zl, mat), new = FALSE,
    col = 'red', xv = 5, lwd = 2)
lines(
    coefx(y ~ x + i, mat), new = FALSE,
    col = 'dark green', xv = 5)

# putting results in a data frame
# for easier comparison of SEs

fmlas <- list(
  y ~ x, 
  y ~ x + zc, 
  y ~ x + zr, 
  y ~ x + zl,
  y ~ x + zr + c, 
  y ~ x + zr + m, 
  y ~ x + zl + i,
  y ~ x + zl + i + c
)
res <- lapply(fmlas, coefx, dag)
res <- lapply(res, function(ll) {
    ll$fmla <- paste(as.character(ll$fmla)[c(2,1,3)], collapse = ' ')
    ll$beta <- ll$beta[1]
    ll
})

df <- do.call(rbind.data.frame, res)
df 

# simulation

head(sim(dag, 100))
var(sim(dag, 10000)) - covld(dag)

# plotting

plot(dag) + ggdag::theme_dag()

gmonette/causalsim documentation built on April 21, 2022, 1:40 a.m.