vignettes/MediationAnalysis.md

.footer { color: black; background: #E8E8E8; position: fixed; top: 90%; text-align:center; width:100%; } .small-code pre code { font-size: 1em; } .midcenter { position: fixed; top: 50%; left: 50%; }

Mediation Analysis with mediateR

author: Min Jin Ha, PhD date: autosize: true

Outline

Installation

class: small-code

Install mediateR from github using R devtools

install.packages("devtools")
devtools::install_github("longjp/mediateR")

Alternatively, install it from github in a terminal

$ git clone https://github.com/longjp/mediateR.git
$ R CMD INSTALL mediateR

Load the R package mediateR.

library(mediateR)
set.seed(2022)

Data Structure

class: small-code

The mediateR contains functions to simulate data for mediation analysis. The following code simulates 2 observations with 3 exposures and 2 mediators for linear outcome model.

params <- mediateR:::QuickSim(n=2,nxx=3,nmm=2,family="gaussian")
dat <- SimulateData(params)

One can manually create a list with these elements or rely on the simulation function SimulateData to create the structure. The argument to SimulateData has parameters controlling the sample size, coefficient estimates, number of mediators, etc.

Data Structure: Example

class: small-code

dat$y: response

dat$y
[1] 7.336213 1.446297

dat$mm: mediators

dat$mm
             m1         m2
[1,] -0.2584175  2.0061857
[2,]  0.3148540 -0.4818267

dat$xx: exposures

dat$xx
     x1 x2 x3
[1,]  0  1  1
[2,]  1  0  1

dat$family: data type of response

dat$family
[1] "gaussian"

Mediation Analysis

The goal is to assess the causal impact of exposure (dat$xx) on response (dat$y) and quantify how much of this effect passes through mediators (dat$mm), adjusting for covariates (dat$co).

Counterfactual Notations

Counterfactual random variables are used to formally define causal interventions and the notions of direct and indirect effects.

Let $Y^{X=x'}$ be the value of $Y$ obtained by setting $X=x'$ , possibly counter to fact. For notational simplicity, we will write $Y^{x'}$ when it is clear that $X$ is being set to $x'$ .

Counterfactual notation can also express interventions on multiple variables. For example $Y^{x'',M^{x'}}$, the value $Y$ would have obtained had $X$ been set to $x''$ and had $M$ been set to the value it would have obtained had $x$ been set to $x'$.

Direct and Indirect Effect

The direct effect $DE_{X_i}(x',x'')$ and indirect effect $IE_{X_i}(x',x'')$ when changing $X_i$ from $x'$ to $x''$ with respect to mediators $M$ are given by:

Probability Model

Mediator model: $$\boldsymbol{m} = \boldsymbol{\beta}^{(X)}\boldsymbol{x} + \boldsymbol{\beta}^{(C)}\boldsymbol{c}+ \boldsymbol{\beta}^{(0)} + \boldsymbol{\epsilon}$$

Outcome model:

Computation and Hypothesis Testing

Parameter estimation: - Generalized linear model or Cox proportional hazard model - Ridge penalty is incorporated for high-dimensional mediators

Monte Carlo sampling to approximate the direct/indirect effects: - Relax the “rare outcome” assumption

We propose computing confidence intervals and hypothesis tests using bootstrap quantiles.

Example: Linear Model

class: small-code

The following code simulates 500 observations with 3 exposures and 3 mediators for linear outcome model.

sim_params <- mediateR:::QuickSim(500,3,3,"gaussian")
dat <- SimulateData(sim_params)

ComputePath is used to fit the mediation analysis framework for various response, where dat are the list with the format mentioned above, mmn=F assuming independent mediators and reg=T is used for high-dimensional mediators (default is false).

fit <- ComputePath(dat, mmn = T)

Linear - Model fitting

class: small-code

ComputeEffectsLinear provides estimation of direct, indirect and total effect for linear model. Plugging in the true parameters sim_params gives the true effects.

eff_est <- ComputeEffectsLinear(fit)
eff <- ComputeEffectsLinear(sim_params)

library(kableExtra)
eff_comb <- matrix(0,nrow(eff),ncol=2*ncol(eff))
for(ii in 1:ncol(eff)){
  eff_comb[,2*ii-1] <- eff_est[,ii]; eff_comb[,2*ii] <- eff[,ii]
}
for(ii in (ncol(dat$xx)+1):(ncol(dat$xx)+ncol(dat$mm))){
  eff_comb[ii,3:ncol(eff_comb)] <- NA
}
colnames(eff_comb) <- rep(c("Estimated","True"),ncol(eff)); rownames(eff_comb) <- rownames(eff)

Linear - Direct and Indirect Effect

class: small-code

Table below summarizes the Indirect, Direct, and Total effects.

kable(eff_comb[1:ncol(dat$xx),],digits=2, format = "html", excape="T") %>%
  kable_styling(bootstrap_options = "striped", full_width = T) %>%
  add_header_above(c(" ", "Direct" = 2, "Indirect" = 2, "Total" = 2))
Direct
Indirect
Total
Estimated True Estimated True Estimated True x1 1.05 1 1.29 1.5 2.34 2.5 x2 1.07 1 0.32 0.0 1.39 1.0 x3 0.85 1 1.41 1.5 2.26 2.5

Linear - Bootstrap

class: small-code

We propose computing confidence intervals and hypothesis tests using bootstrap sampling quantiles.

The following code compute the confidence interval of total effect of x1 on the response.

B <- 1000
eff_est_boot <- vector("list",length=B)
for(ii in 1:B){
  ix <- sample(1:sim_params$n,replace=TRUE)
  dat_sub <- mediateR:::SubsetDat(dat,ix)
  fit <- ComputePath(dat_sub, mmn = T)
  eff_est_boot[[ii]] <- ComputeEffectsLinear(fit)
}
total_xx1 <- vapply(eff_est_boot,function(x){x[1,3]},c(0)) #x1 Total
#indirect_xx1 <- vapply(eff_est_boot,function(x){x[1,2]},c(0)) #x1 Indirect
CI95 = round(quantile(total_xx1, c(0.025, 1 - 0.025)),2)

Linear - Bootstrap

class: small-code

Figure below shows the histogram of total effects of Bootstrap samples for $x_1$ The 95% CI of the total effect for $x_1$ is (1.8, 2.89)

hist(total_xx1,main="Bootstrap Samples x1 Total Effect",xlab="Estimate")
abline(v=eff[1,3],lwd=3,col='red')
abline(v=eff_est[1,3],lwd=3,col='blue')
legend("topright",c("Truth","Point Estimate"),col=c("red","blue"),lwd=2, bty="n", cex=0.8)

plot of chunk bootstrap_plot

Example: Survival

class: small-code

Same as previous simulation but with exponentially distributed survival times. The rate is $e^{\beta^Tx}$ where $x$ are the covariates. The following code simulates 500 observations with 3 exposures and 3 mediators for survival outcome model.

sim_params$family <- "cox"
dat <- SimulateData(sim_params)
rmean <- max(as.matrix(dat$y)[,1])

This model can be fit with Cox Proportional Hazards. We compute the effects on mean survival restricted to 284 days.

fit <- ComputePath(dat, mmn = T)

Survival - Direct and Indirect Effect

class: small-code

Given the simulation setting, we approximate true effects by approximating integral with large sample size use the maximum observed time (284 days) as the restricted mean time.

sim_params2 <- sim_params
sim_params2$n <- 1e4 # use many observations to approximate true effects
dat2 <- SimulateData(sim_params2)
fit2 <- ComputePath(dat2, mmn = T) 
direct_t <- ComputeEffectxx(dat2,fit2,"direct",rmean=rmean)
indirect_t <- ComputeEffectxx(dat2,fit2,"indirect",rmean=rmean)
total_t <- ComputeEffectxx(dat2,fit2,"total",rmean=rmean)

Similarly, ComputeEffectxx provides estimation of direct, indirect and total effect.

direct <- ComputeEffectxx(dat,fit,"direct",rmean=rmean)
indirect <- ComputeEffectxx(dat,fit,"indirect",rmean=rmean)
total <- ComputeEffectxx(dat,fit,"total",rmean=rmean)
eff_comb <- cbind(direct,direct_t,indirect,indirect_t,total,total_t)
rownames(eff_comb) <- names(fit$xx_direct)
colnames(eff_comb) <- rep(c("Estimated","True"),3)
xx1total_sample <- eff_comb[1,5]
xx1total_true <- eff_comb[1,6]

Survival - Direct and Indirect Effect

class: small-code

Table below summarizes the Indirect, Direct, and Total effects.

kable(eff_comb,digits=2) %>%
  kable_styling(bootstrap_options = "striped", full_width = F) %>%
  add_header_above(c(" ", "direct" = 2, "indirect" = 2, "total" = 2))
direct
indirect
total
Estimated True Estimated True Estimated True x1 -25.39 -22.46 -37.03 -33.35 -60.35 -56.51 x2 -3.82 -21.89 1.40 -1.65 -21.12 -26.71 x3 -12.58 -23.45 -17.78 -40.67 -50.67 -60.18

Survival - Bootstrap

class: small-code

We propose computing confidence intervals and hypothesis tests using bootstrap sampling quantiles.

The following code computes the confidence interval of total effect of x1 on the response.

B <- 1000
total_xx1 <- rep(0,length=B)
for(ii in 1:B){
  ix <- sample(1:sim_params$n,replace=TRUE)
  dat_boot <- mediateR:::SubsetDat(dat,ix)
  fit_boot <- ComputePath(dat_boot, mmn = T)
  total_xx1[ii] <- ComputeEffectxx(dat_boot,fit_boot,"total",rmean=rmean)[1] #total
}
CI95 = round(quantile(total_xx1, c(0.025, 1 - 0.025)),2)

Survival - Bootstrap

class: small-code

Figure below shows the histogram of total effects of Bootstrap samples for $x_1$. The 95% CI of the total effect for $x_1$ is (-74.8, -35.52).

hist(total_xx1,main="Bootstrap Samples x1 Total Effect",xlab="Estimate")
abline(v=xx1total_sample,lwd=3,col='blue')
abline(v=xx1total_true,lwd=3,col='red')
legend("topright",c("Truth","Point Estimate"),col=c("red","blue"),lwd=2, bty="n", cex=0.8)

plot of chunk bootstrap_plot3

Example: Mediator-mediator relations

class: small-code

round(fit$cov,2)
      [,1]  [,2]  [,3]
[1,]  1.02  0.05 -0.06
[2,]  0.05  1.07 -0.01
[3,] -0.06 -0.01  1.08
round(fit$ivcor_mm,2)
      [,1]  [,2] [,3]
[1,]  1.00 -0.04 0.05
[2,] -0.04  1.00 0.00
[3,]  0.05  0.00 1.00

Mediator-mediator relations: Bootstrap

class: small-code

Here we show how to compute confidence intervals using bootstrap sampling quantiles.

corrmm <- array(0,dim = c(B, ncol(dat$mm),ncol(dat$mm)))
for(ii in 1:B){
  ix <- sample(1:sim_params$n,replace=TRUE)
  dat_boot <- mediateR:::SubsetDat(dat,ix)
  fit_boot <- ComputePath(dat_boot, mmn = T)
  corrmm[ii,,] <- fit_boot$ivcor_mm
}
CI95 = round(quantile(corrmm[,1,2], c(0.025, 1 - 0.025)),2)

Mediator-mediator relations: Bootstrap

class: small-code

Figure below shows the histogram of partial correlation between m1 and m2 adjusting for other mediators, exposures and covariates. Its 95% CI is (-0.13, 0.05).

hist(corrmm[,1,2],main="Bootstrap samples of partial \n correlation between m1 and m2",xlab="Estimate")
abline(v=fit$ivcor_mm[1,2],lwd=3,col='blue')
abline(v=0,lwd=3,col='red')
legend("topright",c("Truth","Point Estimate"),col=c("red","blue"),lwd=2,  bty="n", cex=0.8)

plot of chunk unnamed-chunk-10

======================================================== class: small-code

Thank you!


longjp/mediateR documentation built on May 24, 2023, 12:30 p.m.