.footer { color: black; background: #E8E8E8; position: fixed; top: 90%; text-align:center; width:100%; } .small-code pre code { font-size: 1em; } .midcenter { position: fixed; top: 50%; left: 50%; }
author: Min Jin Ha, PhD date: autosize: true
class: small-code
Install mediateR
from github using R devtools
install.packages("devtools")
devtools::install_github("longjp/mediateR")
Alternatively, install it from github in a terminal
$ git clone https://github.com/longjp/mediateR.git
$ R CMD INSTALL mediateR
Load the R package mediateR
.
library(mediateR)
set.seed(2022)
class: small-code
The mediateR
contains functions to simulate data for mediation analysis. The following code simulates 2 observations with 3 exposures and 2 mediators for linear outcome model.
params <- mediateR:::QuickSim(n=2,nxx=3,nmm=2,family="gaussian")
dat <- SimulateData(params)
One can manually create a list with these elements or rely on the simulation function SimulateData
to create the structure. The argument to SimulateData
has parameters controlling the sample size, coefficient estimates, number of mediators, etc.
class: small-code
dat$y
: response
dat$y
[1] 7.336213 1.446297
dat$mm
: mediators
dat$mm
m1 m2
[1,] -0.2584175 2.0061857
[2,] 0.3148540 -0.4818267
dat$xx
: exposures
dat$xx
x1 x2 x3
[1,] 0 1 1
[2,] 1 0 1
dat$family
: data type of response
dat$family
[1] "gaussian"
The goal is to assess the causal impact of exposure (dat$xx
) on response (dat$y
) and quantify how much of this effect passes through mediators (dat$mm
), adjusting for covariates (dat$co
).
Counterfactual random variables are used to formally define causal interventions and the notions of direct and indirect effects.
Let $Y^{X=x'}$ be the value of $Y$ obtained by setting $X=x'$ , possibly counter to fact. For notational simplicity, we will write $Y^{x'}$ when it is clear that $X$ is being set to $x'$ .
Counterfactual notation can also express interventions on multiple variables. For example $Y^{x'',M^{x'}}$, the value $Y$ would have obtained had $X$ been set to $x''$ and had $M$ been set to the value it would have obtained had $x$ been set to $x'$.
The direct effect $DE_{X_i}(x',x'')$ and indirect effect $IE_{X_i}(x',x'')$ when changing $X_i$ from $x'$ to $x''$ with respect to mediators $M$ are given by:
Mediator model: $$\boldsymbol{m} = \boldsymbol{\beta}^{(X)}\boldsymbol{x} + \boldsymbol{\beta}^{(C)}\boldsymbol{c}+ \boldsymbol{\beta}^{(0)} + \boldsymbol{\epsilon}$$
Outcome model:
Linear: $$y = \boldsymbol{x}^T \boldsymbol{\alpha}^{(X)} + \boldsymbol{m}^T \boldsymbol{\alpha}^{(M)} + \boldsymbol{c}^T \boldsymbol{\alpha}^{(C)} + \alpha^{(0)} + \delta$$
Logistic: $$Y \sim Bernoulli(p), logit(p) = \boldsymbol{x}^T \boldsymbol{\alpha}^{(X)} + \boldsymbol{m}^T \boldsymbol{\alpha}^{(M)} + \boldsymbol{c}^T \boldsymbol{\alpha}^{(C)} + \alpha^{(0)}, $$
Cox proportional hazards: $$h(y|\boldsymbol{x},\boldsymbol{m},\boldsymbol{c}) = h_0(y)e^{\boldsymbol{x}^T \boldsymbol{\alpha}^{(X)} + \boldsymbol{m}^T \boldsymbol{\alpha}^{(m)} + \boldsymbol{c}^T \boldsymbol{\alpha}^{(C)}}$$
Parameter estimation: - Generalized linear model or Cox proportional hazard model - Ridge penalty is incorporated for high-dimensional mediators
Monte Carlo sampling to approximate the direct/indirect effects: - Relax the “rare outcome” assumption
We propose computing confidence intervals and hypothesis tests using bootstrap quantiles.
class: small-code
The following code simulates 500 observations with 3 exposures and 3 mediators for linear outcome model.
sim_params <- mediateR:::QuickSim(500,3,3,"gaussian")
dat <- SimulateData(sim_params)
ComputePath
is used to fit the mediation analysis framework for various response, where dat
are the list with the format mentioned above, mmn=F
assuming independent mediators and reg=T
is used for high-dimensional mediators (default is false).
fit <- ComputePath(dat, mmn = T)
class: small-code
ComputeEffectsLinear
provides estimation of direct, indirect and total effect for linear model. Plugging in the true parameters sim_params
gives the true effects.
eff_est <- ComputeEffectsLinear(fit)
eff <- ComputeEffectsLinear(sim_params)
library(kableExtra)
eff_comb <- matrix(0,nrow(eff),ncol=2*ncol(eff))
for(ii in 1:ncol(eff)){
eff_comb[,2*ii-1] <- eff_est[,ii]; eff_comb[,2*ii] <- eff[,ii]
}
for(ii in (ncol(dat$xx)+1):(ncol(dat$xx)+ncol(dat$mm))){
eff_comb[ii,3:ncol(eff_comb)] <- NA
}
colnames(eff_comb) <- rep(c("Estimated","True"),ncol(eff)); rownames(eff_comb) <- rownames(eff)
class: small-code
Table below summarizes the Indirect, Direct, and Total effects.
kable(eff_comb[1:ncol(dat$xx),],digits=2, format = "html", excape="T") %>%
kable_styling(bootstrap_options = "striped", full_width = T) %>%
add_header_above(c(" ", "Direct" = 2, "Indirect" = 2, "Total" = 2))
class: small-code
We propose computing confidence intervals and hypothesis tests using bootstrap sampling quantiles.
The following code compute the confidence interval of total effect of x1 on the response.
B <- 1000
eff_est_boot <- vector("list",length=B)
for(ii in 1:B){
ix <- sample(1:sim_params$n,replace=TRUE)
dat_sub <- mediateR:::SubsetDat(dat,ix)
fit <- ComputePath(dat_sub, mmn = T)
eff_est_boot[[ii]] <- ComputeEffectsLinear(fit)
}
total_xx1 <- vapply(eff_est_boot,function(x){x[1,3]},c(0)) #x1 Total
#indirect_xx1 <- vapply(eff_est_boot,function(x){x[1,2]},c(0)) #x1 Indirect
CI95 = round(quantile(total_xx1, c(0.025, 1 - 0.025)),2)
class: small-code
Figure below shows the histogram of total effects of Bootstrap samples for $x_1$ The 95% CI of the total effect for $x_1$ is (1.8, 2.89)
hist(total_xx1,main="Bootstrap Samples x1 Total Effect",xlab="Estimate")
abline(v=eff[1,3],lwd=3,col='red')
abline(v=eff_est[1,3],lwd=3,col='blue')
legend("topright",c("Truth","Point Estimate"),col=c("red","blue"),lwd=2, bty="n", cex=0.8)
class: small-code
Same as previous simulation but with exponentially distributed survival times. The rate is $e^{\beta^Tx}$ where $x$ are the covariates. The following code simulates 500 observations with 3 exposures and 3 mediators for survival outcome model.
sim_params$family <- "cox"
dat <- SimulateData(sim_params)
rmean <- max(as.matrix(dat$y)[,1])
This model can be fit with Cox Proportional Hazards. We compute the effects on mean survival restricted to 284 days.
fit <- ComputePath(dat, mmn = T)
class: small-code
Given the simulation setting, we approximate true effects by approximating integral with large sample size use the maximum observed time (284 days) as the restricted mean time.
sim_params2 <- sim_params
sim_params2$n <- 1e4 # use many observations to approximate true effects
dat2 <- SimulateData(sim_params2)
fit2 <- ComputePath(dat2, mmn = T)
direct_t <- ComputeEffectxx(dat2,fit2,"direct",rmean=rmean)
indirect_t <- ComputeEffectxx(dat2,fit2,"indirect",rmean=rmean)
total_t <- ComputeEffectxx(dat2,fit2,"total",rmean=rmean)
Similarly, ComputeEffectxx
provides estimation of direct, indirect and total effect.
direct <- ComputeEffectxx(dat,fit,"direct",rmean=rmean)
indirect <- ComputeEffectxx(dat,fit,"indirect",rmean=rmean)
total <- ComputeEffectxx(dat,fit,"total",rmean=rmean)
eff_comb <- cbind(direct,direct_t,indirect,indirect_t,total,total_t)
rownames(eff_comb) <- names(fit$xx_direct)
colnames(eff_comb) <- rep(c("Estimated","True"),3)
xx1total_sample <- eff_comb[1,5]
xx1total_true <- eff_comb[1,6]
class: small-code
Table below summarizes the Indirect, Direct, and Total effects.
kable(eff_comb,digits=2) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(" ", "direct" = 2, "indirect" = 2, "total" = 2))
class: small-code
We propose computing confidence intervals and hypothesis tests using bootstrap sampling quantiles.
The following code computes the confidence interval of total effect of x1 on the response.
B <- 1000
total_xx1 <- rep(0,length=B)
for(ii in 1:B){
ix <- sample(1:sim_params$n,replace=TRUE)
dat_boot <- mediateR:::SubsetDat(dat,ix)
fit_boot <- ComputePath(dat_boot, mmn = T)
total_xx1[ii] <- ComputeEffectxx(dat_boot,fit_boot,"total",rmean=rmean)[1] #total
}
CI95 = round(quantile(total_xx1, c(0.025, 1 - 0.025)),2)
class: small-code
Figure below shows the histogram of total effects of Bootstrap samples for $x_1$. The 95% CI of the total effect for $x_1$ is (-74.8, -35.52).
hist(total_xx1,main="Bootstrap Samples x1 Total Effect",xlab="Estimate")
abline(v=xx1total_sample,lwd=3,col='blue')
abline(v=xx1total_true,lwd=3,col='red')
legend("topright",c("Truth","Point Estimate"),col=c("red","blue"),lwd=2, bty="n", cex=0.8)
class: small-code
fit$cov_mm
gives partial correlation between mediators adjusting for exposures and covariates. round(fit$cov,2)
[,1] [,2] [,3]
[1,] 1.02 0.05 -0.06
[2,] 0.05 1.07 -0.01
[3,] -0.06 -0.01 1.08
fit$ivcor_mm
gives partial correlation between mediators adjusting for exposures, covariates and other mediators. round(fit$ivcor_mm,2)
[,1] [,2] [,3]
[1,] 1.00 -0.04 0.05
[2,] -0.04 1.00 0.00
[3,] 0.05 0.00 1.00
class: small-code
Here we show how to compute confidence intervals using bootstrap sampling quantiles.
corrmm <- array(0,dim = c(B, ncol(dat$mm),ncol(dat$mm)))
for(ii in 1:B){
ix <- sample(1:sim_params$n,replace=TRUE)
dat_boot <- mediateR:::SubsetDat(dat,ix)
fit_boot <- ComputePath(dat_boot, mmn = T)
corrmm[ii,,] <- fit_boot$ivcor_mm
}
CI95 = round(quantile(corrmm[,1,2], c(0.025, 1 - 0.025)),2)
class: small-code
Figure below shows the histogram of partial correlation between m1 and m2 adjusting for other mediators, exposures and covariates. Its 95% CI is (-0.13, 0.05).
hist(corrmm[,1,2],main="Bootstrap samples of partial \n correlation between m1 and m2",xlab="Estimate")
abline(v=fit$ivcor_mm[1,2],lwd=3,col='blue')
abline(v=0,lwd=3,col='red')
legend("topright",c("Truth","Point Estimate"),col=c("red","blue"),lwd=2, bty="n", cex=0.8)
======================================================== class: small-code
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.