knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
set.seed(1234) library(mediateR) library(ggplot2) library(ggfortify)
The code contains functions to simulate data for mediation analysis.
params <- SimpleSim(n=10,family="gaussian") dat <- SimulateData(params) dat
The dat
list with named elements (r names(dat)
) is used as an argument for functions which estimate the direct and indirect effects. One can manually create a list with these elements (such as when fitting models to new data) or rely on the simulation function SimulateData
to create the structure. The argument to SimulateData
has parameters controlling the sample size, coefficient estimates, number of mediators, etc.
We simulate a decent sized data set for mediation analysis:
params <- SimpleSim(n=1000,family="gaussian") dat <- SimulateData(params) dim(dat$xx) dim(dat$mm)
The goal is to assess the causal impact of dat$xx
on dat$y
and quantify how much of this effect passes through dat$mm
. For some initial evidence, see that $x$ is correlated with both $y$ and $m$ and $m$ and $y$ are correlated:
par(mfcol=c(1,3),mar=c(5,4,1,1)) boxplot(dat$y~dat$xx[,1],xlab="x",ylab="y") boxplot(dat$mm[,1]~dat$xx[,1],xlab="x",ylab="m") plot(dat$mm,dat$y,xlab="m",ylab="y")
In linear models, the direct effect is the path coefficient $x \rightarrow m$ and indirect effect is the product of path coefficients $x\rightarrow m$ and $m \rightarrow y$. We compute the effects 3 ways.
Method 3 is not necessary in linear models (method 2 is easier), but it generalizes to non--linear models where method 2 is no longer applicable. Method 1 is only possible because this is a simulation where we know ground truth.
ComputeEffectsLinear(params)
fit <- ComputePath(dat) ComputeEffectsLinear(fit)
effect <- c(ComputeEffectxx(dat,fit,"direct"), ComputeEffectxx(dat,fit,"indirect"), ComputeEffectxx(dat,fit,"total")) effect
We can also compute effects for survival models.
params <- SimpleSim(n=1000,family="cox") dat <- SimulateData(params) class(dat$y) ## response is Surv object
Survival curves suggest very strong effect of dat$xx
on dat$y
:
fit <- survival::survfit(dat$y~as.factor(dat$xx[,1])) autoplot(fit,xlab="Survival Time (months)",ylab="Survival")
Compute mediation effect point estimates:
rmean <- 10 fit <- ComputePath(dat) effect <- c("direct"=ComputeEffectxx(dat,fit,"direct",rmean=rmean), "indirect"=ComputeEffectxx(dat,fit,"indirect",rmean=rmean), "total"=ComputeEffectxx(dat,fit,"total",rmean=rmean)) effect
Appears to be strong negative causal effect with most of this effect mediated by dat$mm
.
B <- 500 boot_effect <- matrix(NA_real_,nrow=B,ncol=2) for(ii in 1:B){ ix <- sample(1:params$n,replace=TRUE) dat_boot <- mediateR:::SubsetDat(dat,ix) fit_boot <- ComputePath(dat_boot) boot_effect[ii,1] <- ComputeEffectxx(dat_boot,fit_boot,"direct",rmean=rmean) boot_effect[ii,2] <- ComputeEffectxx(dat_boot,fit_boot,"indirect",rmean=rmean) }
Plot bootstrap samples:
par(mar=c(5,5,1,1)) plot(boot_effect[,1],boot_effect[,2], xlim=range(c(0,boot_effect[,1])),ylim=range(c(0,boot_effect[,2])), xlab="Direct Effect (months)",ylab="Indirect Effect (months)") points(effect[1],effect[2],col='red',pch=19) points(0,0,col='blue',pch=19) abline(h=0) abline(v=0)
Conclusion: dat$xx
has a strong negative causal impact on survival with most of this effect passing through the mediator dat$mm
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.