This example demonstrates how to estimate causal effects with multiple imputations by using cmest
when the dataset contains missing values. For this purpose, we simulate some data containing a continuous baseline confounder $C_1$, a binary baseline confounder $C_2$, a binary exposure $A$, a binary mediator $M$ and a binary outcome $Y$. The true regression models for $A$, $M$ and $Y$ are:
$$logit(E(A|C_1,C_2))=0.2+0.5C_1+0.1C_2$$
$$logit(E(M|A,C_1,C_2))=1+2A+1.5C_1+0.8C_2$$
$$logit(E(Y|A,M,C_1,C_2)))=-3-0.4A-1.2M+0.5AM+0.3C_1-0.6C_2$$
To create the dataset with missing values, we randomly delete 10% of values in the dataset.
set.seed(1) expit <- function(x) exp(x)/(1+exp(x)) n <- 10000 C1 <- rnorm(n, mean = 1, sd = 0.1) C2 <- rbinom(n, 1, 0.6) A <- rbinom(n, 1, expit(0.2 + 0.5*C1 + 0.1*C2)) M <- rbinom(n, 1, expit(1 + 2*A + 1.5*C1 + 0.8*C2)) Y <- rbinom(n, 1, expit(-3 - 0.4*A - 1.2*M + 0.5*A*M + 0.3*C1 - 0.6*C2)) data_noNA <- data.frame(A, M, Y, C1, C2) missing <- sample(1:(5*n), n*0.1, replace = FALSE) C1[missing[which(missing <= n)]] <- NA C2[missing[which((missing > n)*(missing <= 2*n) == 1)] - n] <- NA A[missing[which((missing > 2*n)*(missing <= 3*n) == 1)] - 2*n] <- NA M[missing[which((missing > 3*n)*(missing <= 4*n) == 1)] - 3*n] <- NA Y[missing[which((missing > 4*n)*(missing <= 5*n) == 1)] - 4*n] <- NA data <- data.frame(A, M, Y, C1, C2)
The DAG for this scientific setting is:
library(CMAverse) cmdag(outcome = "Y", exposure = "A", mediator = "M", basec = c("C1", "C2"), postc = NULL, node = TRUE, text_col = "white")
To conduct multiple imputations, we set the multimp
argument to be TRUE
. The regression-based approach is used for illustration. The multiple imputation results:
res_multimp <- cmest(data = data, model = "rb", outcome = "Y", exposure = "A", mediator = "M", basec = c("C1", "C2"), EMint = TRUE, mreg = list("logistic"), yreg = "logistic", astar = 0, a = 1, mval = list(1), estimation = "paramfunc", inference = "delta", multimp = TRUE, args_mice = list(m = 10))
summary(res_multimp)
Compare the multiple imputation results with true results:
res_noNA <- cmest(data = data_noNA, model = "rb", outcome = "Y", exposure = "A", mediator = "M", basec = c("C1", "C2"), EMint = TRUE, mreg = list("logistic"), yreg = "logistic", astar = 0, a = 1, mval = list(1), estimation = "paramfunc", inference = "delta")
summary(res_noNA)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.