This example demonstrates how to use `cmest` for a case control study. For this purpose, we simulate a population of 1,000,000 individuals with a continuous baseline confounder $C_1$, a binary baseline confounder $C_2$, a binary exposure $A$, a binary mediator $M$ and a binary outcome $Y$. We then sample 2000 cases from all cases and 2000 controls from all controls. The true regression models for $A$, $M$ and $Y$ are:
$$\mathrm{logit}(E(A|C_1,C_2))=0.2+0.5C_1+0.1C_2$$
$$\mathrm{logit}(E(M|A,C_1,C_2))=1+2A+1.5C_1+0.8C_2$$
$$\mathrm{logit}(E(Y|A,M,C_1,C_2))=-5+0.8A-1.8M+0.5AM+0.3C_1-0.6C_2$$
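As a quick check on what the outcome model implies, the sketch below evaluates the outcome risk for every combination of $A$, $M$ and $C_2$. Holding $C_1$ at its mean of 1 is an assumption made only for this illustration, and `grid` is a hypothetical helper object. `plogis` is base R's inverse logit.

```r
# Outcome risk implied by the true outcome model, with C1 fixed at 1
# (an illustrative choice, not part of the original example).
grid <- expand.grid(A = 0:1, M = 0:1, C2 = 0:1)
grid$C1 <- 1
grid$risk <- with(
  grid,
  plogis(-5 + 0.8 * A - 1.8 * M + 0.5 * A * M + 0.3 * C1 - 0.6 * C2)
)

# Show the covariate patterns from lowest to highest risk
print(grid[order(grid$risk), ])
```

Under these coefficients every pattern has a risk of at most about 2%, so the outcome is uncommon in the simulated population.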
    set.seed(1)
    # data simulation
    expit <- function(x) exp(x)/(1+exp(x))
    n <- 1000000
    C1 <- rnorm(n, mean = 1, sd = 0.1)
    C2 <- rbinom(n, 1, 0.6)
    A <- rbinom(n, 1, expit(0.2 + 0.5*C1 + 0.1*C2))
    M <- rbinom(n, 1, expit(1 + 2*A + 1.5*C1 + 0.8*C2))
    Y <- rbinom(n, 1, expit(-5 + 0.8*A - 1.8*M + 0.5*A*M + 0.3*C1 - 0.6*C2))
    # prevalence of the case in the full simulated population
    yprevalence <- sum(Y)/n
    data <- data.frame(A, M, Y, C1, C2)
    # draw 2000 cases and 2000 controls without replacement
    case_indice <- sample(which(data$Y == 1), 2000, replace = FALSE)
    control_indice <- sample(which(data$Y == 0), 2000, replace = FALSE)
    data <- data[c(case_indice, control_indice), ]
The DAG for this scientific setting is:
    library(CMAverse)
    cmdag(outcome = "Y", exposure = "A", mediator = "M",
          basec = c("C1", "C2"), postc = NULL, node = TRUE, text_col = "white")
For a case control study, we set the `casecontrol` argument to `TRUE`. This requires that either the prevalence of the case be known or the case be rare. We use the regression-based approach for illustration.
If the prevalence of the case is known, we specify it with the `yprevalence` argument. The results are:
    res_yprevelence <- cmest(data = data, model = "rb", casecontrol = TRUE, yprevalence = yprevalence,
                             outcome = "Y", exposure = "A", mediator = "M", basec = c("C1", "C2"),
                             EMint = TRUE, mreg = list("logistic"), yreg = "logistic",
                             astar = 0, a = 1, mval = list(1),
                             estimation = "paramfunc", inference = "delta")
    summary(res_yprevelence)
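To see why a known prevalence helps, the sketch below reweights a case control sample so that it resembles the source population again. This illustrates the general sampling-weight idea only; it is not taken from the internals of `cmest`, and the smaller population size, the sample size of 300 per group, and the names `pop`, `cc`, `p`, `f` and `w` are all assumptions made for the illustration.

```r
set.seed(1)

# Smaller population than in the example above, same true models
n <- 200000
C1 <- rnorm(n, mean = 1, sd = 0.1)
C2 <- rbinom(n, 1, 0.6)
A <- rbinom(n, 1, plogis(0.2 + 0.5 * C1 + 0.1 * C2))
M <- rbinom(n, 1, plogis(1 + 2 * A + 1.5 * C1 + 0.8 * C2))
Y <- rbinom(n, 1, plogis(-5 + 0.8 * A - 1.8 * M + 0.5 * A * M + 0.3 * C1 - 0.6 * C2))
pop <- data.frame(A, M, Y, C1, C2)

# Population prevalence of the case, treated as known
p <- mean(pop$Y)

# Case control sample with equal numbers of cases and controls
n_each <- 300
cc <- pop[c(sample(which(pop$Y == 1), n_each),
            sample(which(pop$Y == 0), n_each)), ]

# Fraction of cases in the sample (0.5 by design)
f <- mean(cc$Y)

# Weights: cases are down-weighted, controls up-weighted
cc$w <- ifelse(cc$Y == 1, p / f, (1 - p) / (1 - f))

# Mediator prevalence: population vs. unweighted and weighted sample
print(c(population = mean(pop$M),
        unweighted = mean(cc$M),
        weighted   = weighted.mean(cc$M, cc$w)))
```

By construction, the weighted prevalence of `Y` in the sample equals `p`, which is what allows weighted summaries of the sample to stand in for population quantities.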
If the prevalence of the case is unknown but we know the case is rare, we set the `yrare` argument to `TRUE`. The results are:
    res_yrare <- cmest(data = data, model = "rb", casecontrol = TRUE, yrare = TRUE,
                       outcome = "Y", exposure = "A", mediator = "M", basec = c("C1", "C2"),
                       EMint = TRUE, mreg = list("logistic"), yreg = "logistic",
                       astar = 0, a = 1, mval = list(1),
                       estimation = "paramfunc", inference = "delta")
    summary(res_yrare)
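One reason the rare-case condition matters is that an odds ratio approximates a risk ratio only when the outcome is uncommon. The sketch below checks this for a single contrast under the true outcome model. The covariate values ($M = 1$, $C_1 = 1$, $C_2 = 1$) and the helper `lin` are assumptions chosen for illustration; the sketch does not show how `cmest` uses the condition internally.

```r
# Linear predictor of the true outcome model
lin <- function(A, M, C1, C2) {
  -5 + 0.8 * A - 1.8 * M + 0.5 * A * M + 0.3 * C1 - 0.6 * C2
}

# Outcome risk for A = 1 and A = 0 at M = 1, C1 = 1, C2 = 1
p1 <- plogis(lin(1, 1, 1, 1))
p0 <- plogis(lin(0, 1, 1, 1))

# Odds ratio and risk ratio for the same contrast
odds_ratio <- (p1 / (1 - p1)) / (p0 / (1 - p0))
risk_ratio <- p1 / p0

print(c(odds_ratio = odds_ratio, risk_ratio = risk_ratio))
```

Because both risks are well below 1%, the two ratios differ by less than 1% here; with a common outcome the gap would be larger.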