This example demonstrates how to use cmest for a case control study. For this purpose, we simulate some data containing a continuous baseline confounder $C_1$, a binary baseline confounder $C_2$, a binary exposure $A$, a binary mediator $M$ and a binary outcome $Y$. We sample 2000 cases out of all cases and sample 2000 controls out of all controls. The true regression models for $A$, $M$ and $Y$ are:
$$logit(E(A|C_1,C_2))=0.2+0.5C_1+0.1C_2$$
$$logit(E(M|A,C_1,C_2))=1+2A+1.5C_1+0.8C_2$$
$$logit(E(Y|A,M,C_1,C_2)))=-5+0.8A-1.8M+0.5AM+0.3C_1-0.6C_2$$
set.seed(1) # data simulation expit <- function(x) exp(x)/(1+exp(x)) n <- 1000000 C1 <- rnorm(n, mean = 1, sd = 0.1) C2 <- rbinom(n, 1, 0.6) A <- rbinom(n, 1, expit(0.2 + 0.5*C1 + 0.1*C2)) M <- rbinom(n, 1, expit(1 + 2*A + 1.5*C1 + 0.8*C2)) Y <- rbinom(n, 1, expit(-5 + 0.8*A - 1.8*M + 0.5*A*M + 0.3*C1 - 0.6*C2)) yprevalence <- sum(Y)/n data <- data.frame(A, M, Y, C1, C2) case_indice <- sample(which(data$Y == 1), 2000, replace = FALSE) control_indice <- sample(which(data$Y == 0), 2000, replace = FALSE) data <- data[c(case_indice, control_indice), ]
The DAG for this scientific setting is:
library(CMAverse) cmdag(outcome = "Y", exposure = "A", mediator = "M", basec = c("C1", "C2"), postc = NULL, node = TRUE, text_col = "white")
For a case control study, we set the casecontrol argument to be TRUE. It requires that either the prevalence of the case be known or the case be rare. We use the regression-based approach for illustration.
If the prevalence of the case is known, we specify it by the yprevalence argument. The results are:
res_yprevelence <- cmest(data = data, model = "rb", casecontrol = TRUE, yprevalence = yprevalence, outcome = "Y", exposure = "A", mediator = "M", basec = c("C1", "C2"), EMint = TRUE, mreg = list("logistic"), yreg = "logistic", astar = 0, a = 1, mval = list(1), estimation = "paramfunc", inference = "delta")
summary(res_yprevelence)
If the prevalence of the case is unknown but we know the case is rare, we set the yrare argument to be TRUE. The results are:
res_yrare <- cmest(data = data, model = "rb", casecontrol = TRUE, yrare = TRUE, outcome = "Y", exposure = "A", mediator = "M", basec = c("C1", "C2"), EMint = TRUE, mreg = list("logistic"), yreg = "logistic", astar = 0, a = 1, mval = list(1), estimation = "paramfunc", inference = "delta")
summary(res_yrare)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.