A treatment often affects an outcome indirectly, through a particular pathway, by its effect on intermediate variables (mediators). Causal mediation analysis concerns the construction and evaluation of these indirect effects and the direct effects complementary to them. Generally, the indirect effect (IE) of a treatment on an outcome is the portion of the total effect found to work through mediators, while the direct effect encompasses all other components of the total effect, including both the effect of the treatment on the outcome directly and the effect through all paths not explicitly involving the mediators. Identifying and quantifying the mechanisms underlying causal effects is an increasingly popular endeavor in public health, medicine, and the social sciences, as such mechanistic knowledge improves understanding of both why and how treatments may be effective.
While the study of mediation analysis may be traced back quite far, the field only came into its modern form with the identification and careful study of the natural direct and indirect effects [@robins1992identifiability; @pearl2001direct]. The natural direct effect (NDE) and the natural indirect effect (NIE) are based on a decomposition of the average treatment effect (ATE) in the presence of mediators [@vanderweele2015explanation], with the requisite theory for constructing efficient estimators of these quantities receiving attention only relatively recently [@tchetgen2012semiparametric]. Here, we examine the use of the `tmle3mediate` package for constructing targeted maximum likelihood (TML) estimators of the NDE and NIE [@zheng2012targeted].
To make our methodology and results comparable to those of existing tools, we'll take as a running example a simple dataset from an observational study of the relationship between BMI and kids' behavior, distributed as part of the `mma` R package on CRAN. First, let's load the packages we'll be using and set a seed; then, load this dataset and take a quick look at it:
```r
# preliminaries
library(dplyr)
library(tidyr)
library(sl3)
library(tmle3)
library(tmle3mediate)

# load and examine data
library(mma)
data(weight_behavior)

# set a seed
set.seed(429153)
```
The documentation for the dataset describes it as a "database obtained from the Louisiana State University Health Sciences Center, New Orleans, by Dr. Richard Scribner. He explored the relationship between BMI and kids' behavior through a survey at children, teachers and parents in Grenada in 2014. This data set includes 691 observations and 15 variables."
Unfortunately, the dataset contains a few observations with missing values. As these are unrelated to the demonstration of our analytic methodology, we'll simply remove them for the time being. Note that in a real-world data analysis, we would instead consider strategies for working with missing observations, such as imputation or inverse probability of censoring weighting. For now, we simply remove the incomplete observations, resulting in a dataset with fewer observations but much the same structure as the original:
```r
# remove missing values
weight_behavior_complete <- weight_behavior %>%
  drop_na() %>%
  mutate(sports = as.numeric(sports) - 1) %>%
  as_tibble()
weight_behavior_complete
```
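As a quick check on how much data this discards, we can compare row counts; the complete-case dataset should retain most of the original 691 observations:

```r
# how many observations were dropped due to missingness?
nrow(weight_behavior) - nrow(weight_behavior_complete)
```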
For the analysis of this observational dataset, we focus on the effect of participating in a sports team (`sports`) on the BMI of children (`bmi`), taking several related covariates as mediators (`snack`, `exercises`, `overweigh`) and all other collected covariates as potential confounders. Considering a nonparametric structural equation model (NPSEM), we separate the observed variables from the dataset into their corresponding nodes as follows:
```r
exposure <- "sports"
outcome <- "bmi"
mediators <- c("snack", "exercises", "overweigh")
covars <- setdiff(
  colnames(weight_behavior_complete),
  c(exposure, outcome, mediators)
)
node_list <- list(
  W = covars,
  A = exposure,
  Z = mediators,
  Y = outcome
)
```
Here, the `node_list` encodes the parents of each node -- for example, $Z$ (the mediators) has parents $A$ (the treatment) and $W$ (the baseline confounders), and $Y$ (the outcome) has parents $Z$, $A$, and $W$.
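Schematically, this node assignment corresponds to an NPSEM of the following standard form, where the $f$'s are unknown functions and the $U$'s are exogenous errors:
\begin{align*}
  W &= f_W(U_W) \\
  A &= f_A(W, U_A) \\
  Z &= f_Z(W, A, U_Z) \\
  Y &= f_Y(W, A, Z, U_Y).
\end{align*}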
To use the natural (in)direct effects for mediation analysis, we decompose the ATE into its corresponding direct and indirect effect components. Throughout, we'll make use of the (semiparametric-efficient) TML estimators of @zheng2012targeted, which allow flexible ensemble machine learning to be incorporated into the estimation of nuisance parameters. For this, we rely on the `sl3` R package [@coyle2020sl3], which provides an implementation of machine learning pipelines and the Super Learner algorithm; for more on `sl3`, consider consulting the corresponding chapter of the `tlverse` handbook. Below, we construct an ensemble learner using a handful of popular machine learning algorithms:
```r
# SL learners used for continuous data (the nuisance parameter M)
enet_contin_learner <- Lrnr_glmnet$new(
  alpha = 0.5, family = "gaussian", nfolds = 3
)
lasso_contin_learner <- Lrnr_glmnet$new(
  alpha = 1, family = "gaussian", nfolds = 3
)
fglm_contin_learner <- Lrnr_glm_fast$new(family = gaussian())
mean_learner <- Lrnr_mean$new()
contin_learner_lib <- Stack$new(
  enet_contin_learner, lasso_contin_learner, fglm_contin_learner, mean_learner
)
sl_contin_learner <- Lrnr_sl$new(
  learners = contin_learner_lib, metalearner = Lrnr_nnls$new()
)

# SL learners used for binary data (nuisance parameters G and E in this case)
enet_binary_learner <- Lrnr_glmnet$new(
  alpha = 0.5, family = "binomial", nfolds = 3
)
lasso_binary_learner <- Lrnr_glmnet$new(
  alpha = 1, family = "binomial", nfolds = 3
)
fglm_binary_learner <- Lrnr_glm_fast$new(family = binomial())
binary_learner_lib <- Stack$new(
  enet_binary_learner, lasso_binary_learner, fglm_binary_learner, mean_learner
)
logistic_metalearner <- make_learner(
  Lrnr_solnp, metalearner_logistic_binomial, loss_loglik_binomial
)
sl_binary_learner <- Lrnr_sl$new(
  learners = binary_learner_lib, metalearner = logistic_metalearner
)

# create list for treatment and outcome mechanism regressions
learner_list <- list(
  Y = sl_contin_learner,
  A = sl_binary_learner
)
```
The natural direct and indirect effects arise from a decomposition of the ATE:
\begin{equation}
\mathbb{E}[Y(1) - Y(0)] =
  \underbrace{\mathbb{E}[Y(1, Z(0)) - Y(0, Z(0))]}_{\text{NDE}} +
  \underbrace{\mathbb{E}[Y(1, Z(1)) - Y(1, Z(0))]}_{\text{NIE}}.
\end{equation}
In particular, the natural indirect effect (NIE) measures the effect of the treatment $A \in \{0, 1\}$ on the outcome $Y$ through the mediators $Z$, while the natural direct effect (NDE) measures the effect of the treatment on the outcome through all other paths.
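To see that these two terms sum to the ATE, add and subtract the intermediate counterfactual mean $\mathbb{E}[Y(1, Z(0))]$:
\begin{align*}
\mathbb{E}[Y(1) - Y(0)]
  &= \mathbb{E}[Y(1, Z(1))] - \mathbb{E}[Y(0, Z(0))] \\
  &= \big(\mathbb{E}[Y(1, Z(0))] - \mathbb{E}[Y(0, Z(0))]\big) +
     \big(\mathbb{E}[Y(1, Z(1))] - \mathbb{E}[Y(1, Z(0))]\big),
\end{align*}
which are exactly the NDE and NIE terms above.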
The NDE is defined as
\begin{equation}
\Psi_{\text{NDE}} = \mathbb{E}[Y(1, Z(0)) - Y(0, Z(0))]
\overset{\text{rand.}}{=} \sum_w \sum_z
  [\underbrace{\mathbb{E}(Y \mid A = 1, z, w)}_{\bar{Q}_Y(A = 1, z, w)} -
   \underbrace{\mathbb{E}(Y \mid A = 0, z, w)}_{\bar{Q}_Y(A = 0, z, w)}]
  \times \underbrace{p(z \mid A = 0, w)}_{Q_Z(z \mid 0, w)}
  \underbrace{p(w)}_{Q_W},
\end{equation}
where the likelihood factors $p(z \mid A = 0, w)$ and $p(w)$ (among other conditional densities) arise from a factorization of the joint likelihood:
\begin{equation}
p(w, a, z, y) =
  \underbrace{p(y \mid w, a, z)}_{Q_Y(A, Z, W)}
  \underbrace{p(z \mid w, a)}_{Q_Z(Z \mid A, W)}
  \underbrace{p(a \mid w)}_{g(A \mid W)}
  \underbrace{p(w)}_{Q_W}.
\end{equation}
The process of estimating the NDE begins by constructing $\bar{Q}_{Y,n}$, an estimate of the outcome mechanism $\bar{Q}_Y(Z, A, W) = \mathbb{E}[Y \mid Z, A, W]$ (i.e., the conditional mean of $Y$, given $Z$, $A$, and $W$). With an estimate of this conditional expectation in hand, predictions of the counterfactual quantities $\bar{Q}_Y(Z, 1, W)$ (setting $A = 1$) and, likewise, $\bar{Q}_Y(Z, 0, W)$ (setting $A = 0$) can easily be obtained. We denote the difference of these counterfactual quantities $\bar{Q}_{\text{diff}}$, i.e., $\bar{Q}_{\text{diff}} = \bar{Q}_Y(Z, 1, W) - \bar{Q}_Y(Z, 0, W)$. $\bar{Q}_{\text{diff}}$ represents the difference in $Y$ attributable to changes in $A$ while keeping $Z$ and $W$ at their natural (i.e., observed) values.
The estimation procedure treats $\bar{Q}_{\text{diff}}$ itself as a nuisance parameter, regressing its estimate $\bar{Q}_{\text{diff},n}$ on $W$ among control observations only (i.e., those for whom $A = 0$ is observed); the goal of this step is to remove part of the marginal impact of $Z$ on $\bar{Q}_{\text{diff}}$, since $W$ is a parent of $Z$. Regressing this difference on $W$ among the controls recovers the expected $\bar{Q}_{\text{diff}}$, had all individuals been set to the control condition $A = 0$. Any residual additive effect of $Z$ on $\bar{Q}_{\text{diff}}$ is removed during the TML estimation step using the auxiliary (or "clever") covariate, which accounts for the mediators $Z$. This auxiliary covariate takes the form
\begin{equation}
C_Y(Q_Z, g)(O) = \Bigg\{ \frac{\mathbb{I}(A = 1)}{g(1 \mid W)}
  \frac{Q_Z(Z \mid 0, W)}{Q_Z(Z \mid 1, W)} -
  \frac{\mathbb{I}(A = 0)}{g(0 \mid W)} \Bigg\}.
\end{equation}
Breaking this down, $\frac{\mathbb{I}(A = 1)}{g(1 \mid W)}$ is the inverse probability weight for $A = 1$ and, likewise, $\frac{\mathbb{I}(A = 0)}{g(0 \mid W)}$ is the inverse probability weight for $A = 0$. The middle term is the ratio of the mediator density when $A = 0$ to the mediator density when $A = 1$.
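To make the form of $C_Y$ concrete, here is a minimal sketch of its computation, with toy values standing in for the fitted nuisance estimates (in practice, `tmle3mediate` constructs this covariate internally from the Super Learner fits):

```r
# toy nuisance values: g1_W approximates g(1 | W), and ratio_Z approximates
# the mediator density ratio Q_Z(Z | 0, W) / Q_Z(Z | 1, W)
A <- c(1, 0, 1)
g1_W <- c(0.6, 0.4, 0.5)
ratio_Z <- c(0.8, 1.2, 1.1)

# auxiliary ("clever") covariate for the NDE, one value per observation
C_Y <- (A == 1) / g1_W * ratio_Z - (A == 0) / (1 - g1_W)
C_Y
```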
Thus, it would appear that an estimate of this ratio of conditional densities is required; unfortunately, tools to estimate such quantities are sparse in the statistics literature, and the problem is still more complicated (and computationally taxing) when $Z$ is high-dimensional. As only the ratio of these conditional densities is required, a convenient re-parameterization may be achieved, namely
\begin{equation}
\frac{Q_Z(Z \mid 0, W)}{Q_Z(Z \mid 1, W)} =
  \frac{p(A = 0 \mid Z, W)}{g(0 \mid W)} \Big/
  \frac{p(A = 1 \mid Z, W)}{g(1 \mid W)},
\end{equation}
which follows from an application of Bayes' rule to each mediator density. Going forward, we will denote this re-parameterized conditional probability $e(A \mid Z, W) := p(A \mid Z, W)$. This re-parameterization is particularly useful since it reduces the problem to the estimation of conditional means, opening the door to the use of a wide range of machine learning algorithms (e.g., most of those in `sl3`).
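Concretely, the density ratio in the clever covariate can then be computed from two conditional probability fits; a minimal sketch with toy values, where `e1_ZW` stands for estimates of $e(1 \mid Z, W)$ and `g1_W` for estimates of $g(1 \mid W)$:

```r
# toy values for the two conditional probability estimates
e1_ZW <- c(0.7, 0.3, 0.55)  # p(A = 1 | Z, W)
g1_W <- c(0.6, 0.4, 0.5)    # g(1 | W)

# re-parameterized mediator density ratio Q_Z(Z | 0, W) / Q_Z(Z | 1, W)
ratio_Z <- ((1 - e1_ZW) / e1_ZW) * (g1_W / (1 - g1_W))
ratio_Z
```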
Under the hood, the counterfactual outcome difference $\bar{Q}_{\text{diff}}$ and $e(A \mid Z, W)$, the conditional probability of $A$ given $Z$ and $W$, are used in constructing the auxiliary covariate for TML estimation. These nuisance parameters play an important role in the bias-correcting TMLE update step.
Derivation and estimation of the NIE is analogous to that of the NDE. The NIE is the effect of $A$ on $Y$ operating only through the mediator(s) $Z$. This quantity -- known as the (additive) natural indirect effect $\mathbb{E}[Y(Z(1), 1)] - \mathbb{E}[Y(Z(0), 1)]$ -- corresponds to the difference between the conditional expectation of $Y$ given $A = 1$ and $Z(1)$ (the values the mediator would take under $A = 1$) and the conditional expectation of $Y$ given $A = 1$ and $Z(0)$ (the values the mediator would take under $A = 0$).
As with the NDE, the re-parameterization trick can be used to estimate $\mathbb{E}(A \mid Z, W)$, avoiding estimation of a possibly multivariate conditional density. In this case, however, the mediated mean outcome difference, denoted $\Psi_Z(Q)$, is instead estimated as follows:
\begin{equation}
\Psi_{\text{NIE}}(Q) = \mathbb{E}_{Q_W}\big(\Psi_{\text{NIE},Z}(Q)(1, W) -
  \Psi_{\text{NIE},Z}(Q)(0, W)\big).
\end{equation}
Phrased plainly, this uses $\bar{Q}_Y(Z, 1, W)$ (the predicted values of $Y$ given $Z$ and $W$ when $A = 1$) and regresses this vector on $W$ among the treated units (those for whom $A = 1$ is observed) to obtain the conditional mean $\Psi_{\text{NIE},Z}(Q)(1, W)$. Performing the same procedure, but now regressing $\bar{Q}_Y(Z, 1, W)$ on $W$ among the control units (those for whom $A = 0$ is observed), yields $\Psi_{\text{NIE},Z}(Q)(0, W)$. The difference of these two estimates is the NIE and can be thought of as the additive marginal effect of treatment on the conditional expectation of $Y$ given $W$, $A = 1$, and $Z$, operating through its effect on $Z$. So, in the case of the NIE, our estimate $\psi_n$ is slightly different, but the same quantity $e(A \mid Z, W)$ comes into play as the auxiliary covariate.
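To fix ideas, here is a schematic sketch of these two regressions on simulated toy data, with `glm` standing in for the cross-validated learners `tmle3mediate` uses internally; `Qbar_Y1` plays the role of the predictions $\bar{Q}_Y(Z, 1, W)$:

```r
# toy data: W is a confounder, A a binary treatment, Qbar_Y1 stands in for
# predicted values of E[Y | Z, A = 1, W]
set.seed(52125)
n <- 200
toy <- data.frame(W = rnorm(n), A = rbinom(n, 1, 0.5))
toy$Qbar_Y1 <- 2 + 0.5 * toy$W + rnorm(n)

# regress the predictions on W among the treated and among the controls
fit_1 <- glm(Qbar_Y1 ~ W, data = subset(toy, A == 1))
fit_0 <- glm(Qbar_Y1 ~ W, data = subset(toy, A == 0))

# Psi_Z(1, W) - Psi_Z(0, W), averaged over the full sample
mean(predict(fit_1, newdata = toy) - predict(fit_0, newdata = toy))
```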
We demonstrate calculation of the NIE using `tmle3mediate` below, starting by instantiating a "Spec" object that encodes exactly which learners to use for the nuisance parameters $e(A \mid Z, W)$ and $\Psi_Z$. We then pass our Spec object to the `tmle3` function, alongside the data, the node list (created above), and a learner list indicating which machine learning algorithms to use for estimating the nuisance parameters based on $A$ and $Y$.
```r
tmle_spec_NIE <- tmle_NIE(
  e_learners = Lrnr_cv$new(lasso_binary_learner, full_fit = TRUE),
  psi_Z_learners = Lrnr_cv$new(lasso_contin_learner, full_fit = TRUE),
  max_iter = 1
)
weight_behavior_NIE <- tmle3(
  tmle_spec_NIE, weight_behavior_complete, node_list, learner_list
)
weight_behavior_NIE
```
An analogous procedure applies for estimation of the NDE, only replacing the Spec object for the NIE with `tmle_spec_NDE` to define learners for the NDE nuisance parameters:
```r
tmle_spec_NDE <- tmle_NDE(
  e_learners = Lrnr_cv$new(lasso_binary_learner, full_fit = TRUE),
  psi_Z_learners = Lrnr_cv$new(lasso_contin_learner, full_fit = TRUE),
  max_iter = 1
)
weight_behavior_NDE <- tmle3(
  tmle_spec_NDE, weight_behavior_complete, node_list, learner_list
)
weight_behavior_NDE
```
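Since the NDE and NIE decompose the ATE, their point estimates should approximately sum to an estimate of the total effect. A quick check, assuming (as is the case for `tmle3` fits) that each fitted object exposes its point estimate in the `tmle_est` column of its `summary` table:

```r
# the NDE and NIE estimates should roughly sum to an ATE estimate
weight_behavior_NDE$summary$tmle_est + weight_behavior_NIE$summary$tmle_est
```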
At times, the natural direct and indirect effects may prove too limiting, as these effect definitions are based on static interventions (i.e., setting $A = 0$ or $A = 1$), which may be unrealistic for real-world interventions. In such cases, one may turn instead to the population intervention direct effect (PIDE) and the population intervention indirect effect (PIIE), which are based on decomposing the population intervention effect (PIE) of flexible stochastic interventions [@diaz2020causal].
A particular type of stochastic intervention well-suited to working with binary treatments is the incremental propensity score intervention (IPSI), first proposed by @kennedy2017nonparametric. Such interventions do not deterministically set the treatment level of an observed unit to a fixed quantity (i.e., setting $A = 1$), but instead alter the odds of receiving the treatment by a fixed amount ($0 \leq \delta \leq \infty$) for each individual. In the case of the `mma` dataset, we will consider an IPSI that modulates the odds of participating in a sports team by $\delta = 2$. Such an intervention may be interpreted (hypothetically) as the effect of a school program that motivates children to participate in sports teams (e.g., as in an encouragement design), doubling the odds of participation for each individual:
```r
delta_ipsi <- 2
```
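For intuition on what this intervention does to an individual's propensity score, recall from @kennedy2017nonparametric that an IPSI replaces $g(1 \mid w)$ with $\delta g(1 \mid w) / (\delta g(1 \mid w) + 1 - g(1 \mid w))$, multiplying the odds of treatment by $\delta$. A small illustration:

```r
# post-intervention treatment probability under an IPSI with parameter delta
ipsi_shift <- function(g1_W, delta) {
  (delta * g1_W) / (delta * g1_W + 1 - g1_W)
}

# a child with a 25% chance of playing sports moves to a 40% chance
ipsi_shift(0.25, delta = delta_ipsi)
```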
We may decompose the population intervention effect (PIE) in terms of the population intervention direct effect (PIDE) and the population intervention indirect effect (PIIE):
\begin{equation}
\mathbb{E}\{Y(A_\delta)\} - \mathbb{E}Y =
  \overbrace{\mathbb{E}\{Y(A_\delta, Z(A_\delta)) -
    Y(A_\delta, Z)\}}^{\text{PIIE}} +
  \overbrace{\mathbb{E}\{Y(A_\delta, Z) - Y(A, Z)\}}^{\text{PIDE}}.
\end{equation}
This decomposition of the PIE as the sum of the population intervention direct and indirect effects has an interpretation analogous to the corresponding standard decomposition of the average treatment effect. In the sequel, we will compute each of the components of the direct and indirect effects above using appropriate estimators.
As described by @diaz2020causal, the statistical functional identifying the decomposition term appearing in both the PIDE and the PIIE, $\mathbb{E}\{Y(A_{\delta}, Z)\}$, which corresponds to altering the treatment mechanism while keeping the mediation mechanism fixed, is
\begin{equation}
\theta_0(\delta) = \int m_0(a, z, w) g_{0,\delta}(a \mid w) p_0(z, w)
  d\nu(a, z, w),
\end{equation}
for which a TML estimator is available. The corresponding efficient influence function (EIF) with respect to the nonparametric model $\mathcal{M}$ is $D_{\eta,\delta}(o) = D^Y_{\eta,\delta}(o) + D^A_{\eta,\delta}(o) + D^{Z,W}_{\eta,\delta}(o) - \theta(\delta)$.
The TML estimator may be computed based on the EIF estimating equation, and may incorporate cross-validation [@zheng2011cross; @chernozhukov2018double] to circumvent possibly restrictive entropy conditions (e.g., Donsker class). The resulting estimator is
\begin{equation}
\hat{\theta}(\delta) = \frac{1}{n} \sum_{i = 1}^n
  D_{\hat{\eta}_{j(i)}, \delta}(O_i)
  = \frac{1}{n} \sum_{i = 1}^n \left\{ D^Y_{\hat{\eta}_{j(i)}, \delta}(O_i) +
  D^A_{\hat{\eta}_{j(i)}, \delta}(O_i) +
  D^{Z,W}_{\hat{\eta}_{j(i)}, \delta}(O_i) \right\},
\end{equation}
which is implemented in `tmle3mediate` (a one-step estimator is also available, in the `medshift` R package). We demonstrate the use of `tmle3mediate` to obtain $\mathbb{E}\{Y(A_{\delta}, Z)\}$ via its TML estimator:
```r
# instantiate tmle3 spec for stochastic mediation
tmle_spec_pie_decomp <- tmle_medshift(
  delta = delta_ipsi,
  e_learners = Lrnr_cv$new(lasso_binary_learner, full_fit = TRUE),
  phi_learners = Lrnr_cv$new(lasso_contin_learner, full_fit = TRUE)
)

# compute the TML estimate
weight_behavior_pie_decomp <- tmle3(
  tmle_spec_pie_decomp, weight_behavior_complete, node_list, learner_list
)
weight_behavior_pie_decomp
```
Recall that, based on the decomposition outlined previously, the population intervention direct effect may be denoted $\beta_{\text{PIDE}}(\delta) = \theta_0(\delta) - \mathbb{E}Y$. Thus, an estimator of the PIDE, $\hat{\beta}_{\text{PIDE}}(\delta)$, may be expressed as a composition of estimators of its constituent parameters:
\begin{equation}
\hat{\beta}_{\text{PIDE}}(\delta) = \hat{\theta}(\delta) -
  \frac{1}{n} \sum_{i = 1}^n Y_i.
\end{equation}
Based on the above, we may construct an estimator of the PIDE using quantities already computed. To do this, we need only apply the delta method, available from the `tmle3` package.
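For intuition, a minimal sketch of the substitution form of this estimator, assuming the fitted object exposes its point estimate via the `tmle_est` column of its `summary` table (and setting aside the delta-method inference that `tmle3` provides):

```r
# plug-in estimate of the PIDE: theta-hat(delta) minus the empirical mean of Y
theta_delta_est <- weight_behavior_pie_decomp$summary$tmle_est
pide_est <- theta_delta_est - mean(weight_behavior_complete$bmi)
pide_est
```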