knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(digits = 4) library(MRTAnalysis)
The MRTAnalysis
package now supports analysis of mediated causal excursion effect of time-varying treatments, time-varying mediators, and a continuous distal outcome in micro-randomized trials (MRTs), using the function mcee()
.
Distal outcomes are measured once at the end of the study (e.g., weight loss, cognitive score), in contrast to proximal outcomes which are repeatedly measured after each treatment decision point. Time-varying mediators can often be the proximal outcomes in MRTs.
Micro-randomized trials (MRTs) are experimental designs to evaluate and optimize quickly-adapting interventions, where each participant is randomized among treatment options many times throughout the study. This vignette shows how to use the mcee
(stands for "mediated causal excursion effect") function to assess how treatment effects on a distal outcome are mediated through intermediate variables (e.g., the activity prompt -> near-term activity -> long-term cardiovascular health pathway).
This package implements estimation of the natural direct excursion effect (NDEE) and the natural indirect excursion effect (NIEE) in MRTs and related longitudinal designs:
The modeling of the direct and indirect effects is allowed to vary with the decision point index, specified through the argument time_varying_effect_form
. For example, one may model NIEE and NDEE as constants over time (using ~1
) or as quadratic functions of the decision point (using ~ dp + I(dp^2)
). The parameter that parameterizes NIEE is denoted by $\beta$ and the one for NDEE is $\alpha$. At present, the software does not support mediation effects conditional on baseline or other time-varying covariates. For full details of the estimands, see @qian2025dynamic.
For applied users: if analyzing an MRT, you typically know the randomization probability, and need to provide a control_formula_with_mediator
describing covariates and mediators to include in nuisance models. By default, all nuisance regressions are estimated with GLMs, but you can substitute other machine learning methods.
For statistically inclined readers: the estimator is multiply robust and generalizes the semiparametric estimators of Tchetgen Tchetgen & Shpitser (2012) to longitudinal settings. It's a two-stage estimation process, where the first stage fits five nuisance functions and the second stage estimates the parameterized NIEE and NDEE. The five nuisance functions are (see @qian2025dynamic for the precise definition):
The package provides three entry points depending on how much control you want:
mcee
: streamlined interface for MRTs with known randomization probabilities. Recommended default for most MRT analyses. mcee_general
: more flexible, with explicit configuration of nuisance models. Useful for observational studies or MRTs when experimenting with different learners. mcee_userfit_nuisance
: accepts user-supplied nuisance predictions (from external ML fits). The remainder of this vignette will first focus on mcee
for a quick introduction, then show how to use the more flexible wrappers.
You can install the MRTAnalysis package from CRAN:
# install.packages("MRTAnalysis")
We illustrate the most basic usage of mcee
on a small simulated dataset. Note that the dataset is in long format, one row per subject-by-decision point. The distal outcome is repeated on each row as a constant within subject.
set.seed(123) # Simulate a toy dataset n <- 20 T_val <- 5 id <- rep(1:n, each = T_val) dp <- rep(1:T_val, times = n) A <- rbinom(n * T_val, 1, 0.5) M <- rbinom(n * T_val, 1, plogis(-0.2 + 0.3 * A + 0.1 * dp)) Y <- ave(0.5 * A + 0.7 * M + 0.2 * dp + rnorm(n * T_val), id) # constant within id dat <- data.frame(id, dp, A, M, Y) # Minimal mcee call fit <- mcee( data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M", time_varying_effect_form = ~1, # constant-over-time NDEE and NIEE control_formula_with_mediator = ~ dp + M, # nuisance adjustment control_reg_method = "glm", # default method rand_prob = 0.5, # known randomization prob verbose = FALSE ) summary(fit)
The summary
output provides estimates of Natural Direct Excursion Effect (NDEE; $\alpha$) and Natural Indirect Excursion Effect (NIEE; $\beta$), with standard errors, 95\% confidence intervals, and p-values.
The only row in the output Natural Direct Excursion Effect (alpha)
and Natural Indirect Excursion Effect (beta)
is named Intercept
, indicating that these effects are modeled as constants over time (like intercepts in the effect models). In particular, the estimated NDEE is 0.17 (95\% CI: -0.08, 0.42; p-value: 0.17) and the estimated NIEE is 0.025 (95\% CI: -0.002, 0.054; p-value: 0.06). The confidence intervals and p-values are based on t-quantiles.
This package ships a small example dataset, data_time_varying_mediator_distal_outcome
. It is in long format, one row per subject-by-decision point.
ID
: subject identifier (use the actual column name present in your data)t_val
: decision point index (must be strictly increasing within each subject)I
: availability (binary in {0,1}
); if omitted, the analysis assumes all rows are available (which is not the case for this data example)p_A
: randomization probability, i.e., probability of $A_{it} = 1$ at a given decision point conditional on the history informationA
: binary treatment, coded in {0,1}
M
: mediator (continuous or binary are both allowed)Y
: distal outcome — constant within each subject (the same value repeated across the subject's rows)X
, A_prev
, M_prev
, etc.)data(data_time_varying_mediator_distal_outcome) dat <- data_time_varying_mediator_distal_outcome dplyr::glimpse(dat) dplyr::count(dat, id, name = "Ti") |> dplyr::summarise(mean_T = mean(Ti), min_T = min(Ti), max_T = max(Ti)) # Delete some decision points for certain individuals to mimic scenarios # where not all individuals have the same number of decision points. dat <- dat[!((dat$id == 1 & dat$dp == 10) | (dat$id == 2 & dat$dp %in% c(9, 10))), ] dplyr::count(dat, id, name = "Ti") |> dplyr::summarise(mean_T = mean(Ti), min_T = min(Ti), max_T = max(Ti))
mcee
We focus first on the streamlined MRT workflow with known randomization probability. Throughout, we'll use the included dataset.
set.seed(1) fit1 <- mcee( data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~1, # NDEE and NIEE are constant over time control_formula_with_mediator = ~ dp + M + X, # covariate adjustment control_reg_method = "glm", # nuisance learners for q, eta, mu, nu verbose = FALSE ) summary(fit1)
time_varying_effect_form
: defines the basis f(t) used to model alpha (direct) and beta (indirect).control_formula_with_mediator
: adjustment set for nuisance models; include the mediator here (the wrapper will automatically remove it from the nuisance function models that should not include it). This can include s()
terms if control_reg_method = "gam"
.dp
)fit2 <- mcee( data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~dp, # NDEE, NIEE vary linearly in time control_formula_with_mediator = ~ dp + M + X, control_reg_method = "glm", verbose = FALSE ) summary(fit2) # rows now labeled (Intercept) and dp
Interpretation: the rows for dp
report how the natural direct/indirect excursion effects change per unit increase in decision point.
Tip. If your dataset already contains a basis of time (e.g., spline columns), you can reference them in
time_varying_effect_form
. The package will warn when variables other thandp
appear there, but this is allowed for precomputed time bases.
You can switch the learner used for fitting the nuisance functions via control_reg_method
:
"gam"
"rf"
"ranger"
"sl"
# Example: GAM (generalized additive model) set.seed(2) fit3 <- mcee( data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~dp, control_formula_with_mediator = ~ s(dp) + s(M) + s(X), # spline formula for mgcv::gam control_reg_method = "gam", verbose = FALSE ) summary(fit3)
If interested in estimating direct and indirect effects of intervention at a specific decision point (or a few decision points), you can use the specific_dp_only
argument. For example, to estimate effects at the first two decision points (1 and 2) only, you can either:
specific_dp_only = c(1, 2)
in mcee
, orweight_per_row
that puts nonzero mass only at those rows (the wrappers normalize weights within subject).fit4 <- mcee( data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~1, control_formula_with_mediator = ~ dp + M + X, control_reg_method = "glm", specific_dp_only = c(1, 2), verbose = FALSE ) summary(fit4)
The effect estimates are now an average over decision points 1 and 2.
In the next section of the vignette, we'll show how to use the more flexible wrappers, mcee_general
and mcee_userfit_nuisance
.
Every wrapper in this package returns an object of class mcee_fit
. This object is a list with several components.
$mcee_fit
alpha_hat
, beta_hat
: estimated coefficients for the natural direct excursion effect ($\alpha$) and natural indirect excursion effect ($\beta$).alpha_se
, beta_se
: standard errors.varcov
: estimated covariance matrix for both $\alpha$ and $\beta$ coefficients (ordered as $(\alpha^T, \beta^T)^T$).alpha_varcov
, beta_varcov
: estimated covariance matrices for $\alpha$ and $\beta$ separately.fit1$mcee_fit
The lincomb_joint
, lincomb_alpha
, and lincomb_beta
arguments to summary()
lets you compute linear combinations of the estimated coefficients. For example:
fit1
)# difference between direct and indirect excursion effects summary(fit1, lincomb_joint = c(1, -1))
This requests the linear combination $\alpha - \beta$, with standard error and t-statistic computed from the covariance matrix.
fit2
)Suppose you fit a model with a linear time basis, e.g.:
fit2 <- mcee( data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M", availability = "I", rand_prob = "p_A", time_varying_effect_form = ~dp, control_formula_with_mediator = ~ dp + M + X, control_reg_method = "glm", verbose = FALSE )
To get the effect at decision point 10 (the last decision point), you need the intercept plus 9 times the slope:
summary(fit2, lincomb_alpha = c(1, 9), lincomb_beta = c(1, 9))
If you prefer a joint contrast (e.g., alpha(t=10) − beta(t=10)), stack the two contrasts into a single row over c(alpha, beta)
:
L_joint_t10 <- matrix(c( 1, 9, # alpha part -1, -9 # beta part ), nrow = 1) summary(fit2, lincomb_joint = L_joint_t10)
For diagnostics, the returned object also includes:
$nuisance_models
: the actual model objects used for fitting each nuisance parameter (p
, q
, eta
, mu
, nu
).$nuisance_fitted
: the fitted values for each nuisance function (p1
, q1
, eta1
, eta0
, mu1
, mu0
, nu1
, nu0
).One can use summary(..., show_nuisance = TRUE)
to get a compact summary of the nuisance fits.
summary(fit1, show_nuisance = TRUE) head(fit1$nuisance_fitted$mu1)
Tip. Because
varcov
is the joint covariance ofc(alpha, beta)
, you can build any custom Wald test by forming your own contrast matrixL
and usingL %*% c(alpha, beta)
with varianceL %*% varcov %*% t(L)
. This is equivalent to usinglincomb_joint = L
insummary()
.
mcee_general
and mcee_userfit_nuisance
While mcee
covers the common MRT use case (known randomization, one control formula for all nuisance functions), two additional wrappers provide more flexibility in controlling how the nuisance functions are fitted:
mcee_general
This interface lets you configure each nuisance function separately.
known
: supply a fixed vector/scalar (commonly for treatment mechanism p
in MRT).formula
: regression formula, which can include s()
terms if using method = "gam"
.method
: learner (glm
, gam
, lm
, rf
, ranger
, sl
).family
: optional; defaults inferred from nuisance type (binomial
for p
and q
;
gaussian
for eta
, mu
, nu
).Helper functions can be used to build these objects: mcee_config_maker
, mcee_config_known
, mcee_config_glm
, mcee_config_gam
, mcee_config_lm
, mcee_config_rf
, mcee_config_ranger
, mcee_config_sl
, mcee_config_sl_user
.
# Families (binomial vs Gaussian) are chosen automatically when omitted; for `p` and `q` the default is binomial, for the outcome regressions it is Gaussian. cfg <- list( p = mcee_config_known("p", dat$p_A), # known randomization prob in MRT q = mcee_config_glm("q", ~ dp + X + M), eta = mcee_config_glm("eta", ~ dp + X), mu = mcee_config_glm("mu", ~ dp + X + M), nu = mcee_config_glm("nu", ~ dp + X) ) fit_gen <- mcee_general( data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M", availability = "I", time_varying_effect_form = ~dp, config_p = cfg$p, config_q = cfg$q, config_eta = cfg$eta, config_mu = cfg$mu, config_nu = cfg$nu, verbose = FALSE ) summary(fit_gen)
Tip. This interface is particularly useful for observational studies where the treatment mechanism
p
is unknown and must be estimated. You can setconfig_p
to a regression formula and method of your choice.
mcee_userfit_nuisance
Sometimes you fit nuisance models externally, perhaps with custom ML workflows. In that case, you can bypass Stage‑1 nuisance model fitting and supply predictions directly.
p1
, q1
, eta1
, eta0
, mu1
, mu0
, nu1
, nu0
, each aligned with the rows of your data.p1
: $P(A_t = I_t | H_t)$q1
: $P(A_t = I_t | H_t, M_t)$eta1
: $E(Y | H_t, A_t = I_t)$eta0
: $E(Y | H_t, A_t = 0)$mu1
: $E(Y | H_t, A_t = I_t, M_t)$mu0
: $E(Y | H_t, A_t = 0, M_t)$nu1
: $E[E(Y | H_t, A_t = I_t, M_t) | H_t, A_t = 0]$nu0
: $E[E(Y | H_t, A_t = 0, M_t) | H_t, A_t = I_t]$p1 = 1
and q1 = 1
at I == 0
.# Fit nuisance regressions manually: p, q, eta, mu, nu p1_hat <- dat$p_A # known randomization prob in MRT p1_hat[dat$I == 0] <- 1 # manually set this to avoid wrapper message q1_hat <- predict(glm(A ~ dp + X + M, family = binomial(), data = dat[dat$I == 1, ]), type = "response", newdata = dat ) q1_hat[dat$I == 0] <- 1 # manually set this to avoid wrapper message eta1_hat <- predict(lm(Y ~ dp + X, data = dat[dat$A == dat$I, ]), newdata = dat) eta0_hat <- predict(lm(Y ~ dp + X, data = dat[dat$A == 0, ]), newdata = dat) mu1_hat <- predict(lm(Y ~ dp + X + M, data = dat[dat$A == dat$I, ]), newdata = dat) mu0_hat <- predict(lm(Y ~ dp + X + M, data = dat[dat$A == 0, ]), newdata = dat) nu1_hat <- predict(lm(mu1 ~ dp + X, data = cbind(dat, mu1 = mu1_hat)[dat$A == 0, ]), newdata = dat) nu0_hat <- predict(lm(mu0 ~ dp + X, data = cbind(dat, mu0 = mu0_hat)[dat$A == dat$I, ]), newdata = dat) fit_usr <- mcee_userfit_nuisance( data = dat, id = "id", dp = "dp", outcome = "Y", treatment = "A", mediator = "M", availability = "I", time_varying_effect_form = ~dp, p1 = p1_hat, q1 = q1_hat, eta1 = eta1_hat, eta0 = eta0_hat, mu1 = mu1_hat, mu0 = mu0_hat, nu1 = nu1_hat, nu0 = nu0_hat, verbose = FALSE ) summary(fit_usr)
mcee
.p1
as a known constant vector.Different wrappers will produce the exact same result if the same nuisance models are used. Here we compare fit2
(from mcee
) and fit_gen
(from mcee_general
) which use the same GLM nuisance models and the same control variables.
fit2$mcee_fit[c("alpha_hat", "beta_hat", "alpha_se", "beta_se")] fit_gen$mcee_fit[c("alpha_hat", "beta_hat", "alpha_se", "beta_se")] all.equal(fit2$mcee_fit, fit_gen$mcee_fit, tolerance = 1e-6)
all.equal(fit2$mcee_fit, fit_usr$mcee_fit, tolerance = 1e-6)
mcee
. For analyzing MRTs with known randomization probabilities, this is the simplest choice.dp
in time_varying_effect_form
(or simply set it to ~1
). Do not include baseline or time-varying covariates here. If you've precomputed spline or polynomial bases of dp
, referencing those columns is allowed.control_formula_with_mediator
(or in config_mu
/config_q
for mcee_general
).dp
. Outcomes must be constant within subjects. No missing values are allowed.The mcee
package provides a multiply robust estimator of natural direct and indirect excursion effects in micro-randomized trials and observational longitudinal studies with time-varying treatments, time-varying mediators and a distal outcome. Three wrapper functions serve different needs:
mcee
: the streamlined MRT workflow with known randomization.mcee_general
: full control over nuisance model specifications.mcee_userfit_nuisance
: for injecting externally fitted nuisance parameters.Together, these tools allow both applied researchers and methodologists to estimate mediated excursion effects with transparent assumptions and flexible modeling options.
This appendix documents how the core calculation happens in .mcee_core_rows()
, which is a detailed derivation of Eq. (12) and the asymptotic variance of Theorem 4 in @qian2025dynamic.
The proposed estimating function is
[ \psi(\gamma; \zeta) := \sum_{t=1}^T \omega(t) \begin{bmatrix} { \phi_t^{10}(p_t, q_t, \mu_t, \nu_t) - \phi_t^{00}(p_t, \eta_t) - f(t)^\top \alpha } f(t) \ { \phi_t^{11}(p_t, \eta_t) - \phi_t^{10}(p_t, q_t, \mu_t, \nu_t) - f(t)^\top \beta } f(t) \end{bmatrix}. ]
The estimating equation (EE), omitting the nuisance parameters and allowing different (T_i) for each subject (i), is
[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) \begin{bmatrix} { \phi_{it}^{10} - \phi_{it}^{00} - f(t)^\top \alpha } f(t) \ { \phi_{it}^{11} - \phi_{it}^{10} - f(t)^\top \beta } f(t) \end{bmatrix} = 0. ]
The estimators are computed by
[ \hat\alpha = \left[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) f(t)f(t)^\top \right]^{-1} \left[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) { \phi_{it}^{10} - \phi_{it}^{00} } f(t) \right], ]
[ \hat\beta = \left[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) f(t)f(t)^\top \right]^{-1} \left[ \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) { \phi_{it}^{11} - \phi_{it}^{10} } f(t) \right]. ]
The asymptotic variance of ((\alpha^\top, \beta^\top)^\top) can be estimated by
[ \text{Var}(\hat\gamma) \approx \text{Bread}^{-1} \, \text{Meat} \, (\text{Bread}^{-1})^\top, ]
where
[ \text{Bread} = \frac{1}{n}\sum_{i=1}^n \sum_{t=1}^{T_i} \omega(i,t) \begin{bmatrix} f(t)f(t)^\top & 0 \ 0 & f(t)f(t)^\top \end{bmatrix}, ]
[ \text{Meat} = \frac{1}{n}\sum_{i=1}^n \left( \sum_{t=1}^{T_i} \omega(i,t) \begin{bmatrix} { \phi_{it}^{10} - \phi_{it}^{00} - f(t)^\top \alpha } f(t) \ { \phi_{it}^{11} - \phi_{it}^{10} - f(t)^\top \beta } f(t) \end{bmatrix} \right)^{\otimes 2}. ]
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.