Exploratory Analysis for Micro-Randomized Trial (MRT): Continuous Distal Outcomes

knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)

Introduction

The MRTAnalysis package now supports analysis of the distal causal excursion effect (DCEE) on a continuous distal outcome in micro-randomized trials (MRTs), using the function dcee().
Distal outcomes are measured once at the end of the study (e.g., weight loss, cognitive score), in contrast to proximal outcomes, which are measured repeatedly after each treatment decision point.

This vignette introduces:

Data Structure of MRT with Distal Outcomes

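In a distal-outcome MRT, each participant $i$ is observed at a sequence of decision points $t$. At each decision point we record the covariates $X_{it}$, the treatment assignment $A_{it}$, the availability indicator $I_{it}$, and the randomization probability $p_{it}$. The distal outcome $Y_i$ is measured once per participant at the end of the study.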

Thus, each row in the long-format data corresponds to $(X_{it}, A_{it}, I_{it}, p_{it})$, with $Y_i$ constant within each participant.
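A minimal sketch of this layout in base R, using made-up values. The column names (userid, dp, X, avail, prob_A, A, Y) and the convention that no treatment is delivered when a participant is unavailable are assumptions for illustration; they need not match any real dataset.

```r
# Toy long-format MRT data: 3 participants, 4 decision points each.
# All values and column names here are illustrative assumptions.
set.seed(1)
n <- 3
T_dp <- 4
toy <- data.frame(
    userid = rep(seq_len(n), each = T_dp),
    dp     = rep(seq_len(T_dp), times = n)
)
toy$X      <- rnorm(n * T_dp)           # time-varying covariate X_it
toy$avail  <- rbinom(n * T_dp, 1, 0.8)  # availability indicator I_it
toy$prob_A <- 0.5                       # randomization probability p_it
# Treatment A_it; assumed convention: no treatment when unavailable.
toy$A      <- rbinom(n * T_dp, 1, toy$prob_A) * toy$avail
toy$Y      <- rep(rnorm(n), each = T_dp) # distal outcome Y_i, repeated on every row

# Check that Y takes a single value within each participant.
y_is_constant <- tapply(toy$Y, toy$userid, function(y) length(unique(y)) == 1)
all(y_is_constant)
```

A check like the last two lines can be useful before an analysis, since a distal outcome that varies within a participant usually signals a data preparation error.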

Distal Causal Excursion Effects

The distal causal excursion effects are defined using potential outcomes in @qian2025distal. Roughly speaking, the DCEE at decision point $t$ is the difference in the outcome $Y_i$ due to assigning treatment $A_{it}=1$ versus $A_{it}=0$ at time $t$, while keeping the past and future treatment assignments according to the randomization probabilities in the MRT (i.e., the MRT policy), and averaging over the covariate history and availability at $t$.
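In informal potential-outcome notation, the contrast described above can be sketched as follows. The notation is an assumed shorthand for illustration only; the precise definition, including how availability and the covariate history enter, is the one given in @qian2025distal.

$$
\mathrm{DCEE}(t) = E\left[ Y_i(\bar{A}_{i,t-1}, 1, \underline{A}_{i,t+1}) - Y_i(\bar{A}_{i,t-1}, 0, \underline{A}_{i,t+1}) \right]
$$

Here $\bar{A}_{i,t-1}$ denotes the treatments before decision point $t$ and $\underline{A}_{i,t+1}$ the treatments after it, both drawn according to the MRT randomization probabilities.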

Example Dataset

This package provides data_distal_continuous, a synthetic dataset with:

library(MRTAnalysis)
current_options <- options(digits = 3) # save current options for restoring later
head(data_distal_continuous, 10)

Using dcee()

Fully Marginal Effect (no moderators)

In the following call to dcee(), we specify the distal outcome variable by outcome = "Y", the treatment variable by treatment = "A", and the time-varying randomization probability by rand_prob = "prob_A". We request the fully marginal effect by setting moderator_formula = ~1. We use X and Z as control variables by setting control_formula = ~ X + Z. We specify the availability variable by availability = "avail". We use linear regression for the control regression models (i.e., the Stage-1 nuisance models in the two-stage estimation procedure in @qian2025distal) by setting control_reg_method = "lm".

Note that the estimator for the distal causal excursion effect is consistent even if the control regression model is misspecified, as long as the treatment randomization probabilities are correctly specified (which is the case in MRTs, where they are known by design). The choice of control regression method therefore affects efficiency rather than consistency, and different methods can be tried to improve efficiency.

fit_lm <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~1,
    control_formula = ~ X + Z,
    availability = "avail",
    control_reg_method = "lm"
)
summary(fit_lm)

The summary() function provides the estimated distal causal excursion effect as well as the 95% confidence interval, standard error, and p-value. The only row in the output Distal causal excursion effect (beta) is named Intercept, indicating that this is the fully marginal effect (like an intercept in the causal effect model). In particular, the estimated marginal distal excursion effect is 0.404, with 95% confidence interval (-0.771, 1.579), and p-value 0.49. The confidence interval and the p-value are based on t-quantiles.
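As a rough illustration of how a t-based interval and p-value are formed from an estimate and its standard error, consider the base R sketch below. The helper t_inference() and the numbers passed to it are hypothetical, and the degrees of freedom used by summary() are not stated here, so df is left as an input.

```r
# Wald-type confidence interval and two-sided p-value from t-quantiles.
# est, se, and df are hypothetical inputs, not output from dcee().
t_inference <- function(est, se, df, level = 0.95) {
    crit <- qt(1 - (1 - level) / 2, df = df)      # t critical value
    t_stat <- est / se                            # test statistic for H0: effect = 0
    c(
        estimate = est,
        lower    = est - crit * se,
        upper    = est + crit * se,
        p_value  = 2 * pt(-abs(t_stat), df = df)  # two-sided p-value
    )
}

res <- t_inference(est = 2, se = 1, df = 10)
round(res, 3)
```

With few participants, t-quantiles give a wider interval than normal quantiles would, which is the usual motivation for using them in small samples.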

Moderated Effect

The following code uses dcee() to estimate the distal causal excursion effect moderated by the time-varying covariate Z. This is achieved by setting moderator_formula = ~ Z.

fit_mod <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~Z,
    control_formula = ~ Z + X,
    availability = "avail",
    control_reg_method = "lm"
)
summary(fit_mod, lincomb = c(1, 1)) # beta0 + beta1

In the above, we asked summary() to also compute and print the estimate of $\beta_0 + \beta_1$, the distal causal excursion effect when the binary variable $Z$ takes value 1, via the optional lincomb argument. Setting lincomb = c(1, 1) asks summary() to report $[1, 1] \times (\beta_0, \beta_1)^T = \beta_0 + \beta_1$. The table under Linear combinations (L * beta) is the fitted result for this combination of coefficients.
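The arithmetic behind such a linear combination can be sketched in base R. The coefficient vector beta_hat and covariance matrix V_hat below are hypothetical values, not output from dcee(), and whether summary() computes the standard error in exactly this way is an assumption; the sketch uses the standard formula $\sqrt{L V L^T}$.

```r
# Linear combination L %*% beta with its standard error.
# beta_hat and V_hat are hypothetical values, not output from dcee().
beta_hat <- c(beta0 = 0.2, beta1 = 0.5)
V_hat <- matrix(c( 0.04, -0.01,
                  -0.01,  0.09), nrow = 2, byrow = TRUE)

L <- matrix(c(1, 1), nrow = 1)           # one row per requested combination
est <- drop(L %*% beta_hat)              # beta0 + beta1
se  <- sqrt(drop(L %*% V_hat %*% t(L)))  # sqrt(L V L^T)
c(estimate = est, std_error = se)
```

Note that the standard error depends on the covariance between the two coefficients, so it cannot be obtained by combining the two individual standard errors alone.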

GAM nuisance models

One can use generalized additive models (GAM) for the control regression models by setting control_reg_method = "gam". This may improve efficiency if the relationship between the distal outcome and the covariates is non-linear. One can use s() to specify non-linear terms in the control_formula. For example, here we use a smooth term for the continuous covariate X, by setting control_formula = ~ s(X) + Z.

fit_gam <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~Z,
    control_formula = ~ s(X) + Z,
    availability = "avail",
    control_reg_method = "gam"
)
summary(fit_gam)

Random Forest / Ranger nuisance

One can also use tree-based methods for the control regression models by setting control_reg_method = "rf" (random forest via the randomForest package) or control_reg_method = "ranger" (a faster random forest implementation via the ranger package). This may improve efficiency if the relationship between the distal outcome and the covariates is complex. Note that tree-based methods do not allow smooth terms like s(X); the control_formula has to be specified using main terms only. Additional optional arguments can be passed to the underlying random forest function via the ... argument of dcee(), which is not shown in this example.

fit_rf <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~1,
    control_formula = ~ X + Z,
    availability = "avail",
    control_reg_method = "rf" # can replace "rf" with "ranger" for faster implementation
)
summary(fit_rf)

Cross-Fitting

The dcee() function also supports cross-fitting, which may lead to improved finite sample performance when using complex machine learning methods for the control regression models. This is done by setting cross_fit = TRUE and specifying the number of folds via cf_fold. Here we use 5-fold cross-fitting with generalized additive models for the control regression models as an example. The particular cross-fitting algorithm follows Section 4 in the Web Appendix of @zhong2021aipw.

fit_cf <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~1,
    control_formula = ~ X + Z,
    availability = "avail",
    control_reg_method = "gam",
    cross_fit = TRUE, cf_fold = 5
)
summary(fit_cf)
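The general idea of cross-fitting can be sketched in base R as follows. This is a generic illustration with toy data and a plain lm() nuisance model, with folds formed at the participant level as an assumption; it is not the internal code of dcee() and does not reproduce the algorithm in @zhong2021aipw.

```r
# Generic K-fold cross-fitting sketch with folds formed at the participant level.
# Toy data and model; this is not the internal code of dcee().
set.seed(42)
n <- 20     # participants
T_dp <- 5   # decision points per participant
K <- 5      # number of folds
dat <- data.frame(userid = rep(seq_len(n), each = T_dp))
dat$X <- rnorm(nrow(dat))
# Toy distal outcome, constant within participant.
dat$Y <- rep(rnorm(n), each = T_dp) + 0.3 * ave(dat$X, dat$userid)

# Assign each participant (not each row) to one of K folds.
ids <- unique(dat$userid)
fold_of_id <- sample(rep(seq_len(K), length.out = length(ids)))
dat$fold <- fold_of_id[match(dat$userid, ids)]

# For each fold, fit the nuisance model on the other folds
# and predict on the held-out fold.
dat$pred <- NA_real_
for (k in seq_len(K)) {
    held_out <- dat$fold == k
    fit_k <- lm(Y ~ X, data = dat[!held_out, ])
    dat$pred[held_out] <- predict(fit_k, newdata = dat[held_out, ])
}
head(dat)
```

Every row ends up with a prediction from a model that never saw that participant's data, which is what guards against overfitting when the nuisance model is flexible.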

Inspecting Stage-1 Fits

We can set show_control_fit = TRUE in the summary() function to inspect the control regression (i.e., Stage-1 nuisance) model fits. This is useful for diagnosing the fit of the control regression models. For lm/gam these include regression summaries. For tree-based or SuperLearner fits, original learner output is shown. To further inspect the control regression model fits, one can manually inspect $fit$regfit_a0 and $fit$regfit_a1.

summary(fit_lm, show_control_fit = TRUE)

References




MRTAnalysis documentation built on Sept. 9, 2025, 5:41 p.m.