Primary and Secondary Analysis for Micro-Randomized Trial (MRT): Proximal Outcomes
In MRTAnalysis: Assessing Proximal, Distal, and Mediated Causal Excursion Effects for Micro-Randomized Trials

knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)

Introduction

The MRTAnalysis package provides user-friendly functions to conduct primary and secondary analyses for micro-randomized trial (MRT), where the proximal outcomes are continuous or binary and the intervention option is binary. For continuous proximal outcomes, the estimated causal effects are on the additive scale. For binary proximal outcomes, the estimated causal effects are on the log relative risk scale. In particular, this package can be used to

estimate the marginal proximal causal excursion effect
estimate the moderated proximal causal excursion effect, i.e., the effect modification by time-fixed or time-varying covariates.

MRT is an experimental design for optimizing mobile health interventions. The marginal and the moderated causal excursion effects are the quantities of interest in primary and secondary analyses for MRT. In this tutorial we briefly review MRT and proximal causal excursion effects, and illustrate the use of the estimators implemented in this package for conducting primary and secondary analyses of MRT: wcls() (weighted and centered least squares, for MRT with continuous proximal outcomes) and emee() (estimator for marginal excursion effect, for MRT with binary proximal outcomes).

In a separate tutorial, we will introduce the dcee() function for estimating the distal causal excursion effect in MRT, i.e., the effect of the time-varying treatments on a distal outcome, measured at the end of the study.

Data Structure of an MRT

In an MRT, each participant is repeatedly randomized among treatment options many times throughout the trial. Suppose there are $n$ participants, and for the $i$-th participant, they are enrolled in the trial for $m_i$ decision points. (In many MRT, the number of decision points $m_i$ is the same for all participants. This package also automatically handles the setting where $m_i$ is different for different participants, so we present the data structure in a more general way.)

For the $i$-th participant at the $t$-th decision point, we use the triplet $(X_{it}, A_{it}, Y_{it})$ to denote the data collected, where

$X_{it}$ denotes the (vector of) time-varying covariates.
$A_{it}$ denotes the binary treatment assignment; $A_{it} = 1$ if randomized to intervention, $A_{it} = 0$ if randomized to no intervention.
$A_{it}$ is randomized with success probability $p_{it}$. For many MRTs the $p_{it}$ is a constant throughout the trial for all participants (e.g., $p_{it} = 0.6$), but for some MRTs $p_{it}$ may depend on the past observations of the participant and thus is different for different $(i,t)$ combinations. This packages handles both situations.
$Y_{it}$ denotes the proximal outcome (continuous or binary) following $A_{it}$.
In some literature, the proximal outcome following $A_{it}$ is written as $Y_{i,t+1}$. We use $Y_{it}$ here because this aligns with the data.frame input: each row in the data.frame would correspond to $(X_{it}, A_{it}, Y_{it})$ (and possibly other variables --- see below) for a specific $(i,t)$ combination.

Availability

An MRT may include availability considerations. When it is inappropriate or unethical to deliver interventions to an individual, that individual is considered "unavailable", and no intervention will be delivered at that decision point so $A_{it} = 0$.

Mathematically, we use $I_{it}$ denotes the availability status of participant $i$ at decision point $t$: $I_{it} = 1$ if available and $I_{it} = 0$ if unavailable. Temporal-wise, availability $I_{it}$ is determined before $A_{it}$, and one can conceptualize it by considering $I_{it}$ to be measured at the same time as $X_{it}$.

Prepare Your Data Set for Analysis

To use any of the estimators in this package, you need to prepare your data set as a data.frame in long format, meaning that each row records observations from a decision point of a participant (i.e., $(X_{it}, A_{it}, Y_{it})$ (and possibly other variables --- see below) for a specific $(i,t)$ combination). The data.frame should be sorted so that consecutive rows should be from adjacent decision points from the same participant. Furthermore, the data set should contain the following columns:

A user id column that distinguishes different participants.
An outcome column that contains the proximal outcome.
A treatment assignment column that contains the binary treatment assignment $A_{it}$ (0 or 1, 1 being the active treatment such as sending a push notification and 0 being the control option such as not sending a push notification).
Columns that record baseline and time-varying covariates that will be used as control variables and/or moderators in the analysis.

The data set may also contain the following columns, depending on your MRT:

If in your MRT the randomization probability is not a constant throughout, your data set should include a randomization probability column that contains the randomization probability $p_{it}$.
If your MRT has availability considerations, your data set should include an availability column that contains the availability status $I_{it}$.
Optional: If in your MRT the randomization probability is not a constant throughout, you may also provide an optional column that contains the so-called numerator probability. A carefully constructed numerator probability column may reduce the standard error of the causal effect estimates. If you are not sure what this numerator probability is, feel free to ignore it. See @boruvka2018assessing and @qian2021estimating for details.

Causal Excursion Effect Estimation for MRT with Continuous Outcomes

library(MRTAnalysis)
current_options <- options(digits = 3) # save current options for restoring later

Fully Marginal Causal Excursion Effect

The following code uses wcls() to estimate the fully marginal causal excursion effect from a synthetic data set data_mimicHeartSteps that mimics the HeartSteps V1 MRT (@klasnja2015). This data set contains observations for 37 individuals at 210 different time points. The data set contains the following variables:

userid: id number of an individual.
time: index of decision point.
day_in_study: day in the study.
logstep_30min: the proximal outcome, i.e., the step count in the 30 minutes following the current decision point (log-transformed).
logstep_30min_lag1: the proximal outcome at the previous decision point (lag-1 outcome), i.e., the step count in the 30 minutes following the previous decision point (log-transformed).
logstep_pre30min: the step count in the 30 minutes prior to the current decision point (log-transformed).
is_at_home_or_work: whether the individual is at home or work (=1) or at other locations (=0) at the current decision point.
intervention: whether the intervention is randomized to be delivered (=1) or not (=0) at the current decision point; the randomization probability is a constant 0.6 for this data set, mimicking HeartSteps V1.
avail: whether the individual is available (=1) or not (=0) at the current decision point.

A summary of data_mimicHeartSteps is as follows:

head(data_mimicHeartSteps, 10)

In the following function call of wcls(), we specify the proximal outcome variable by outcome = "logstep_30min". We specify the treatment variable by treatment = "intervention". We specify the constant randomization probability by rand_prob = 0.6. We specify the fully marginal effect as the quantity to be estimated by setting moderator_formula = ~1. We use logstep_pre30min as a control variable by setting control_formula = ~logstep_pre30min. We specify the availability variable by availability = avail.

fit1 <- wcls(
    data = data_mimicHeartSteps,
    id = "userid",
    outcome = "logstep_30min",
    treatment = "intervention",
    rand_prob = 0.6,
    moderator_formula = ~1,
    control_formula = ~logstep_pre30min,
    availability = "avail"
)
summary(fit1)

The summary() function provides the estimated causal excursion effect as well as the 95% confidence interval, standard error, and p-value. The only row in the output $causal_excursion_effect is named (Intercept), indicating that this is the fully marginal effect (like an intercept in the causal effect model). In particular, the estimated marginal excursion effect is 0.157, with 95% confidence interval (0.031, 0.284), and p-value 0.016. The confidence interval and the p-value are based on a small sample correction technique that is based on Hotelling's T distribution, so the Hotelling's T statistic (Hotelling) and the degrees of freedom (df1 and df2) are also printed. See @boruvka2018assessing for details on the small sample correction.

One can include more control variables, e.g., by setting control_formula = ~logstep_pre30min + logstep_30min_lag1 + is_at_home_or_work. This is illustrated by the following code. Different choices of control variables should lead to similar causal effect estimates, but better control variables (i.e., those that are correlated with the proximal outcome) usually decrease the standard error of the causal effect estimates. This is the case here: the standard error of the marginal causal excursion effect decreases slightly from 0.062 to 0.061 after we included two additional control variables.

fit2 <- wcls(
    data = data_mimicHeartSteps,
    id = "userid",
    outcome = "logstep_30min",
    treatment = "intervention",
    rand_prob = 0.6,
    moderator_formula = ~1,
    control_formula = ~ logstep_pre30min + logstep_30min_lag1 + is_at_home_or_work,
    availability = "avail"
)
summary(fit2)

One can ask summary() to print out the fitted coefficients for the control variables as well, by setting show_control_fit = TRUE. This is illustrated by the following code. However, we caution against interpreting the fitted coefficients for the control variables, because these coefficients are only interpretable when the control model is correctly specified, which can rarely be true in an MRT setting.

summary(fit2, show_control_fit = TRUE)

Moderated Causal Excursion Effect

The following code uses wcls() to estimate the causal excursion effect moderated by the location of the individual, is_at_home_or_work. This is achieved by setting moderator_formula = ~is_at_home_or_work.

fit3 <- wcls(
    data = data_mimicHeartSteps,
    id = "userid",
    outcome = "logstep_30min",
    treatment = "intervention",
    rand_prob = 0.6,
    moderator_formula = ~is_at_home_or_work,
    control_formula = ~ logstep_pre30min + logstep_30min_lag1 + is_at_home_or_work,
    availability = "avail"
)
summary(fit3)

The moderated causal excursion effect is modeled as a linear form: $\beta_0 + \beta_1 \cdot \text{is_at_home_or_work}$. The output $causal_excursion_effect contains two rows, one for $\beta_0$ (with row name (Intercept)) and the other for $\beta_1$ (with row name is_at_home_or_work). Here, $\beta_0$ is the causal excursion effect when the individual is not at home or work (estimate = 0.109, 95% CI = (-0.029, 0.247), p = 0.117), and $\beta_1$ is the difference in the causal excursion effects between when at home or work and when at other locations (estimate = 0.135, 95% CI = (-0.166, 0.435), p = 0.368).

One can also ask summary() to calculate and print the estimated coefficients for $\beta_0 + \beta_1$, the causal excursion effect when the individual is at home or work, by using the lincomb optional argument. This is illustrated by the following code. We set lincomb = c(1, 1), i.e., asks summary() to print out $[1, 1] \times (\beta_0, \beta_1)^T = \beta_0 + \beta_1$. The third row in $causal_excursion_effect, named (Intercept) + is_at_home_or_work, is the fitted result for this $\beta_0 + \beta_1$ coefficient combination.

summary(fit3, lincomb = c(1, 1))

Based on the output, the causal excursion effect at home or work is estimated to be 0.244, with 95% CI (0.085, 0.403) and p-value 0.002.

Causal Excursion Effect Estimation for MRT with Binary Outcomes

The syntax of emee() is exactly the same as wcls(). We illustrate the use of emee() below for completeness.

Fully Marginal Causal Excursion Effect

The following code uses emee() to estimate the fully marginal causal excursion effect from a synthetic data set data_binary. This data set contains observations for 100 individuals at 30 different time points. The data set contains the following variables:

userid: id number of an individual.
time: index of decision point.
time_var1: time-varying covariate 1 for illustration purpose. Here it is defined as the "standardized time in study", defined as the current decision point index divided by the total number of decision points.
time_var2: time-varying covariate 2 for illustration purpose. Here it is the indicator of "the second half of the study", defined as whether the current decision point index is greater than the total number of decision points divided by 2.
Y: the binary proximal outcome.
A: whether the intervention is randomized to be delivered (=1) or not (=0) at the current decision point;
rand_prob: the randomization probability at each decision point, which is not a constant over time.
avail: whether the individual is available (=1) or not (=0) at the current decision point.

A summary of data_binary is as follows:

head(data_binary, 30)

In the following function call of emee(), we specify the proximal outcome variable by outcome = Y. We specify the treatment variable by treatment = A. We specify the randomization probability by rand_prob = rand_prob (the first rand_prob is an argument of emee(); the second rand_prob is a column in data_binary). We specify the fully marginal effect as the quantity to be estimated by setting moderator_formula = ~1. We use time_var1 and time_var2 as control variables by setting control_formula = ~ time_var1 + time_var2. We specify the availability variable by availability = avail.

fit4 <- emee(
    data = data_binary,
    id = "userid",
    outcome = "Y",
    treatment = "A",
    rand_prob = "rand_prob",
    moderator_formula = ~1,
    control_formula = ~ time_var1 + time_var2,
    availability = "avail"
)
summary(fit4)

summary(fit4, show_control_fit = TRUE)

Moderated Causal Excursion Effect

The following code uses emee() to estimate the causal excursion effect moderated by time_var1. This is achieved by setting moderator_formula = ~time_var1.

fit5 <- emee(
    data = data_binary,
    id = "userid",
    outcome = "Y",
    treatment = "A",
    rand_prob = "rand_prob",
    moderator_formula = ~time_var1,
    control_formula = ~ time_var1 + time_var2,
    availability = "avail"
)
summary(fit5)

The moderated causal excursion effect is modeled as a linear form: $\beta_0 + \beta_1 \cdot \text{time_var1}$. The output $causal_excursion_effect contains two rows, one for $\beta_0$ (with row name (Intercept)) and the other for $\beta_1$ (with row name time_var1). Here, $\beta_0$ is the causal excursion effect when time_var1$=0$ (estimate = 0.081, 95% CI = (-0.180, 0.342), p = 0.54), and $\beta_1$ is the slope of time_var1 in the causal excursion effect model (estimate = 0.429, 95% CI = (0.051, 0.808), p = 0.03).

One can also ask summary() to calculate and print the linear combination of coefficients and their confidence interval, standard error, and p-value, by using the lincomb optional argument. The following code sets lincomb = rbind(c(1, 0.0333), c(1, 0.5), c(1, 1)), i.e., asks summary() to print out the estimates for $$ \begin{bmatrix}1 & 0.0333\ 1 & 0.5\ 1 & 1 \end{bmatrix}\times\begin{bmatrix}\beta_{0}\ \beta_{1} \end{bmatrix}=\begin{bmatrix}\beta_{0}+0.0333\beta_{1}\ \beta_{0}+0.5\beta_{1}\ \beta_{0}+\beta_{1} \end{bmatrix}. $$ Because $\beta_1$ is the slope of time_var1, which is a scaled version of decision time index that starts at 0.0333 and ends at 1, $\beta_0 + 0.0333\beta_1$, $\beta_0 + 0.5\beta_1$ and $\beta_0 + \beta_1$ are the causal excursion effects at the beginning of the study, mid-way during the study, and at the end of the study, respectively. The 3rd to 5th rows in $causal_excursion_effect show these results. Note that the interpretation is under the assumption that the causal excursion effect changes linearly over time.

summary(fit5, lincomb = rbind(c(1, 0.0333), c(1, 0.5), c(1, 1)))

options(current_options) # restore old options

Bibliography

Below are some references:

A review of MRT design, the causal excursion effect for continuous outcome, and the weighted and centered least squares (WCLS) method: @qian2022microrandomized
The original statistical paper on MRT with continuous outcome, which proposed the causal excursion effect and the WCLS method: @boruvka2018assessing
The original statistical paper on MRT with binary outcome, which proposed the causal excursion effect for binary outcome and the estimator for marginal excursion effect (EMEE) method: @qian2021estimating

Reference

Any scripts or data that you put into this service are public.

MRTAnalysis documentation built on Sept. 9, 2025, 5:41 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

MRTAnalysis
Assessing Proximal, Distal, and Mediated Causal Excursion Effects for Micro-Randomized Trials

Primary and Secondary Analysis for Micro-Randomized Trial (MRT): Proximal Outcomes
In MRTAnalysis: Assessing Proximal, Distal, and Mediated Causal Excursion Effects for Micro-Randomized Trials

Introduction

Data Structure of an MRT

Availability

Prepare Your Data Set for Analysis

Causal Excursion Effect Estimation for MRT with Continuous Outcomes

Fully Marginal Causal Excursion Effect

Moderated Causal Excursion Effect

Causal Excursion Effect Estimation for MRT with Binary Outcomes

Fully Marginal Causal Excursion Effect

Moderated Causal Excursion Effect

Bibliography

Reference

Try the MRTAnalysis package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

MRTAnalysis Assessing Proximal, Distal, and Mediated Causal Excursion Effects for Micro-Randomized Trials

Primary and Secondary Analysis for Micro-Randomized Trial (MRT): Proximal Outcomes In MRTAnalysis: Assessing Proximal, Distal, and Mediated Causal Excursion Effects for Micro-Randomized Trials

Introduction

Data Structure of an MRT

Availability

Prepare Your Data Set for Analysis

Causal Excursion Effect Estimation for MRT with Continuous Outcomes

Fully Marginal Causal Excursion Effect

Moderated Causal Excursion Effect

Causal Excursion Effect Estimation for MRT with Binary Outcomes

Fully Marginal Causal Excursion Effect

Moderated Causal Excursion Effect

Bibliography

Reference

Try the MRTAnalysis package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

MRTAnalysis
Assessing Proximal, Distal, and Mediated Causal Excursion Effects for Micro-Randomized Trials

Primary and Secondary Analysis for Micro-Randomized Trial (MRT): Proximal Outcomes
In MRTAnalysis: Assessing Proximal, Distal, and Mediated Causal Excursion Effects for Micro-Randomized Trials