knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The MRTAnalysis
package provides user-friendly functions to conduct primary and secondary analyses for micro-randomized trial (MRT), where the proximal outcomes are continuous or binary and the intervention option is binary. For continuous proximal outcomes, the estimated causal effects are on the additive scale. For binary proximal outcomes, the estimated causal effects are on the log relative risk scale. In particular, this package can be used to
MRT is an experimental design for optimizing mobile health interventions. The marginal and the moderated causal excursion effects are the quantities of interest in primary and secondary analyses for MRT. In this tutorial we briefly review MRT and proximal causal excursion effects, and illustrate the use of the estimators implemented in this package for conducting primary and secondary analyses of MRT: wcls()
(weighted and centered least squares, for MRT with continuous proximal outcomes) and emee()
(estimator for marginal excursion effect, for MRT with binary proximal outcomes).
In a separate tutorial, we will introduce the dcee()
function for estimating the distal causal excursion effect in MRT, i.e., the effect of the time-varying treatments on a distal outcome, measured at the end of the study.
In an MRT, each participant is repeatedly randomized among treatment options many times throughout the trial. Suppose there are $n$ participants, and for the $i$-th participant, they are enrolled in the trial for $m_i$ decision points. (In many MRT, the number of decision points $m_i$ is the same for all participants. This package also automatically handles the setting where $m_i$ is different for different participants, so we present the data structure in a more general way.)
For the $i$-th participant at the $t$-th decision point, we use the triplet $(X_{it}, A_{it}, Y_{it})$ to denote the data collected, where
data.frame
input: each row in the data.frame
would correspond to $(X_{it}, A_{it}, Y_{it})$ (and possibly other variables --- see below) for a specific $(i,t)$ combination.An MRT may include availability considerations. When it is inappropriate or unethical to deliver interventions to an individual, that individual is considered "unavailable", and no intervention will be delivered at that decision point so $A_{it} = 0$.
Mathematically, we use $I_{it}$ denotes the availability status of participant $i$ at decision point $t$: $I_{it} = 1$ if available and $I_{it} = 0$ if unavailable. Temporal-wise, availability $I_{it}$ is determined before $A_{it}$, and one can conceptualize it by considering $I_{it}$ to be measured at the same time as $X_{it}$.
To use any of the estimators in this package, you need to prepare your data set as a data.frame
in long format, meaning that each row records observations from a decision point of a participant (i.e., $(X_{it}, A_{it}, Y_{it})$ (and possibly other variables --- see below) for a specific $(i,t)$ combination). The data.frame
should be sorted so that consecutive rows should be from adjacent decision points from the same participant. Furthermore, the data set should contain the following columns:
The data set may also contain the following columns, depending on your MRT:
library(MRTAnalysis) current_options <- options(digits = 3) # save current options for restoring later
The following code uses wcls()
to estimate the fully marginal causal excursion effect from a synthetic data set data_mimicHeartSteps
that mimics the HeartSteps V1 MRT (@klasnja2015). This data set contains observations for 37 individuals at 210 different time points. The data set contains the following variables:
userid
: id number of an individual.time
: index of decision point.day_in_study
: day in the study.logstep_30min
: the proximal outcome, i.e., the step count in the 30 minutes following the current decision point (log-transformed).logstep_30min_lag1
: the proximal outcome at the previous decision point (lag-1 outcome), i.e., the step count in the 30 minutes following the previous decision point (log-transformed).logstep_pre30min
: the step count in the 30 minutes prior to the current decision point (log-transformed).is_at_home_or_work
: whether the individual is at home or work (=1) or at other locations (=0) at the current decision point.intervention
: whether the intervention is randomized to be delivered (=1) or not (=0) at the current decision point; the randomization probability is a constant 0.6 for this data set, mimicking HeartSteps V1.avail
: whether the individual is available (=1) or not (=0) at the current decision point.A summary of data_mimicHeartSteps
is as follows:
head(data_mimicHeartSteps, 10)
In the following function call of wcls()
, we specify the proximal outcome variable by outcome = "logstep_30min"
. We specify the treatment variable by treatment = "intervention"
. We specify the constant randomization probability by rand_prob = 0.6
. We specify the fully marginal effect as the quantity to be estimated by setting moderator_formula = ~1
. We use logstep_pre30min
as a control variable by setting control_formula = ~logstep_pre30min
. We specify the availability variable by availability = avail
.
fit1 <- wcls( data = data_mimicHeartSteps, id = "userid", outcome = "logstep_30min", treatment = "intervention", rand_prob = 0.6, moderator_formula = ~1, control_formula = ~logstep_pre30min, availability = "avail" ) summary(fit1)
The summary()
function provides the estimated causal excursion effect as well as the 95% confidence interval, standard error, and p-value. The only row in the output $causal_excursion_effect
is named (Intercept)
, indicating that this is the fully marginal effect (like an intercept in the causal effect model). In particular, the estimated marginal excursion effect is 0.157, with 95% confidence interval (0.031, 0.284), and p-value 0.016. The confidence interval and the p-value are based on a small sample correction technique that is based on Hotelling's T distribution, so the Hotelling's T statistic (Hotelling
) and the degrees of freedom (df1
and df2
) are also printed. See @boruvka2018assessing for details on the small sample correction.
One can include more control variables, e.g., by setting control_formula = ~logstep_pre30min + logstep_30min_lag1 + is_at_home_or_work
. This is illustrated by the following code. Different choices of control variables should lead to similar causal effect estimates, but better control variables (i.e., those that are correlated with the proximal outcome) usually decrease the standard error of the causal effect estimates. This is the case here: the standard error of the marginal causal excursion effect decreases slightly from 0.062 to 0.061 after we included two additional control variables.
fit2 <- wcls( data = data_mimicHeartSteps, id = "userid", outcome = "logstep_30min", treatment = "intervention", rand_prob = 0.6, moderator_formula = ~1, control_formula = ~ logstep_pre30min + logstep_30min_lag1 + is_at_home_or_work, availability = "avail" ) summary(fit2)
One can ask summary()
to print out the fitted coefficients for the control variables as well, by setting show_control_fit = TRUE
. This is illustrated by the following code. However, we caution against interpreting the fitted coefficients for the control variables, because these coefficients are only interpretable when the control model is correctly specified, which can rarely be true in an MRT setting.
summary(fit2, show_control_fit = TRUE)
The following code uses wcls()
to estimate the causal excursion effect moderated by the location of the individual, is_at_home_or_work
. This is achieved by setting moderator_formula = ~is_at_home_or_work
.
fit3 <- wcls( data = data_mimicHeartSteps, id = "userid", outcome = "logstep_30min", treatment = "intervention", rand_prob = 0.6, moderator_formula = ~is_at_home_or_work, control_formula = ~ logstep_pre30min + logstep_30min_lag1 + is_at_home_or_work, availability = "avail" ) summary(fit3)
The moderated causal excursion effect is modeled as a linear form: $\beta_0 + \beta_1 \cdot \text{is_at_home_or_work}$. The output $causal_excursion_effect
contains two rows, one for $\beta_0$ (with row name (Intercept)
) and the other for $\beta_1$ (with row name is_at_home_or_work
). Here, $\beta_0$ is the causal excursion effect when the individual is not at home or work (estimate = 0.109, 95% CI = (-0.029, 0.247), p = 0.117), and $\beta_1$ is the difference in the causal excursion effects between when at home or work and when at other locations (estimate = 0.135, 95% CI = (-0.166, 0.435), p = 0.368).
One can also ask summary()
to calculate and print the estimated coefficients for $\beta_0 + \beta_1$, the causal excursion effect when the individual is at home or work, by using the lincomb
optional argument. This is illustrated by the following code. We set lincomb = c(1, 1)
, i.e., asks summary()
to print out $[1, 1] \times (\beta_0, \beta_1)^T = \beta_0 + \beta_1$. The third row in $causal_excursion_effect
, named (Intercept) + is_at_home_or_work
, is the fitted result for this $\beta_0 + \beta_1$ coefficient combination.
summary(fit3, lincomb = c(1, 1))
Based on the output, the causal excursion effect at home or work is estimated to be 0.244, with 95% CI (0.085, 0.403) and p-value 0.002.
The syntax of emee()
is exactly the same as wcls()
. We illustrate the use of emee()
below for completeness.
The following code uses emee()
to estimate the fully marginal causal excursion effect from a synthetic data set data_binary
. This data set contains observations for 100 individuals at 30 different time points. The data set contains the following variables:
userid
: id number of an individual.time
: index of decision point.time_var1
: time-varying covariate 1 for illustration purpose. Here it is defined as the "standardized time in study", defined as the current decision point index divided by the total number of decision points.time_var2
: time-varying covariate 2 for illustration purpose. Here it is the indicator of "the second half of the study", defined as whether the current decision point index is greater than the total number of decision points divided by 2.Y
: the binary proximal outcome.A
: whether the intervention is randomized to be delivered (=1) or not (=0) at the current decision point;rand_prob
: the randomization probability at each decision point, which is not a constant over time.avail
: whether the individual is available (=1) or not (=0) at the current decision point.A summary of data_binary
is as follows:
head(data_binary, 30)
In the following function call of emee()
, we specify the proximal outcome variable by outcome = Y
. We specify the treatment variable by treatment = A
. We specify the randomization probability by rand_prob = rand_prob
(the first rand_prob
is an argument of emee()
; the second rand_prob
is a column in data_binary
). We specify the fully marginal effect as the quantity to be estimated by setting moderator_formula = ~1
. We use time_var1
and time_var2
as control variables by setting control_formula = ~ time_var1 + time_var2
. We specify the availability variable by availability = avail
.
fit4 <- emee( data = data_binary, id = "userid", outcome = "Y", treatment = "A", rand_prob = "rand_prob", moderator_formula = ~1, control_formula = ~ time_var1 + time_var2, availability = "avail" ) summary(fit4)
The summary()
function provides the estimated causal excursion effect as well as the 95% confidence interval, standard error, and p-value. The only row in the output $causal_excursion_effect
is named (Intercept)
, indicating that this is the fully marginal effect (like an intercept in the causal effect model). In particular, the estimated marginal excursion effect is 0.341 (on the log relative risk scale), with 95% confidence interval (0.241, 0.44), and p-value$<0.001$.
One can ask summary()
to print out the fitted coefficients for the control variables as well, by setting show_control_fit = TRUE
. This is illustrated by the following code. However, we caution against interpreting the fitted coefficients for the control variables, because these coefficients are only interpretable when the control model is correctly specified, which can rarely be true in an MRT setting.
summary(fit4, show_control_fit = TRUE)
The following code uses emee()
to estimate the causal excursion effect moderated by time_var1
. This is achieved by setting moderator_formula = ~time_var1
.
fit5 <- emee( data = data_binary, id = "userid", outcome = "Y", treatment = "A", rand_prob = "rand_prob", moderator_formula = ~time_var1, control_formula = ~ time_var1 + time_var2, availability = "avail" ) summary(fit5)
The moderated causal excursion effect is modeled as a linear form: $\beta_0 + \beta_1 \cdot \text{time_var1}$. The output $causal_excursion_effect
contains two rows, one for $\beta_0$ (with row name (Intercept)
) and the other for $\beta_1$ (with row name time_var1
). Here, $\beta_0$ is the causal excursion effect when time_var1
$=0$ (estimate = 0.081, 95% CI = (-0.180, 0.342), p = 0.54), and $\beta_1$ is the slope of time_var1
in the causal excursion effect model (estimate = 0.429, 95% CI = (0.051, 0.808), p = 0.03).
One can also ask summary()
to calculate and print the linear combination of coefficients and their confidence interval, standard error, and p-value, by using the lincomb
optional argument. The following code sets lincomb = rbind(c(1, 0.0333), c(1, 0.5), c(1, 1))
, i.e., asks summary()
to print out the estimates for
$$
\begin{bmatrix}1 & 0.0333\
1 & 0.5\
1 & 1
\end{bmatrix}\times\begin{bmatrix}\beta_{0}\
\beta_{1}
\end{bmatrix}=\begin{bmatrix}\beta_{0}+0.0333\beta_{1}\
\beta_{0}+0.5\beta_{1}\
\beta_{0}+\beta_{1}
\end{bmatrix}.
$$
Because $\beta_1$ is the slope of time_var1
, which is a scaled version of decision time index that starts at 0.0333 and ends at 1, $\beta_0 + 0.0333\beta_1$, $\beta_0 + 0.5\beta_1$ and $\beta_0 + \beta_1$ are the causal excursion effects at the beginning of the study, mid-way during the study, and at the end of the study, respectively. The 3rd to 5th rows in $causal_excursion_effect
show these results. Note that the interpretation is under the assumption that the causal excursion effect changes linearly over time.
summary(fit5, lincomb = rbind(c(1, 0.0333), c(1, 0.5), c(1, 1)))
options(current_options) # restore old options
Below are some references:
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.