options(scipen=999)

Introduction

Suppose one wishes to maximize (or minimize) the population mean of an outcome using a categorical point treatment, where for each individual one has access to measured baseline covariates. Such a treatment strategy is termed individualized treatment regime (ITR), and the (counterfactual) population mean outcome under an ITR is the value of the ITR. An ITR with the maximal (or minimal) value is referred to as an optimal ITR or the optimal rule, whereas the value of an optimal ITR is termed the optimal value. We consider estimation of the mean outcome under the optimal rule, where the candidate rules are restricted to depend only on user-supplied subset of the baseline covariates. The estimation problem is addressed in a statistical model for the data distribution that is nonparametric, and at most places restrictions on the probability of a patient receiving treatment given covariates. Finally, we extend ideas explored by Luedtke et. al to cover ITRs with categorical treatment. For additional background on Targeted Learning and previous work on optimal individualized treatment regimes, please consider consulting @vdl2011targeted, @vdl2018targeted, @vanderLaanLuedtke15 and @luedtke2016super.

To start, let's load the packages we'll use and set a seed for simulation:

library(data.table)
library(sl3)
library(tmle3)
library(tmle3mopttx)
library(devtools)
set.seed(111)

Data and Notation

Suppose we observe $n$ i.i.d. observations of $O=(W,A,Y) \sim P_0$. We denote $A$ as treatment, where $A \in {0,1}$ and $Y$ is the final outcome. Note that we treat $W$ as all of our collected baseline covariates. We emphasize that we make no assumptions about the distribution of $P_0$, so that $P_0 \in \mathcal{M}$, where $\mathcal{M}$ is the fully nonparametric model. This is in contrast to much of the current literature that relies on parametric assumptions. We can break the data generating distribution $P_0$ into the following parts:

$$P_0(O) = P_0(Y|A,W)P_0(A|W)P_0(W) = Q_0(Y|A,W)g_0(A|W)Q_{W,0}(W)$$ In addition, we also define $\bar{Q}{Y,0}(A,W) \equiv E_0[Y|A,W]$ such that $E_0(Y_a) = E{0,W}(\bar{Q}_{Y,0}(A=a,W))$.

Simulated Data

First, we load the simulated data. Here, our data generating distribution was of the following form:

$$W \sim \mathcal{N}(\bf{0},I_{3 \times 3})$$ $$P(A=1|W) = \frac{1}{1+\exp^{(-0.8*W_1)}}$$

$$P(Y=1|A,W) = 0.5\text{logit}^{-1}[-5I(A=1)(W_1-0.5)+5I(A=0)(W_1-0.5)] + 0.5\text{logit}^{-1}(W_2W_3)$$

data("data_bin")

The above composes our observed data structure $O = (W, A, Y)$. Note that the mean under the true optimal rule is $\psi=0.578$.

To formally express this fact using the tlverse grammar introduced by the tmle3 package. We create a single data object and specify the functional relationships between the nodes in the directed acyclic graph (DAG) via nonparametric structural equation models (NPSEMs), reflected in the node list that we set up as:

# organize data and nodes for tmle3
data <- data_bin
node_list <- list(
  W = c("W1", "W2", "W3"),
  A = "A",
  Y = "Y"
)

We now have an observed data structure (data) and a specification of the role that each variable in the data set plays as the nodes in a DAG.


Methodology

Many methods for learning the optimal rule from data have been developed. Here, we focus on the methods developed in @luedtke2016super and @vanderLaanLuedtke15; however tmle3mopttx also supports the widely used Q-learning approach (for a single time point, aka G-comp), based on generating an estimate of $\bar{Q}_{Y,0}(A,W)$ @Sutton1998. We cover how to use the Q-learning approach in the later implementation of the vignette. Here, we focus on the methodology outlined in @luedtke2016super and @vanderLaanLuedtke15, where we learn the optimal ITR using Super Learner @vdl2007super, and estimate its value using the cross-validated Targeted Minimum Loss-based Estimation (CV-TMLE) @cvtmle2010. Luedtke and van der Laan present three different approaches for learning the optimal rule, but tmle3mopttx relies on using the Super Learner to estimate the blip function (or "pseudo-blip" for categorical treatment).

In great generality, we first need to estimate an individual treatment regime which corresponds to dynamic treatment rule ($d(V)$) that takes a subset of covariates $V \in W$ and assigns treatment. As specified in the introduction, we are also interested in the value of such a dynamic rule: $$E_0[Y_{d(V)}] = E_{0,W}[\bar{Q}{Y,0}(A=d(V),W)]$$ which, under causal assumptions, can be interpreted as the mean outcome if (possibly contrary to fact), treatment was assigned according to the rule. The optimal rule is the rule with the maximal value: $$d_0 \equiv \text{argmax}{d \in \mathcal{D}} E_0[Y_{d(V)}], $$ where $\mathcal{D}$ represents the set of possible rules, $d$.

Binary treatment

In the case of a binary treatment, a key quantity for optimal ITR is the blip function. In particular, one can show that any optimal ITR assigns treatment to individuals falling in strata in which the stratum specific average treatment effect, the blip function, is positive and does not assign treatment to individuals for which this quantity is negative. Therefore for a binary treatment, we define a blip function as $$E_0[Y_1-Y_0|V] \equiv E_0[\bar{Q}{Y,0}(1,W) - \bar{Q}{Y,0}(0,W) | V]. $$ The note that the rule can now be derived as $d_0(V) = I(\bar{Q}_0(V) > 0)$.

In particular, we will:

  1. Estimate $\bar{Q}_{Y,0}(A,W)$ and $g_0(A|W)$ using sl3.

  2. Apply the doubly robust A-IPW transform to our outcome, where we define:

$$D_{\bar{Q},g,a}(O) \equiv \frac{I(A=a)}{g(A|W)} (Y-\bar{Q}_Y(A,W)) + \bar{Q}_Y(A=a,W),$$ where

$$E(D_{\bar{Q},g,a}(O) | V) = E(Y^a|V).$$

Using this transform, we can define the following contrast: $D_{\bar{Q},g}(O) = D_{\bar{Q},g,a=1}(O) - D_{\bar{Q},g,a=0}(O).$

We estimate the blip function (\bar{Q}{0,a}(V)) by regressing $D{\bar{Q},g}(O)$ on $V$ using sl3.

  1. Our estimated rule is $d(V) = \text{argmax}{a \in \mathcal{A}} \bar{Q}{0,a}(V)$.

  2. Obtain inference for the mean outcome under the optimal rule using CV-TMLE.

Computational Considerations

We use the estimation approach outlined in @luedtke2016super and @vanderLaanLuedtke15, which makes frequent use of cross-validation for both model selection and CV-TMLE based parameter estimation @cvtmle2010. In order to avoid nested cross-validation, tmle3mopptx relies on Split-Specific Super Learner in order to estimate the rule, as described by Coyle et al [@jeremythesis].

Interlude: Constructing Optimal Stacked Regressions with sl3

To easily incorporate ensemble machine learning into the estimation procedure, we rely on the facilities provided in the sl3 R package. For a complete guide on using the sl3 R package, consider consulting https://sl3.tlverse.org, or https://tlverse.org for the tlverse ecosystem, of which sl3 is a major part.

Using the framework provided by the sl3 package, the nuisance parameters of the TML estimator may be fit with ensemble learning, using the cross-validation framework of the Super Learner algorithm of @vdl2007super.

# Define sl3 library and metalearners:
xgboost_50<-Lrnr_xgboost$new(nrounds = 50)
xgboost_100<-Lrnr_xgboost$new(nrounds = 100)
xgboost_500<-Lrnr_xgboost$new(nrounds = 500)
lrn1 <- Lrnr_mean$new()
lrn2<-Lrnr_glm_fast$new()

Q_learner <- Lrnr_sl$new(
  learners = list(xgboost_50,xgboost_100,xgboost_500,
                  lrn1,lrn2),
  metalearner = Lrnr_nnls$new()
)

g_learner <- Lrnr_sl$new(
  learners = list(xgboost_100,lrn2),
  metalearner = Lrnr_nnls$new()
)

b_learner <- Lrnr_sl$new(
  learners = list(xgboost_50,xgboost_100,xgboost_500,
                  lrn1,lrn2),
  metalearner = Lrnr_nnls$new()
)

As seen above, we generate three different ensemble learners that must be fit, corresponding to the learners for the outcome regression, propensity score, and the blip function. We make the above explicit with respect to standard notation by bundling the ensemble learners into a list object below:

# specify outcome and treatment regressions and create learner list
learner_list <- list(Y = Q_learner, A = g_learner, B = b_learner)

The learner_list object above specifies the role that each of the ensemble learners we've generated is to play in computing initial estimators to be used in building a TMLE for the parameter of interest. In particular, it makes explicit the fact that our Y is used in fitting the outcome regression while our A is used in fitting our treatment mechanism regression, and finally B is used in fitting the blip function.

Initializing tmle3mopttx through its tmle3_Spec

To start, we will initialize a specification for the TMLE of our parameter of interest (called a tmle3_Spec in the tlverse nomenclature) simply by calling tmle3_mopttx_blip_revere. We specify the argument V = c("W1", "W2", "W3") when initializing the tmle3_Spec object in order to communicate that we're interested in learning a rule dependent on V covariates. We also need to specify the type of pseudo-blip we will use in this estimation problem, and finally the list of learners. Note that for binary treatment, the most natural type of blip to use is the blip1, as specified below.

# initialize a tmle specification
tmle_spec <- tmle3_mopttx_blip_revere(V = c("W1", "W2", "W3"), type = "blip1", 
                                      learners = learner_list, maximize = TRUE, 
                                      complex = TRUE, realistic=FALSE)

As seen above, the tmle3_mopttx_blip_revere specification object (like all tmle3_Spec objects) does not store the data for our specific analysis of interest. Later, we'll see that passing a data object directly to the tmle3 wrapper function, alongside the instantiated tmle_spec, will serve to construct a tmle3_Task object internally (see the tmle3 documentation for details).

In initializing the specification for the TMLE of our parameter of interest, we have specified the set of covariates the rule depends on ($V$), the type of pseudo-blip to use ("type"), and the learners used for estimating all the relevant parts ($Q$, $g$, blip). In addition, we need to specify whether we want to maximize the mean outcome under the rule ("maximize=TRUE"), and whether we want to estimate the rule under all the covariates $V$ provided by the user. If FALSE, tmle3mopttx will instead consider all the possible rules under a smaller set of covariates including the static rules, and optimize the mean outcome over all the subsets of $V$. As such, while the user might have provided a full set of collected covariates as input for $V$, it is possible that the true rule only depends on a subset of the set provided by the user. In that case, our returned mean under the optimal individualized rule will be based on the smaller subset.

Targeted Estimation of the Mean under the Optimal ITR with Binary Treatment

One may walk through the step-by-step procedure for fitting the TML estimator of the mean counterfactual outcome under the optimal ITR, using the machinery exposed by the tmle3 R package (see below); however, the step-by-step procedure is often not of interest.

# NOT RUN -- SEE NEXT CODE CHUNK

# Define data:
tmle_task <- tmle_spec$make_tmle_task(data, node_list)

# Define likelihood:
initial_likelihood <- tmle_spec$make_initial_likelihood(tmle_task, learner_list)

# Define updater and targeted likelihood:
updater <- tmle_spec$make_updater()
targeted_likelihood <- tmle_spec$make_targeted_likelihood(initial_likelihood, 
                                                            updater)

tmle_params <- tmle_spec$make_params(tmle_task, likelihood=targeted_likelihood)
updater$tmle_params <- tmle_params

fit <- fit_tmle3(tmle_task, targeted_likelihood, tmle_params, 
                   updater)
fit$summary

Instead, one may invoke the tmle3 convenience function to fit the series of TML estimators in a single function call:

# fit the TML estimator
fit <- tmle3(tmle_spec, data, node_list, learner_list)

Remark: The print method of the resultant fit object conveniently displays the results from computing our TML estimator.

# fit the TML estimator
print(fit)

We can also look at the distribution of the optimal interventions estimated:

#Return the distribution of the optimal rule:
table(tmle_spec$return_rule)

Obtaining the Blip Fit and Predictions

In order to estimate our target parameter, we have to estimate the Blip function as well. This might be a particularly useful component to extract if we are interested in predicting the treatment allocation for a new set of covariates. We can also look at the Blip Super Learner fit with the following command:

#Blip SL weights:
tmle_spec$get_blip_fit()

In order to get Blip predictions, we have to initialize the tmle3 task (which usually happens "under the hood" of our Spec). In the following code we use the same data and node list we specified before, but we could have used a different dataset with the same set of covariates necessary for the rule to be defined.

#Get the tmle blip task
tmle_task_blip <- tmle_spec$make_tmle_task(data, node_list)

#Generate predictions
blip_pred <- tmle_spec$get_blip_pred(tmle_task = tmle_task_blip)
head(blip_pred)

Learning the Mean Outcome under the Simplified Optimal Rule

As mentioned in the previous sections, it is possible that the true rule is dependent on less covariates than specified by the user. Similarly, it is possible that a static treatment is actually the optimal intervention for all individuals- there is no heterogeneity among the sampled population. As such, tmle3mopttx will still try to estimate the true rule dependent on the provided V, but with option complex we can force it to consider simpler rules as well. If there is no difference between the mean under a static intervention and a more elaborate rule, the algorithm will opt for a simpler rule, thereby alleviating the possibility that the estimated rule is mostly noise.

# initialize a tmle specification
tmle_spec_simple <- tmle3_mopttx_blip_revere(V = c("W1", "W2", "W3"), type = "blip1", 
                                             learners = learner_list, maximize = TRUE, 
                                             complex = FALSE, realistic = FALSE)

# Define data:
tmle_task <- tmle_spec_simple$make_tmle_task(data, node_list)

# Define likelihood:
initial_likelihood <- tmle_spec_simple$make_initial_likelihood(tmle_task, learner_list)

# Define updater and targeted likelihood:
updater <- tmle_spec_simple$make_updater()
targeted_likelihood <- tmle_spec_simple$make_targeted_likelihood(initial_likelihood, 
                                                            updater)

tmle_params <- tmle_spec_simple$make_params(tmle_task, likelihood=targeted_likelihood)
updater$tmle_params <- tmle_params

fit <- fit_tmle3(tmle_task, targeted_likelihood, tmle_params, 
                   updater)
fit$summary

With the parameter complex set to FALSE, we continue as in the previous example.

# fit the TML estimator
fit <- tmle3(tmle_spec_simple, data, node_list, learner_list)
print(fit)

Learning the Mean Outcome under the Realistic Rule

It is often the case that the estimated rule is not realistic- perhaps giving treatment to a specific subgroup is optimal, however it is rarely (if ever) encountered in the population. Faced with population or practical positivity issues, the user might seek an optimal rule that is supported by the collected sample of data. As such, tmle3mopttx will estimate a data-dependent realistic rule by setting realistic=TRUE.

# initialize a tmle specification
tmle_spec_realistic <- tmle3_mopttx_blip_revere(V = c("W1", "W2", "W3"), type = "blip1", 
                                             learners = learner_list, maximize = TRUE, 
                                             complex = TRUE, realistic = TRUE)
# Define data:
tmle_task <- tmle_spec_realistic$make_tmle_task(data, node_list)

# Define likelihood:
initial_likelihood <- tmle_spec_realistic$make_initial_likelihood(tmle_task, learner_list)

# Define updater and targeted likelihood:
updater <- tmle_spec_realistic$make_updater()
targeted_likelihood <- tmle_spec_realistic$make_targeted_likelihood(initial_likelihood, 
                                                            updater)

tmle_params <- tmle_spec_realistic$make_params(tmle_task, likelihood=targeted_likelihood)
updater$tmle_params <- tmle_params

fit <- fit_tmle3(tmle_task, targeted_likelihood, tmle_params, 
                   updater)
fit$summary

```

References



tlverse/tmle3mopttx documentation built on Aug. 9, 2022, 3:31 p.m.