options(scipen=999)
In this section, we present how to use the tmle3mopttx
package when the data is subject to missigness. Currently we support missigness process on the following nodes:
To start, let's load the packages we'll use and set a seed for simulation:
library(data.table) library(sl3) library(tmle3) library(tmle3mopttx) library(devtools) set.seed(111)
Suppose we observe $n$ i.i.d. observations of $O=(W,A,Y) \sim P_0$. We denote $A$ as treatment, where $A \in {0,1}$ and $Y$ is the final outcome. Note that we treat $W$ as all of our collected baseline covariates. We emphasize that we make no assumptions about the distribution of $P_0$, so that $P_0 \in \mathcal{M}$, where $\mathcal{M}$ is the fully nonparametric model. This is in contrast to much of the current literature that relies on parametric assumptions. We can break the data generating distribution $P_0$ into the following parts:
$$P_0(O) = P_0(Y|A,W)P_0(A|W)P_0(W) = Q_0(Y|A,W)g_0(A|W)Q_{W,0}(W)$$ In addition, we also define $\bar{Q}{Y,0}(A,W) \equiv E_0[Y|A,W]$ such that $E_0(Y_a) = E{0,W}(\bar{Q}_{Y,0}(A=a,W))$.
First, we load the simulated data. Here, our data generating distribution was of the following form:
$$W \sim \mathcal{N}(\bf{0},I_{3 \times 3})$$ $$P(A=1|W) = \frac{1}{1+\exp^{(-0.8*W_1)}}$$
$$P(Y=1|A,W) = 0.5\text{logit}^{-1}[-5I(A=1)(W_1-0.5)+5I(A=0)(W_1-0.5)] + 0.5\text{logit}^{-1}(W_2W_3)$$
data("data_bin")
The above composes our observed data structure $O = (W, A, Y)$.
To formally express this fact using the tlverse
grammar introduced by the tmle3
package,
we create a single data object and specify the functional relationships between
the nodes in the directed acyclic graph (DAG) via nonparametric structural
equation models (NPSEMs), reflected in the node list that we set up:
# organize data and nodes for tmle3 data <- data_bin #Add some random missingless: rr <- sample(nrow(data), 100, replace = FALSE) data[rr,"Y"]<-NA node_list <- list( W = c("W1", "W2", "W3"), A = "A", Y = "Y" )
We now have an observed data structure (data
) and a specification of the role
that each variable in the data set plays as the nodes in a DAG. In addition, we introduced some missigness to our outcome node; in particular, we randomly force $10\%$ of our outcomes to be missing.
sl3
To easily incorporate ensemble machine learning into the estimation procedure,
we rely on the facilities provided in the sl3
R
package. For a complete guide on using the sl3
R
package, consider consulting https://sl3.tlverse.org, or https://tlverse.org for
the tlverse
ecosystem, of which sl3
is a major
part.
Using the framework provided by the sl3
package,
the nuisance parameters of the TML estimator may be fit with ensemble learning,
using the cross-validation framework of the Super Learner algorithm of
@vdl2007super.
# Define sl3 library and metalearners: xgboost_50<-Lrnr_xgboost$new(nrounds = 50) xgboost_100<-Lrnr_xgboost$new(nrounds = 100) xgboost_500<-Lrnr_xgboost$new(nrounds = 500) lrn1 <- Lrnr_mean$new() lrn2<-Lrnr_glm_fast$new() Q_learner <- Lrnr_sl$new( learners = list(xgboost_50,xgboost_100,xgboost_500, lrn1,lrn2), metalearner = Lrnr_nnls$new() ) g_learner <- Lrnr_sl$new( learners = list(xgboost_100,lrn2), metalearner = Lrnr_nnls$new() ) b_learner <- Lrnr_sl$new( learners = list(xgboost_50,xgboost_100,xgboost_500, lrn1,lrn2), metalearner = Lrnr_nnls$new() ) delta_learner <- Lrnr_sl$new( learners = list(xgboost_50,xgboost_100,xgboost_500, lrn1,lrn2), metalearner = Lrnr_nnls$new() )
As seen above, we generate four different ensemble learners that must be fit, corresponding to the learners for the outcome regression, propensity score, blip function, and the missigness process. We make the above explicit with respect to standard notation by bundling the ensemble learners into a list object below:
# specify outcome and treatment regressions and create learner list learner_list <- list(Y = Q_learner, A = g_learner, B = b_learner, delta_Y=delta_learner)
The learner_list
object above specifies the role that each of the ensemble
learners we've generated is to play in computing initial estimators to be used
in building a TMLE for the parameter of interest. In particular, it makes
explicit the fact that our Y
is used in fitting the outcome regression
while our A
is used in fitting our treatment mechanism regression,
B
is used in fitting the blip function, and delta_Y
fits the missing outcome process.
tmle3mopttx
through its tmle3_Spec
To start, we will initialize a specification for the TMLE of our parameter of
interest (called a tmle3_Spec
in the tlverse
nomenclature) simply by calling
tmle3_mopttx_blip_revere
. We specify the argument V = c("W1", "W2", "W3")
when initializing the tmle3_Spec
object in order to communicate that we're interested
in learning a rule dependent on V
covariates. We also need to specify the type of
pseudo-blip we will use in this estimation problem, and finally the list of learners.
Note that for binary treatment, the most natural
type of blip to use is the blip1
, as specified below.
# initialize a tmle specification tmle_spec <- tmle3_mopttx_blip_revere(V = c("W1", "W2", "W3"), type = "blip1", learners = learner_list, maximize = TRUE, complex = TRUE, realistic=FALSE)
As seen above, the tmle3_mopttx_blip_revere
specification object (like all tmle3_Spec
objects) does not store the data for our specific analysis of interest. Later,
we'll see that passing a data object directly to the tmle3
wrapper function,
alongside the instantiated tmle_spec
, will serve to construct a tmle3_Task
object internally (see the tmle3
documentation for details).
In initializing the specification for the TMLE of our parameter of
interest, we have specified the set of covariates the rule depends on ($V$), the type of pseudo-blip to use ("type"), and the learners used for estimating all the relevant parts ($Q$, $g$, blip and missigness). In addition, we need to specify whether we want to maximize the mean outcome under the rule ("maximize=TRUE"), and whether we want to estimate the rule under all the covariates $V$ provided by the user. If FALSE, tmle3mopttx
will instead consider all the possible rules under a smaller set of covariates including the static rules, and optimize the mean outcome over all the subsets of $V$. As such, while the user might have provided a full set of collected covariates as input for $V$, it is possible that the true rule only depends on a subset of the set provided by the user. In that case, our returned mean under the optimal individualized rule will be based on the smaller subset.
One may walk through the step-by-step procedure for fitting the TML estimator
of the mean counterfactual outcome under the optimal ITR, using the
machinery exposed by the tmle3
R package (see
below); however, the step-by-step procedure is often not of interest.
# NOT RUN -- SEE NEXT CODE CHUNK # Define data: tmle_task <- tmle_spec$make_tmle_task(data, node_list) # Define likelihood: initial_likelihood <- tmle_spec$make_initial_likelihood(tmle_task, learner_list) # Define updater and targeted likelihood: updater <- tmle_spec$make_updater() targeted_likelihood <- tmle_spec$make_targeted_likelihood(initial_likelihood, updater) tmle_params <- tmle_spec$make_params(tmle_task, likelihood=targeted_likelihood) updater$tmle_params <- tmle_params fit <- fit_tmle3(tmle_task, targeted_likelihood, tmle_params, updater) fit$summary
Instead, one may invoke the tmle3
convenience function to fit the series of
TML estimators in a single function call:
# fit the TML estimator fit <- tmle3(tmle_spec, data, node_list, learner_list)
Remark: The print
method of the resultant fit
object conveniently
displays the results from computing our TML estimator.
# fit the TML estimator print(fit)
We can also look at the distribution of the optimal interventions estimated:
#Return the distribution of the optimal rule: table(tmle_spec$return_rule)
```
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.