```r
options(scipen = 999)
```
Stochastic treatment regimes provide a relatively simple way to assess the effects of continuous-valued treatments, via parameters that quantify the effects induced by counterfactually shifting the observed values of a treatment of interest. Here, we present an implementation of a recent algorithm for computing targeted minimum loss-based estimates of treatment shift parameters defined by a shifting function $d(A,W)$. For a technical presentation of the algorithm, the interested reader is invited to consult @diaz2018stochastic. For additional background on Targeted Learning and previous work on stochastic treatment regimes, please consider consulting @vdl2011targeted, @vdl2018targeted, and @diaz2012population.
To start, let's load the packages we'll use and set a seed for simulation:
```r
library(data.table)
library(sl3)
library(tmle3)
library(tmle3shift)
set.seed(429153)
```
Consider $n$ observed units $O_1, \ldots, O_n$, where each random variable $O = (W, A, Y)$ corresponds to a single observational unit. Let $W$ denote baseline covariates (e.g., age, sex, education level), $A$ an intervention variable of interest (e.g., nutritional supplements), and $Y$ an outcome of interest (e.g., disease status). Though it need not be the case, let $A$ be continuous-valued, i.e., $A \in \mathbb{R}$. Let $O_i \sim \mathcal{P} \in \mathcal{M}$, where $\mathcal{M}$ is the nonparametric statistical model defined as the set of continuous densities on $O$ with respect to some dominating measure. To formalize the definition of stochastic interventions and their corresponding causal effects, we introduce a nonparametric structural equation model (NPSEM), based on @pearl2000causality, to define how the system changes under posited interventions:
\begin{align}\label{eqn:npsem}
W &= f_W(U_W) \\
A &= f_A(W, U_A) \\
Y &= f_Y(A, W, U_Y),
\end{align}
where $U_W$, $U_A$, and $U_Y$ are exogenous random variables. We denote the observed data structure $O = (W, A, Y)$.
Letting $A$ denote a continuous-valued treatment, we assume that the
distribution of $A$ conditional on $W = w$ has support in the interval
$(l(w), u(w))$ -- for convenience, assume this support condition holds almost everywhere (a.e.). That is, the
minimum natural value of treatment $A$ for an individual with covariates
$W = w$ is $l(w)$; similarly, the maximum is $u(w)$. Then, a simple stochastic
intervention, based on a shift $\delta$, may be defined as
\begin{equation}\label{eqn:shift}
d(a, w) =
\begin{cases}
a - \delta & \text{if } a > l(w) + \delta \\
a & \text{if } a \leq l(w) + \delta,
\end{cases}
\end{equation}
where $0 \leq \delta \leq u(w)$ is an arbitrary pre-specified value that
defines the degree to which the observed value $A$ is to be shifted, where
possible. For the purpose of using such a shift in practice, the present
software provides the functions `shift_additive` and `shift_additive_inv`,
which define a variation of this shift, assuming that the density of treatment
$A$, conditional on the covariates $W$, has support a.e.
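To make the shift mapping concrete, the following is a minimal sketch (not the package's implementation) of the piecewise shift displayed above, with the function name `d_shift_sketch` and the argument `l_w` introduced purely for illustration:

```r
# a minimal sketch of the downward additive shift d(a, w) defined above,
# where l_w stands in for l(w), the minimum support of A given W = w;
# tmle3shift's own shift functions play an analogous role internally
d_shift_sketch <- function(a, l_w, delta) {
  ifelse(a > l_w + delta, a - delta, a)
}

# e.g., with l(w) = 0 and delta = 0.5, an observed a = 3 is shifted to 2.5,
# while a = 0.3 (within delta of the lower bound) is left unchanged
d_shift_sketch(a = c(3, 0.3), l_w = 0, delta = 0.5)
```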
```r
# simulate simple data for tmle-shift sketch
n_obs <- 1000 # number of observations
n_w <- 1 # number of baseline covariates
tx_mult <- 2 # multiplier for the effect of W = 1 on the treatment

## baseline covariates -- simple, binary
W <- as.numeric(replicate(n_w, rbinom(n_obs, 1, 0.5)))

## create treatment based on baseline W
A <- as.numeric(rnorm(n_obs, mean = tx_mult * W, sd = 1))

## create outcome as a linear function of A, W + white noise
Y <- A + W + rnorm(n_obs, mean = 0, sd = 0.5)
```
The above composes our observed data structure $O = (W, A, Y)$. To formally
express this fact using the `tlverse` grammar introduced by the `tmle3`
package, we create a single data object and specify the functional
relationships between the nodes in the directed acyclic graph (DAG) via
nonparametric structural equation models (NPSEMs), reflected in the node list
that we set up:
```r
# organize data and nodes for tmle3
data <- data.table(W, A, Y)
node_list <- list(W = "W", A = "A", Y = "Y")
head(data)
```
We now have an observed data structure (`data`) and a specification of the
role that each variable in the data set plays as the nodes in a DAG.
To start, we will initialize a specification for the TMLE of our parameter of
interest (called a `tmle3_Spec` in the `tlverse` nomenclature) simply by
calling `tmle_shift`. We specify the argument `shift_val = 0.5` when
initializing the `tmle3_Spec` object to communicate that we're interested in a
shift of $0.5$ on the scale of the treatment $A$ -- that is, we specify
$\delta = 0.5$ (note that this is an arbitrarily chosen value for this
example).
```r
# initialize a tmle specification
tmle_spec <- tmle_shift(
  shift_val = 0.5,
  shift_fxn = shift_additive,
  shift_fxn_inv = shift_additive_inv
)
```
As seen above, the `tmle_shift` specification object (like all `tmle3_Spec`
objects) does not store the data for our specific analysis of interest. Later,
we'll see that passing a data object directly to the `tmle3` wrapper function,
alongside the instantiated `tmle_spec`, will serve to construct a `tmle3_Task`
object internally (see the `tmle3` documentation for details). Note that, by
default, the `tmle_spec` object is set up to facilitate cross-validated
estimation of likelihood components, ensuring certain empirical process
conditions may be circumvented by reducing the contribution of an empirical
process term to the estimated influence function [@zheng2011cross]. In
practice, this automatic incorporation of cross-validation (CV-TMLE) means
that the user need not be concerned with these theoretical conditions being
satisfied; moreover, cross-validated estimation of the efficient influence
function is expected to control the estimated variance.
To easily incorporate ensemble machine learning into the estimation procedure,
we rely on the facilities provided in the `sl3` R package. For a complete
guide on using the `sl3` R package, consider consulting
https://tlverse.org/sl3, or https://tlverse.org for the `tlverse` ecosystem,
of which `sl3` is a core component.
Using the framework provided by the `sl3` package, the nuisance parameters of
the TML estimator may be fit with ensemble learning, using the
cross-validation framework of the Super Learner algorithm of @vdl2007super. To
estimate the treatment mechanism (denoted $g$), we must make use of learning
algorithms specifically suited to conditional density estimation; a list of
such learners may be extracted from `sl3` using `sl3_list_learners()`:
sl3_list_learners("density")
To proceed, we'll select two of the above learners: `Lrnr_haldensify`, which
uses the highly adaptive lasso for conditional density estimation, based on an
algorithm given by @diaz2011super, and `Lrnr_density_semiparametric`, an
approach for semiparametric conditional density estimation:
```r
# learners used for conditional density regression (i.e., propensity score)
haldensify_lrnr <- Lrnr_haldensify$new(
  n_bins = 3, grid_type = "equal_mass",
  lambda_seq = exp(seq(-1, -9, length = 100))
)
hse_lrnr <- Lrnr_density_semiparametric$new(mean_learner = Lrnr_glm$new())
mvd_lrnr <- Lrnr_density_semiparametric$new(
  mean_learner = Lrnr_glm$new(),
  var_learner = Lrnr_mean$new()
)
sl_lrn_dens <- Lrnr_sl$new(
  learners = list(haldensify_lrnr, hse_lrnr, mvd_lrnr),
  metalearner = Lrnr_solnp_density$new()
)
```
We also require an approach for estimating the outcome regression (denoted $Q$). For this, we build a Super Learner composed of an intercept model, a main-terms GLM, and the `xgboost` algorithm for gradient boosting:
```r
# learners used for conditional expectation regression (e.g., outcome)
mean_lrnr <- Lrnr_mean$new()
glm_lrnr <- Lrnr_glm$new()
xgb_lrnr <- Lrnr_xgboost$new()
sl_lrn <- Lrnr_sl$new(
  learners = list(mean_lrnr, glm_lrnr, xgb_lrnr),
  metalearner = Lrnr_nnls$new()
)
```
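Though these learners will be passed to `tmle3` below, each `Lrnr_sl` can also be trained directly on an `sl3_Task`; the following is a minimal sketch of that standalone workflow, using the simulated data from above (the task construction here is purely illustrative):

```r
# train the outcome Super Learner directly on an sl3 task built from the
# simulated data, to illustrate the sl3 train/predict workflow
task_Y <- sl3_Task$new(
  data = data,
  covariates = c("W", "A"),
  outcome = "Y"
)
sl_fit_Y <- sl_lrn$train(task_Y)
head(sl_fit_Y$predict()) # predictions on the training task
```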
We can make the above explicit with respect to standard notation by bundling
the ensemble learners into a `list` object below:
```r
# specify outcome and treatment regressions and create learner list
Q_learner <- sl_lrn
g_learner <- sl_lrn_dens
learner_list <- list(Y = Q_learner, A = g_learner)
```
The `learner_list` object above specifies the role that each of the ensemble
learners we've generated is to play in computing initial estimators to be used
in building a TMLE for the parameter of interest. In particular, it makes
explicit the fact that our `Q_learner` is used in fitting the outcome
regression while our `g_learner` is used in fitting our treatment mechanism
regression. We are now ready to compute the TML estimator by passing the
specification, data, node list, and learner list to the `tmle3` wrapper:
```r
tmle_fit <- tmle3(tmle_spec, data, node_list, learner_list)
tmle_fit
```
The `print` method of the resultant `tmle_fit` object conveniently displays
the results from computing our TML estimator.
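Beyond the printed display, the fit's results can also be accessed programmatically. The sketch below assumes the `tmle3` fit object exposes a `summary` table with a `tmle_est` column, as it does at the time of writing; field names may differ across `tmle3` versions:

```r
# extract the summary table (point estimate, standard error, Wald-type CI)
# from the tmle3 fit object; column names may vary by tmle3 version
tmle_fit$summary
tmle_fit$summary$tmle_est # the TML point estimate
```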
Recall that the asymptotic distribution of TML estimators has been studied thoroughly:
$$\psi_n - \psi_0 = (P_n - P_0) \cdot D(\bar{Q}_n^{\star}, g_n) + R(\hat{P}^{\star}, P_0),$$
which, provided the following two conditions hold:

1. $D(\bar{Q}_n^{\star}, g_n)$ converges to $D(P_0)$ in $L_2(P_0)$ norm, and
2. the size of the class of functions considered for estimation of $\bar{Q}_n^{\star}$ and $g_n$ is bounded (technically, there exists a Donsker class $\mathcal{F}$ such that $D(\bar{Q}_n^{\star}, g_n) \in \mathcal{F}$ with high probability),

allows the term $D(\bar{Q}_n^{\star}, g_n)$ above to be replaced by $D(P_0)$, up to an $o_P \left( \frac{1}{\sqrt{n}} \right)$ term.
Under the additional condition that the remainder term $R(\hat{P}^{\star}, P_0)$ decays as $o_P \left( \frac{1}{\sqrt{n}} \right)$, we have that $$\psi_n - \psi_0 = (P_n - P_0) \cdot D(P_0) + o_P \left( \frac{1}{\sqrt{n}} \right),$$ which, by a central limit theorem, establishes a Gaussian limiting distribution for the estimator:
$$\sqrt{n}(\psi_n - \psi_0) \to N(0, V(D(P_0))),$$ where $V(D(P_0))$ is the variance of the efficient influence curve (canonical gradient), when $\psi_0$ admits an asymptotically linear representation.
The above implies that $\psi_n$ is a $\sqrt{n}$-consistent estimator of $\psi_0$, that it is asymptotically normal (as given above), and that it is locally efficient. This allows us to build Wald-type confidence intervals in a straightforward manner:
$$\psi_n \pm z_{\alpha} \cdot \frac{\sigma_n}{\sqrt{n}},$$ where $\sigma_n^2$ is an estimator of $V(D(P_0))$. The estimator $\sigma_n^2$ may be obtained using the bootstrap or computed directly via the following:
$$\sigma_n^2 = \frac{1}{n} \sum_{i = 1}^{n} D^2(\bar{Q}_n^{\star}, g_n)(O_i)$$
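As an illustration, this plug-in variance estimator can be computed from the estimated influence function values. The sketch below assumes the `tmle3` fit stores these values as `estimates[[1]]$IC`, which reflects the fit structure at the time of writing:

```r
# a minimal sketch of the Wald-type CI computed from the estimated EIF
# values, assuming they are stored in the fit object as shown below
eif_vals <- tmle_fit$estimates[[1]]$IC # estimated influence function values
psi_n <- tmle_fit$summary$tmle_est # TML point estimate
se_n <- sqrt(var(eif_vals) / length(eif_vals)) # sigma_n / sqrt(n)
(ci_wald <- psi_n + c(-1, 1) * qnorm(0.975) * se_n) # 95% Wald-type CI
```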
Having reviewed these facts, let's now examine the results of computing our TML estimator:
```r
tmle_fit
```