Right censoring/monotone coarsening
In polle: Policy Learning

library(polle)
library(data.table)

Target parameter under discrete time right-censoring/monotone coarsening

For simplicity we consider a two-stage case with a binary action set ${0,1}$ at each stage and a final outcome $U$.

A full observation without right censoring is given by the temporally ordered variable $O$:

\begin{align} O^F = {B, X_1, A_1, X_2, A_2, X_3, U_3}, \end{align}

where $B$ denotes the baseline variables, $X_1, X_2, X_3$ denote the stage specific variables, $U$ denotes the outcome/utility, and $A_1, A_2$ denote the treatment actions.

Now, introduce non-missingness indicators $\Delta_1, \Delta_2, \Delta_3$. Under right-censoring, an observation is given by

\begin{align} O = {B, X_1, \Delta_1, \Delta_1 A_1, \Delta_1 X_2, \Delta_1 \Delta_2, \Delta_1 \Delta_2 A_2, \Delta_1 \Delta_2 X_3, \Delta_1 \Delta_2\Delta_3, \Delta_1 \Delta_2 \Delta_3 U}. \end{align}

For convenience, $H_1 = {B, X_1}$, $H_2 = {B, X_1, A_1, X_2}$, and $H_3 = {B, X_1, A_1, X_2, A_2, X_3}$.

The full data target parameter is still

\begin{aligned} \mathbb{E}\left[ \left(\prod_{k=1}^2 \frac{I{A_k = d_k(H_k} }{g_{0,k}(H_k, A_k)} \right) U \right], \end{aligned} We assume the following (strong) sequential randomization for the right-censoring process:

\begin{aligned} &\Delta_3 \perp U | H_3, \Delta_1 = 1, \Delta_2 = 1\ &\Delta_2 \perp {A_2, X_3, \Delta_3, U}| H_2, \Delta_1 = 1 \ &\Delta_1 \perp {A_1, X_2, \Delta_2, A_2, X_3, \Delta_3, U} | H_1. \end{aligned}

Under these assumptions it can be shown that our observed data target parameter is equal to the full data target parameter:

\begin{aligned} &\mathbb{E}\left[ \left(\prod_{k=1}^3 \frac{\Delta_k}{C_{0,k}(H_k)} \right) \left(\prod_{k=1}^2 \frac{I{A_k = d_k(H_k} }{g_{0,k}(H_k, A_k)} \right) U \right]\ =&\mathbb{E}\left[ \left(\prod_{k=1}^2 \frac{I{A_k = d_k(H_k} }{g_{0,k}(H_k, A_k)} \right) U \right], \end{aligned} where \begin{aligned} C_{0,k}(h_k) = \mathbb{P}(\Delta_k = 1| H_k = h_k, \overline{\Delta}_k = 1) \end{aligned}

and

\begin{aligned} g_{0,1}(H_1, a_1) &= \mathbb{P}(A_1 = a_1|H_1, \Delta_1 = 1) = \mathbb{P}(A_1 = a_1|H_1)\ g_{0,2}(H_2, a_2) &= \mathbb{P}(A_2 = a_2|H_2, \Delta_1 = 1, \Delta_2 = 1) = \mathbb{P}(A_2 = a_2|H_2) \end{aligned}

Note the observed target parameter is also identified via the g-computation formula \begin{aligned} \mathbb{E}[Q^d_{0,1}(H_1, d_1(H_1))], \end{aligned} where \begin{aligned} Q^d_{0,1}(h_1, d_1(h_1)) &= \mathbb{E}[Q^d_{0,2}(H_2, d_2(H_2))|H_1=h_1, \Delta_1 = 1, A_1 = d(h_1)]\ Q^d_{0,2}(h_2, d_2(h_2)) &= \mathbb{E}[Q^d_{0,3}(H_3)|H_2 = h_2, \Delta_1 = 1, \Delta_2 = 1, A_2 = d(h_2)]\ Q^d_{0,3}(h_3) &= \mathbb{E}[U|H_3 = h_3, \Delta_1 = 1, \Delta_2 = 2, \Delta_3 = 1]. \end{aligned} Note that the observed target parameter can be seen as a 5-step inverse probability weighting expression, and that the associated g-computation formula can be reduced to 3 steps. This would not be the case if additional information was collected between $\Delta_k$ and $A_k$.

Efficient estimation

For the same reason the (un-centralized) efficient/robust score reduces to

\begin{aligned} Z_1(d,& g, Q^d, C)(O) \ &= \,Q_{1}^{d}(H_1 , d_1(H_1)) \ &+ \sum_{r = 1}^3 \left{ \prod_{j = 1}^{r} \frac{\Delta_j}{C_j(H_j)} \frac{I{A_j = d_j(H_{j})}}{g_{j}(H_j, A_j)} \right}\left{Q_{r+1}^{d}(H_{r+1} , d_{r+1}(H_{r+1})) - Q_{r}^{d}(H_r , d_r(H_r))\right}, \end{aligned} where we set \begin{aligned} \frac{I{A_3 = d_3(H_{3})}}{g_{3}(H_3, A_3)} = 1, \qquad Q_{4}^{d}(H_{4} , d_{4}(H_{4})) = U. \end{aligned}

The above expression for the score by direct calculation of the efficient influence function for the observed target parameter or by utilizing a monotone coarsening argument, @tsiatis2006semiparametric or @laan2003unified.

A one-step estimator based on this score is implmented in polle via policy_eval().

Adaption to continuous time right-censoring

Assume that $A_1$, $A_2$, $U$ occur at time points $T_1 < T_2 < T_3$. Let $C$ denote an underlying censoring time point. Then we observe time points $\tilde T_k = \min(T_k, C)$ and non-misising indicators $\Delta_k = {T_k \leq C}$. Note that with this temporal structure $\prod_{j = 1}^k \Delta_j = \Delta_k$. For notational convenience we include $\tilde T_k$ in the state variable $X_k$.

Now, let $C_k$ denote a survival model for the censoring process:

\begin{aligned} C_{0,k}(\tilde T_k|H_k) = \mathbb{P}(C > T_k|H_k, C \geq \tilde T_{k-1}) \end{aligned}

\begin{aligned} Z_1(d,& g, Q^d, C)(O) \ &= \,Q_{1}^{d}(H_1 , d_1(H_1)) \ &+ \sum_{r = 1}^3 \left{ \prod_{j = 1}^{r} \frac{\Delta_j}{C_j(\tilde T_j|H_j)} \frac{I{A_j = d_j(H_{j})}}{g_{j}(H_j, A_j)} \right}\left{Q_{r+1}^{d}(H_{r+1} , d_{r+1}(H_{r+1})) - Q_{r}^{d}(H_r , d_r(H_r))\right}. \end{aligned}

We note that this score adapted to handle continuous right-censoring is not the efficient score which incorporate the induced temporal structure. A study of the second order remainder terms yields that for correctly specified nuisance models for $g_0, Q^d_0, C_0$, under reasonable convergence rate assumptions, the associated one-step estimator has asymptotic variance as induced by the influence function $Z_1(d,g_0, Q^d_0, C_0)(O) - \mathbb{E}[Z_1(d,g_0, Q^d_0, C_0)(O)]$.

Terminal events

Terminal events may occur, resulting in a stochastic number of stages. However, auxiliary data can be constructed, and the methodology for a fixed-stage problem can be applied, taking into account the degenerate distribution of the new data.

Let $\epsilon_1$ and $\epsilon_2$ be terminal-event indicators included in the state covariates $X_1$ and $X_22$. Recall that $X_1$ and $X_2$ also include utility contributions $U_1$ and $U_2$, and that the complete utility is $U = U_1 + U_2 + U_3$.

If a terminal event occurs at stage 2 we observe

\begin{align} O^\ast = {B, X_1, \Delta_1, \Delta_1 A_1, \Delta_1 X_2, \Delta_1 U_2}. \end{align}

We then construct auxiliary data

\begin{align} {\Delta_2 = 1, A_2 = a^\dagger, X_3 = \emptyset, \Delta_3 = 0, U_3 = 0} \end{align}

and calculate the (un-centralized) doubly robust score using that if a terminal event happens at stage 2, i.e., that $\epsilon_2 = 1$:

\begin{align} &C_2(H_2, \epsilon_2 = 1) = C_3(H_3, \epsilon_2 = 1) = 1\ &g_2(H_2, \epsilon_2 = 1) = 1\ &Q_2^d(H_2, \epsilon_2 = 1, a^\dagger) = Q_3^d(H_3, \epsilon_2 = 1) = Q_4^d(H_4, \epsilon_2 = 1) = U_1 + U_2 = U \end{align}

Examples

sim_single_stage_right_cens <- function(n = 2e3, zeta = c(0.7, 0.2), type = "right"){

  d <- sim_single_stage(n = n)
  pd <- policy_data(data = d,
                    action = "A",
                    covariates = c("Z", "L", "B"),
                    utility = "U")

  ld <- pd$stage_data

  ld[stage == 1, time := 1]
  ld[stage == 2, time := 2]

  ld[stage == 2, Z := d$Z]
  ld[stage == 2, L := d$L]
  ld[stage == 2, B := d$B]

  ## simulating the right censoring time
  ## only depending on the baseline covariate Z:
  C <- c(rexp(n, 1) / exp((-1) * cbind(1, as.numeric(d$Z)) %*% zeta))

  ld[stage == 1, time_c := C]
  ld[stage == 2, time_c := C]

  ld[, delta := time_c >= time]

  ld[delta == FALSE , event := 2]
  ld[delta == FALSE, A := NA]
  ld[delta == FALSE & stage == 2, U := NA]
  ld[delta == FALSE, U_A0 := 0]
  ld[delta == FALSE, U_A1 := 0]

  ld[ , tmp := shift(delta, fill = TRUE), by = list(id)]
  ld <- ld[tmp == TRUE, ]
  ld[ , time := pmin(time, time_c)]
  ld[ , time_c := NULL]
  ld[ , tmp := NULL]
  ld[ , delta := NULL]

  if (type == "interval"){
    ld[, time2 := time]
    ld[, time := shift(time, fill = 0), by = list(id)]
  }

  return(ld)
}

set.seed(1)
ld <- sim_single_stage_right_cens(n = 5e2, type = "interval")
pd <- policy_data(data = ld, type = "long", action = "A", time = "time", time2 = "time2")

pe <- policy_eval(
  policy_data = pd,
  policy = policy_def(1),
  m_model = q_glm(~ .),
  m_full_history = TRUE,
  c_models = list(c_cox(formula = ~ Z_1),
                  c_cox(formula = ~ Z_2*A_1)),
  c_full_history = TRUE
)
pe

pe <- policy_eval(
  policy_data = pd,
  policy_learn = policy_learn(type = "blip", control = control_blip()),
  m_model = q_glm(~.),
  m_full_history = FALSE,
  c_models = c_cox(formula = ~ Z)
)
pe