View source: R/Functions_Rsurrogate.R
R.multiple.surv | R Documentation |
This function calculates the proportion of treatment effect on the primary outcome explained by multiple surrogate markers measured at t_0
and primary outcome information up to t_0
. The user can also request a variance estimate, estimated using perturbating-resampling, and a 95% confidence interval. If a confidence interval is requested three versions are provided: a normal approximation based interval, a quantile based interval and Fieller's confidence interval, all using perturbation-resampling. The user can also request an estimate of the incremental value of the surrogate marker information.
R.multiple.surv(xone, xzero, deltaone, deltazero, sone, szero, type =1, t,
weight.perturb = NULL, landmark, extrapolate = FALSE, transform = FALSE,
conf.int = FALSE, var = FALSE, incremental.value = FALSE, approx = T)
xone |
numeric vector, the observed event times in the treatment group, X = min(T,C) where T is the time of the primary outcome and C is the censoring time. |
xzero |
numeric vector, the observed event times in the control group, X = min(T,C) where T is the time of the primary outcome and C is the censoring time. |
deltaone |
numeric vector, the event indicators for the treatment group, D = I(T<C) where T is the time of the primary outcome and C is the censoring time. |
deltazero |
numeric vector, the event indicators for the control group, D = I(T<C) where T is the time of the primary outcome and C is the censoring time. |
sone |
matrix of numeric values; surrogate marker measurements at |
szero |
matrix of numeric values; surrogate marker measurements at |
type |
type of estimate; options are 1 = two-stage robust estimator, 2 = weighted two-stage robust estimator, 3 = double-robust estimator, 4 = two-stage model-based estimator, 5 = weighted estimator, 6 = double-robust model-based estimator; default is 1. |
t |
the time of interest. |
weight.perturb |
weights used for perturbation resampling. |
landmark |
the landmark time |
extrapolate |
TRUE or FALSE; indicates whether the user wants to use extrapolation. |
transform |
TRUE or FALSE; indicates whether the user wants to use a transformation for the surrogate marker pseudo-score. |
conf.int |
TRUE or FALSE; indicates whether a 95% confidence interval for delta is requested, default is FALSE. |
var |
TRUE or FALSE; indicates whether a variance estimate is requested, default is FALSE. |
incremental.value |
TRUE or FALSE; indicates whether the user would like to see the incremental value of the surrogate marker information, default is FALSE. |
approx |
TRUE or FALSE indicating whether an approximation should be used when calculating the probability of censoring; most relevant in settings where the survival time of interest for the primary outcome is greater than the last observed event but before the last censored case, default is TRUE. |
Let G \in \{A, B\}
be the binary treatment indicator and we assume that subjects are randomly assigned to either treatment group A
or B
at baseline. Let T
denote the time to the occurrence of the primary outcome, death for example, and S = (S_1,S_2,...,S_k)^{T}
denote the vector of k
surrogate markers measured at a given time t_0
. Let T^{(g)}
and S^{(g)}
denote the counterfactual event time and surrogate marker measurements under treatment G = g
for g \in \{A, B\}
. In practice, we only observe (T, S)=(T^{(A)}, S^{(A)})
or (T^{(B)}, S^{(B)})
depending on whether G=A
or B.
The treatment effect, \Delta(t)
, is the treatment difference in survival rates at time t > t_0
,
\Delta(t)=E\{ I(T^{(A)}>t)\} - E\{I(T^{(B)}>t)\} = P(T^{(A)}>t) - P(T^{(B)}>t)
where I(\cdot)
is the indicator function. For individuals who are censored or experience the primary outcome before t_0
, we assume that their S
information is not available.
The surrogate marker information at time t_0
is defined as a combination of the observed information on I(T >t_0)
and the observed S
at t_0
, denoted by Q_{t_0} = \{Q_{t_0}^{(g)}, g = A, B\}
, where
Q_{t_0} ^{(g)} = \{ T ^{(g)} \wedge t_0, S ^{(g)} I(T ^{(g)} > t_0)\}
. With information on Q_{t_0}
, the residual treatment effect is defined as:
\Delta_{S}(t,t_0) = E\{ I(T ^{(A)} > t) - I(T ^{(B)}>t) \mid Q_{t_0}^{(A)} = Q_{t_0}^{(B)} \}
=P(T^{(B)} > t_0) \int_{S} \psi_A(t \mid S, t_0) dF_{B}(S \mid t_0) -P(T^{(B)}> t)
where S = (s_1, ..., s_k)^{T}
,
F_{g}(S \mid t_0) = P(S^{(g)}\le S \mid T^{(g)}>t_0),
\psi_g(t\mid S,t_0) = P(T^{(g)}> t\mid S^{(g)}=S, T^{(g)}> t_0).
The proportion of the treatment effect on the primary outcome that is explained by the treatment effect on Q_{t_0}
is R_{S}(t,t_0)=1-\Delta_{S}(t,t_0)/\Delta
. This function provides 6 different estimators for R_{S}(t,t_0)
using censored data.
Due to censoring, the observed data consist of n
observations \{(G_i, X_{i}, \delta_{i}, S_{i}I(X_{i} > t_0)), i=1,...,n\}
from the two treatment groups, where X_{i} = \min(T_{i}, C_{i})
, \delta_{i} = I(T_{i} < C_{i})
, and C_{i}
denotes the censoring time for the i
th subject. We assume independent censoring i.e., (T_i, S_i) \perp C_i \mid G_i
. For ease of notation, we also let \{(X_{gi}, \delta_{gi}, S_{gi}I(X_{gi} > t_0)), i=1,...,n_g\}
denote n_g = \sum_{i=1}^n I(G_i = g)
observations from treatment group g \in \{A,B\},
where X_{gi}=\min(T_{gi}, C_{gi})
and \delta_{gi}=I(T_{gi}<C_{gi}).
Without loss of generality, we assume that \bar{\pi}_g = n_g/n \to \pi_g \in (0,1)
as n\to \infty
. Throughout, we estimate the treatment effect \Delta(t) =P(T^{(A)}>t) - P(T^{(B)}>t)
as
\hat{\Delta}(t) = n_A^{-1} \sum_{i=1}^{n_A} \frac{I(X_{Ai}>t)}{\hat{W}^C_A(t)} - n_B^{-1} \sum_{i=1}^{n_B} \frac{I(X_{Bi}>t)}{\hat{W}^C_B(t)}
where \hat{W}^C_g(t)
is the Kaplan-Meier estimator of W^C_g(t) = P(C_{g} > t)
.
We first describe the two-stage robust estimator which involves a two-stage procedure combining the use of a working model and a nonparametric estimation procedure for \Delta_{S}(t,t_0)
. The idea is simply to summarize S
into a univariate score U
and then construct a nonparametric estimator for R_S(t,t_0)
treating U
as S
. To construct U
, we approximate the conditional distribution of T^{(A)} \mid S^{(A)}, T^{(A)} > t_0
by using a working semiparametric model such as the landmark proportional hazards model q_A(S) = P(T^{(A)} >t \mid T^{(A)} >t_0, S^{(A)}) = \exp\{-\Lambda_0(t|t_0) \exp(\beta ^{T} S^{(A)}) \}, t>t_0,
where \Lambda_0(t|t_0)
is the unspecified baseline cumulative hazard function for T^{(A)}
conditional on \{T^{(A)} > t_0\}
and \beta
is an unknown vector of coefficients. Let \hat{\beta}
be the maximizer of the corresponding log partial likelihood function and \hat{\Lambda}_0(t|t_0)
be the Breslow-type estimate of baseline hazard. If one were to assume that this working model is correctly specified, then a consistent estimate of \Delta_{S}(t,t_0)
would simply be: \hat{\Delta}_{S}^M=n_B^{-1} \sum_{i=1}^{n_B} [ \exp \{ -\hat{\Lambda}_0(t|t_0)\exp(\hat{\beta} ^{T} S_{Bi}) \} \frac{I(X_{Bi} > t_0)}{\hat{W}^C_B(t_0)} - \frac{I(X_{Bi} > t)}{\hat{W}^C_B(t)}].
We refer to this estimate as the two-stage model-based estimator (option 4 for type).
Instead of relying on correct specification of this model, we use the resulting score U = \beta_0^{T}S
as a univariate “pseudo-marker" to summarize the k
surrogates. In the second stage, to estimate \Delta_{S}(t,t_0)
, we apply a nonparametric approach with S
represented by the univariate marker U
. Specifically, we use a kernel Nelson-Aalen estimator to nonparametrically estimate \phi_A(t|u, t_0)=P(T^{(A)}> t\mid U^{(A)}=u, T^{(A)}> t_0) = \exp\{-\Lambda_A(t|u, t_0 )\}
as \hat \phi_A(t|u, t_0) = \exp\{-\hat{\Lambda}_A(t|u, t_0) \}
, where
\hat{\Lambda}_A(t|u, t_0) = \int_{t_0}^t \frac{\sum_{i=1}^{n_A} I(X_{Ai}>t_0) K_h\{\gamma(\hat{U}_{Ai}) - \gamma(u)\}dN_{Ai}(z)}{\sum_{i=1}^{n_A} I(X_{Ai}>t_0) K_h\{\gamma(\hat{U}_{Ai}) - \gamma(u)\} Y_{Ai}(z)},
\hat{U}_{Ai} = \hat{\beta} ^{T} S_{Ai}
, \hat{U}_{Bi} = \hat{\beta} ^{T} S_{Bi}
, Y_{Ai} = I(X_{Ai} \geq t)
, N_{Ai}(t) = I(X_{Ai} \leq t) \delta_{Ai}, K(\cdot)
is a smooth symmetric density function, K_h(x) = K(x/h)/h,
and \gamma(\cdot)
is a given monotone transformation function. We then estimate \Delta_{S}(t,t_0)
as
\hat{\Delta}_{S}(t,t_0) = n_B^{-1} \sum_{i=1}^{n_B} [\hat{\phi}_A(t|\hat{U}_{Bi},t_0) \frac{I(X_{Bi} > t_0)}{\hat{W}^C_B(t_0)} - \frac{I(X_{Bi} > t)}{\hat{W}^C_B(t)}]
and \hat{R}_{S}(t,t_0) =1- \hat{\Delta}_{S}(t,t_0)/\hat{\Delta}(t).
We refer to this estimate as the two-stage robust estimator (option 1 for type).
The next estimator borrows ideas from the extensive causal inference literature focusing on double robust estimators two-stage weighted estimator with a propensity score weight explicitly balancing the two treatment groups with respect to the distribution of S
. The weighting enables us to “adjust" the distribution of S^{(A)}
before constructing the conditional survival estimate \hat \phi_A(t|u, t_0).
This approach results in a double-robust estimator of \Delta_{S}(t, t_0)
, which is consistent when either U^{(A)}
captures all the information about the relationship between I(T^{(A)}\ge t)
and S^{(A)}
or the propensity score model for \pi(S,t_0)=P(G_i=B|S_i=S, T_{i} > t_0)
is correctly specified. While \pi(S,t_0)
depends on t_0
, for simplicity, we drop t_0
from our notation and simply use \pi(S)
.
Regression models can be imposed to obtain estimates for \pi(S)
. For example, a simple logistic regression model can be imposed for
\tilde{\pi}(S)=P(G_i=B|S_i=S, X_{i}>t_0)
with
logit\{\tilde{\pi}(S)\} = \alpha_0 + \alpha_1 ^{T} S,
where \alpha_0
and \alpha_1
are estimated only among those with X_{gi} > t_0
to account for censoring. The propensity score of interest, \pi(S),
can be derived from \tilde{\pi}(S)
directly since
logit\{\pi(S)\}=logit\{\tilde{\pi}(S)\} + \log\{W_A^C(t_0)/W_B^C(t_0)\},
which follows from the assumption that (T_{gi}, S_{gi}) \perp C_{gi}.
We then modify the above expression by weighting observations with the estimated L(S_{Ai})=\pi(S_{Ai})/\{1-\pi(S_{Ai})\}
and obtain
\hat{\Lambda}^w_A(t|u, t_0) = \int_{t_0}^t \frac{\sum_{i=1}^{n_A} \hat{L}(S_{Ai}) I(X_{Ai}>t_0) K_h\{\gamma(\hat{U}_{Ai}) - \gamma(u)\}dN_{Ai}(z)}{\sum_{i=1}^{n_A} \hat{L}(S_{Ai}) I(X_{Ai}>t_0) K_h\{\gamma(\hat{U}_{Ai}) - \gamma(u)\} Y_{Ai}(z)},
,
where \hat{L}(S_{gi}) = \exp(\hat{\alpha}_0+\hat{\alpha}_1^{T} S_{gi})\hat{W}^C_B(t_0)/\hat{W}^C_A(t_0).
Subsequently, we define
\hat{\Delta}^w_{S}(t,t_0) = n_B^{-1} \sum_{i=1}^{n_B} [\hat{\phi}^w_A(t|\hat{U}_{Bi},t_0) \frac{I(X_{Bi} > t_0)}{\hat{W}^C_B(t_0)} - \frac{I(X_{Bi} > t)}{\hat{W}^C_B(t)}]
and \hat{R}^w_{S}(t,t_0) =1- \hat{\Delta}^w_{S}(t,t_0)/\hat{\Delta}(t)
where \hat \phi^w_A(t|t_0,u) = \exp\{-\hat{\Lambda}^w_A(t|t_0,u) \}.
We refer to this estimate as the weighted two-stage robust estimator (option 2 for type).
While the two-stage weighted estimator reflects one way to enhance the robustness of an initial estimator, the idea of combining a propensity-score type model and a regression-type model has certainly been extensively studied in the causal inference literature and a more familiar double-robust estimator can be constructed as: \hat{\Delta}_{S}^{DR}(t,t_0) = n^{-1} [\sum_{i=1}^{n_A}\frac{I(X_{Ai}>t)}{\hat{W}_{A}^C(t)\bar{\pi}_B} \hat{L}(S_{Ai}) - \sum_{i=1}^{n_B} \frac{I(X_{Bi}>t)}{\hat{W}_{B}^C(t)\bar{\pi}_B} ] - n^{-1} [ \sum_{i=1}^{n_A} \frac{ \hat{\phi}_A(t\mid\hat{U}_{Ai},t_0) I(X_{Ai} > t_0) }{\hat{W}_{A}^C(t_0)\bar{\pi}_B} \hat{L}(S_{Ai}) - \sum_{i=1}^{n_B}\frac{ \hat{\phi}_A(t\mid \hat{U}_{Bi},t_0) I(X_{Bi} > t_0) }{\hat{W}_{B}^C(t_0)\bar{\pi}_B} ]
and \hat{R}_{S}^{DR}(t,t_0) =1- \hat{\Delta}_{S}^{DR}(t,t_0)/\hat{\Delta}(t)
, where \hat{\phi}_A(t|\hat{U}_{gi},t_0)
is the (unweighted) estimate of \phi_A(t\mid u,t_0)
used in \hat{\Delta}_{S}(t,t_0)
. We refer to this estimate as the double robust estimator (option 3 for type).
The weighted estimator (option type 5) is defined as:
\hat{\Delta}^{PS}_{S}(t,t_0) = n^{-1} \sum_{i=1}^n \{\frac{I(X_{i}>t)}{\hat{W}_{G_i}^C(t)\bar{\pi}_B} [ I(G_i = A)\hat{L}(S_{Ai}) - I(G_i = B ) ] \}
and \hat{R}^{PS}_{S}(t,t_0) =1- \hat{\Delta}^{PS}_{S}(t,t_0)/\hat{\Delta}(t).
This estimator completely relies on the correct specification of \pi(S)
. The double-robust model-based estimator (option 6 for type) is defined as \hat{\Delta}_{S}^{DR2}(t,t_0)
and \hat{R}_{S}^{DR2}(t,t_0) =1- \hat{\Delta}_{S}^{DR2}(t,t_0)/\hat{\Delta}(t)
which are constructed parallel to the construction of \hat{\Delta}_{S}^{DR}(t,t_0)
i.e., a combination of \hat{\Delta}_{S}^M(t,t_0)
and \hat{R}^{PS}_{S}(t,t_0)
.
Variance estimates are obtained using perturbation resampling. If a confidence interval is requested three versions are provided: a normal approximation based interval, a quantile based interval and Fieller's confidence interval, all using perturbation-resampling. An estimate of the incremental value of the surrogate marker information can also be requested; this essentially compared the proportion explained by the surrogate information vs. the proportion explained by T
alone up to t_0
. Details can be found in Parast, L., Cai, T., & Tian, L. (2021). Evaluating multiple surrogate markers with censored data. Biometrics, 77(4), 1315-1327.
A list is returned:
delta |
the estimate, |
delta.s |
the residual treatment effect estimate, |
R.s |
the estimated proportion of treatment effect explained by the set of markers, |
delta.var |
the variance estimate of |
delta.s.var |
the variance estimate of |
R.s.var |
the variance estimate of |
conf.int.normal.delta |
a vector of size 2; the 95% confidence interval for |
conf.int.quantile.delta |
a vector of size 2; the 95% confidence interval for |
conf.int.normal.delta.s |
a vector of size 2; the 95% confidence interval for |
conf.int.quantile.delta.s |
a vector of size 2; the 95% confidence interval for |
conf.int.normal.R.s |
a vector of size 2; the 95% confidence interval for |
conf.int.quantile.R.s |
a vector of size 2; the 95% confidence interval for |
conf.int.fieller.R.s |
a vector of size 2; the 95% confidence interval for |
delta.t |
the estimate, |
R.t |
the estimated proportion of treatment effect explained by survival only, |
incremental.value |
the estimate of the incremental value of the surrogate markers, |
delta.t.var |
the variance estimate of |
R.t.var |
the variance estimate of |
incremental.value.var |
the variance estimate of |
conf.int.normal.delta.t |
a vector of size 2; the 95% confidence interval for |
conf.int.quantile.delta.t |
a vector of size 2; the 95% confidence interval for |
conf.int.normal.R.t |
a vector of size 2; the 95% confidence interval for |
conf.int.quantile.R.t |
a vector of size 2; the 95% confidence interval for |
conf.int.fieller.R.t |
a vector of size 2; the 95% confidence interval for |
conf.int.normal.iv |
a vector of size 2; the 95% confidence interval for |
conf.int.quantile.iv |
a vector of size 2; the 95% confidence interval for |
If the treatment effect is not significant, the user will receive the following message: "Warning: it looks like the treatment effect is not significant; may be difficult to interpret the residual treatment effect in this setting".
Layla Parast
Parast, L., Cai, T., & Tian, L. (2017). Evaluating surrogate marker information using censored data. Statistics in Medicine, 36(11), 1767-1782.
Parast, L., Cai, T., & Tian, L. (2021). Evaluating multiple surrogate markers with censored data. Biometrics, 77(4), 1315-1327.
data(d_example_multiple)
names(d_example_multiple)
## Not run:
R.multiple.surv(xone = d_example_multiple$x1, xzero = d_example_multiple$x0, deltaone =
d_example_multiple$delta1, deltazero = d_example_multiple$delta0, sone =
as.matrix(d_example_multiple$s1), szero = as.matrix(d_example_multiple$s0),
type =1, t = 1, landmark=0.5)
R.multiple.surv(xone = d_example_multiple$x1, xzero = d_example_multiple$x0, deltaone =
d_example_multiple$delta1, deltazero = d_example_multiple$delta0, sone =
as.matrix(d_example_multiple$s1), szero = as.matrix(d_example_multiple$s0),
type =1, t = 1, landmark=0.5, conf.int = T)
R.multiple.surv(xone = d_example_multiple$x1, xzero = d_example_multiple$x0, deltaone =
d_example_multiple$delta1, deltazero = d_example_multiple$delta0, sone =
as.matrix(d_example_multiple$s1), szero = as.matrix(d_example_multiple$s0),
type =3, t = 1, landmark=0.5)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.