View source: R/jointSurroPenal.R
jointSurroPenal | R Documentation |
Joint Frailty Surrogate model definition
Fit the one-step Joint surrogate model for the evaluation of a canditate surrogate endpoint,
with different integration methods on the random effects, using a semiparametric penalized
likelihood estimation. This approach extends that of Burzykowski et al.
(2001) by
including in the same joint frailty model the individual-level and the trial-level random effects.
This function can also be used for mediation analysis where a direct effect of the surrogate time S
on the final endpoint T
is allowed through a function g(S)
.
For the jth subject (j=1,...,ni) of the ith trial (i=1,...,G), the joint surrogate model is defined as follows:
where,
\omega
ij ~ N
(0,\theta
), ui ~ N
(0,\gamma
), \omega
i ⊥ ui,
ui ⊥ vSi, ui ⊥ vTi
and
(vSi,vTi)T ~ N
(0,\Sigma
v)
with
In this model, \lambda
0s(t) is the baseline hazard function associated with the
surrogate endpoint and \beta
S the fixed treatment effect (or log-hazard ratio);
\lambda
0T(t) is the baseline hazard function associated with the true endpoint
and \beta
T the fixed treatment effect. \omega
ij is a shared individual-level frailty that serve to take into account the
heterogeneity in the data at the individual level; ui is a shared frailty effect associated
with the baseline hazard function that serve to take into account the heterogeneity between trials
of the baseline hazard function, associated with the fact that we have several trials in this
meta-analytical design. The power parameters \zeta
and \alpha
distinguish
both individual and trial-level heterogeneities between the surrogate and the true endpoint.
vSi and vTi are two correlated random effects treatment-by-trial interactions.
Z
ij1 represents the treatment arm to which the patient has been randomized.
In the mediation analysis setting, the hazard function for the true endpoint
becomes:
where the term I(Sij≤ t)g(Sij) allows
for a direct effect of the surrogate time S
on the risk
of occurrence of the final endpoint T
.
Surrogacy evaluation
We proposed new definitions of Kendall's \tau
and coefficient of determination as
individual-level and trial-level association measurements, to evaluate a candidate
surrogate endpoint (Sofeu et al., 2018). For the surrogacy in the mediation analysis setting
see the "Surrogacy through mediation" Section.
Individual-level surrogacy
To measure the strength of association between S
ij and T
ij after
adjusting the marginal distributions for the trial and the treatment effects, as show in
Sofeu et al.(2018), we use the Kendall's \tau
define by :
where \theta
, \zeta
, \alpha
and \gamma
are estimated using the joint surrogate model
defined previously. Kendall's \tau
is the difference between the probability of
concordance and the probability of discordance of two realizations of S
ij and T
ij.
It belongs to the interval [-1,1] and assumes a zero value when S
ij and T
ij are
independent. We estimate Kendall's \tau
using Monte-Carlo or Gaussian Hermite
quadrature integration methods. Its confidence interval is estimated using parametric
bootstrap
Trial-level surrogacy
The key motivation for validating a surrogate endpoint is to be able to predict the effect
of treatment on the true endpoint, based on the observed effect of treatment on the
surrogate endpoint. As shown by Buyse et al. (2000), the coefficenient of
determination obtains from the covariance matrix \Sigma
v of the random effects
treatment-by-trial interaction can be used to evaluate underlined prediction, and
therefore as surrogacy evaluation measurement at trial-level. It is defined by:
The SEs of R
trial2 is calculated using the Delta-method. We also propose
R
trial2 and 95% CI computed using the parametric bootstrap. The use of delta-method
can lead to confidence limits violating the [0,1], as noted by
(Burzykowski et al., 2001). However, using other methods would not significantly alter
the findings of the surrogacy assessment
Surrogacy through mediation
In the mediation analysis setting, the surrogacy measure is the proportion of treatment effect
on the final endpoint T
that goes through its effect on the surrogate S
.
This measure is a time-dependent function R(t)
defined as:
where NIE
and TE
stand for "natual indirect effect" and "total effect"
respectively. The numerator is the difference of the survival function of T
for a subject whose treatment has been set to 1
(experimental arm) for
both S
and T
versus a subject for which the treatment for T
is still 1
but is set 0
for S
. This corresponds
to the indirect effect (in ther of survival probability) of the treatment on T
through S
. The denominator is the total effect of the treatment on T
.
jointSurroPenal(data, maxit=50, indicator.zeta = 1,
indicator.alpha = 1, frail.base = 1, n.knots = 6,
LIMparam = 0.001, LIMlogl = 0.001, LIMderiv = 0.001,
nb.mc = 300, nb.gh = 32, nb.gh2 = 20, adaptatif = 0,
int.method = 2, nb.iterPGH = 5, nb.MC.kendall = 10000,
nboot.kendall = 1000, true.init.val = 0,
theta.init = 1, sigma.ss.init = 0.5, sigma.tt.init = 0.5,
sigma.st.init = 0.48, gamma.init = 0.5, alpha.init = 1,
zeta.init = 1, betas.init = 0.5, betat.init = 0.5, scale = 1,
random.generator = 1, kappa.use = 4, random = 0,
random.nb.sim = 0, seed = 0, init.kappa = NULL, ckappa = c(0,0),
nb.decimal = 4, print.times = TRUE, print.iter=FALSE,mediation=FALSE,
g.nknots=1,rt.times=NULL,rt.ntimes=NULL,rt.nmc=500,rt.boot=FALSE,
rt.nboot=2000,rt.boot.nmc=500,rt.integ.type=2)
data |
A
|
maxit |
maximum number of iterations for the Marquardt algorithm.
The default being |
indicator.zeta |
A binary, indicates whether the power's parameter |
indicator.alpha |
A binary, indicating whether the power's parameter |
frail.base |
A binary, indicating whether the heterogeneity between trial on the baseline risk
is considered ( |
n.knots |
integer giving the number of knots to use. Value required in
the penalized likelihood estimation. It corresponds to the (n.knots+2)
splines functions for the approximation of the hazard or the survival
functions. We estimate I or M-splines of order 4. When the user set a
number of knots equals to k (n.knots=k) then the number of interior knots
is (k-2) and the number of splines is (k-2)+order. Number of knots must be
between 4 and 20. (See |
LIMparam |
Convergence threshold of the Marquardt algorithm for the
parameters, 10-3 by default (See |
LIMlogl |
Convergence threshold of the Marquardt algorithm for the
log-likelihood, 10-3 by default (See |
LIMderiv |
Convergence threshold of the Marquardt algorithm for the gradient, 10-3 by default
(See |
nb.mc |
Number of samples considered in the Monte-Carlo integration. Required in the event
|
nb.gh |
Number of nodes for the Gaussian-Hermite quadrature. It can be chosen among 5, 7, 9, 12, 15, 20 and 32. The default is 32. |
nb.gh2 |
Number of nodes for the Gauss-Hermite quadrature used to re-estimate the model,
in the event of non-convergence, defined as previously. The default is |
adaptatif |
A binary, indicates whether the pseudo adaptive Gaussian-Hermite quadrature |
int.method |
A numeric, indicates the integration method: |
nb.iterPGH |
Number of iterations before the re-estimation of the posterior random effects,
in the event of the two-steps pseudo-adaptive Gaussian-hermite quadrature. If set to |
nb.MC.kendall |
Number of generated points used with the Monte-Carlo to estimate
integrals in the Kendall's |
nboot.kendall |
Number of samples considered in the parametric bootstrap to estimate the confidence
interval of the Kendall's |
true.init.val |
Numerical value. Indicates if the given initial values to parameters |
theta.init |
Initial values for |
sigma.ss.init |
Initial values for
|
sigma.tt.init |
Initial values for
|
sigma.st.init |
Initial values for
|
gamma.init |
Initial values for |
alpha.init |
Initial values for |
zeta.init |
Initial values for |
betas.init |
Initial values for |
betat.init |
Initial values for |
scale |
A numeric that allows to rescale (multiplication) the survival times, to avoid numerical
problems in the event of some convergence issues. If no change is needed the argument is set to 1, the default value.
eg: |
random.generator |
Random number generator used by the Fortran compiler,
|
kappa.use |
A numeric, that indicates how to manage the smoothing parameters k1
and k2 in the event of convergence issues. If it is set to |
random |
A binary that says if we reset the random number generation with a different environment
at each call |
random.nb.sim |
If |
seed |
The seed to use for data (or samples) generation. required if |
init.kappa |
smoothing parameter used to penalized the log-likelihood. By default (init.kappa = NULL) the values used are obtain by cross-validation. |
ckappa |
Vector of two fixed values to add to the smoothing parameters. By default it is set to (0,0). this argument allows to well manage the smoothing parameters in the event of convergence issues. |
nb.decimal |
Number of decimal required for results presentation. |
print.times |
a logical parameter to print estimation time. Default is TRUE. |
print.iter |
a logical parameter to print iteration process. Default is FALSE. |
mediation |
a logical value indicating if the mediation analysis method is used. Default is FALSE. |
g.nknots |
In the case of a mediation analysis, indicates how many inner knots
are used in the splines basis for estimating the function |
rt.times |
In the mediation analysis setting, a vector of times for which the
funtion |
rt.ntimes |
In the mediation setting, if the argument |
rt.nmc |
An integer indicating how many Monte Carlo simulations are used
to integrate over the random effects in the computation of the function |
rt.boot |
A logical value indicating if bootstrapped confidence bands needs to be computed for the
function |
rt.nboot |
An integer indicating how many bootstrapped replicates of R(t) needs to be computed to derive confidence bands for R(t). Default is 2000. |
rt.boot.nmc |
If |
rt.integ.type |
An integer indicating which type of integration over the distribution of
|
The estimated parameter are obtained using the robust Marquardt algorithm (Marquardt, 1963) which is a combination between a Newton-Raphson algorithm and a steepest descent algorithm. The iterations are stopped when the difference between two consecutive log-likelihoods was small (< 10-3 ), the estimated coefficients were stable (consecutive values (< 10-3 )), and the gradient small enough (< 10-3 ), by default. Cubic M-splines of order 4 are used for the hazard function, and I-splines (integrated M-splines) are used for the cumulative hazard function.
The inverse of the Hessian matrix is the variance estimator and to deal
with the positivity constraint of the variance component and the spline
coefficients, a squared transformation is used and the standard errors are
computed by the \Delta
-method (Knight & Xekalaki, 2000). The smooth
parameter can be chosen by maximizing a likelihood cross validation
criterion (Joly and other, 1998).
We proposed based on the joint surrogate model a new definition
of the Kendall's \tau
. Moreover, distinct numerical integration methods are available to approximate the
integrals in the marginal log-likelihood.
Non-convergence case management procedure
Special attention must be given to initializing model parameters, the choice of the number of
spline knots, the smoothing parameters and the number of quadrature points to solve convergence
issues. We first initialized parameters using the user's desired strategy, as specified
by the option true.init.val
. When numerical or convergence problems are encountered,
with kappa.use
set to 4
, the model is fitted again using a combination of the following strategies:
vary the number of quadrature point (nb.gh
to nb.gh2
or nb.gh2
to nb.gh
)
in the event of the use of the Gaussian Hermite quadrature integration (see int.method
);
divided or multiplied the smoothing parameters ( k1,
k2) by 10 or 100 according to
their preceding values, or used parameter vectors obtained during the last iteration (with a
modification of the number of quadrature points and smoothing parameters). Using this strategy,
we usually obtained during simulation the rejection rate less than 3%. A sensitivity analysis
was conducted without this strategy, and similar results were obtained on the converged samples,
with about a 23% rejection rate.
This function return an object of class jointSurroPenal or jointSurroMed in the mediation analysis setting with elements:
EPS |
A vector containing the obtained convergence thresholds with the Marquardt algorithm, for the parameters, the log-likelihood and for the gradient; |
b |
A vector containing estimates for the splines parameter's;
the power's parameter |
varH |
The variance matrix of all parameters in |
varHIH |
The robust estimation of the variance matrix of all parameters in |
loglikPenal |
The complete marginal penalized log-likelihood; |
LCV |
the approximated likelihood cross-validation criterion in the semiparametric case (with |
xS |
vector of times for surrogate endpoint where both survirt.nmc.bootval and hazard function are estimated. By default seq(0,max(time),length=99), where time is the vector of survival times; |
lamS |
array (dim = 3) of hazard estimates and confidence bands, for surrogate endpoint; |
survS |
array (dim = 3) of baseline survival estimates and confidence bands, for surrogate endpoint; |
xT |
vector of times for true endpoint where both survival and hazard function are estimated. By default seq(0, max(time), length = 99), where time is the vector of survival times; |
lamT |
array (dim = 3) of hazard estimates and confidence bands, for true endpoint; |
survT |
array (dim = 3) of baseline survival estimates and confidence bands, for true endpoint; |
n.iter |
number of iterations needed to converge; |
theta |
Estimate for |
gamma |
Estimate for |
alpha |
Estimate for |
zeta |
Estimate for |
sigma.s |
Estimate for |
sigma.t |
Estimate for |
sigma.st |
Estimate for |
beta.s |
Estimate for |
beta.t |
Estimate for |
ui |
A binary, that indicates if the heterogeneity between trial on the baseline risk
has been Considered ( |
ktau |
The Kendall's |
R2.boot |
The
|
Coefficients |
The estimates with the corresponding standard errors and the 95 |
kappa |
Positive smoothing parameters used for convergence. These values could be different to initial
values if |
scale |
The value used to rescale the survival times |
data |
The dataset used in the model |
varcov.Sigma |
covariance matrix of
the estimates of ( |
parameter |
list of all arguments used in the model |
mediation |
List returned in the case where the option
|
Casimir Ledoux Sofeu casimir.sofeu@u-bordeaux.fr, scl.ledoux@gmail.com, Quentin Le Coent quentin.le-coent@u-bordeaux.fr and Virginie Rondeau virginie.rondeau@inserm.fr,
Burzykowski, T., Molenberghs, G., Buyse, M., Geys, H., and Renard, D. (2001). Validation of surrogate end points in multiple randomized clinical trials with failure time end points. Journal of the Royal Statistical Society: Series C (Applied Statistics) 50, 405-422.
Buyse, M., Molenberghs, G., Burzykowski, T., Renard, D., and Geys, H. (2000). The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 1, 49-67
Sofeu, C. L., Emura, T., and Rondeau, V. (2019). One-step validation method for surrogate endpoints using data from multiple randomized cancer clinical trials with failure-time endpoints. Statistics in Medicine 38, 2928-2942.
Le Coent, Q., Legrand, C., Rondeau, V. (2021). Time-to-event surrogate endpoint validation using mediation and meta-analytic data. Article submitted.
jointSurrSimul
, summary.jointSurroPenal
, jointSurroPenalSimul
## Not run:
# Generation of data to use
data.sim <- jointSurrSimul(n.obs=600, n.trial = 30,cens.adm=549.24,
alpha = 1.5, theta = 3.5, gamma = 2.5, zeta = 1, sigma.s = 0.7,
sigma.t = 0.7, cor = 0.8, betas = -1.25, betat = -1.25,
full.data = 0, random.generator = 1, seed = 0, nb.reject.data = 0)
#Surrogacy evaluation based on generated data with a combination of Monte Carlo
#and classical Gaussian Hermite integration.*
# (Computation takes around 5 minutes)
joint.surro.sim.MCGH <- jointSurroPenal(data = data.sim, int.method = 2,
nb.mc = 300, nb.gh = 20)
#Surrogacy evaluation based on generated data with a combination of Monte Carlo
# and Pseudo-adaptive Gaussian Hermite integration.
# (Computation takes around 4 minutes)
joint.surro.sim.MCPGH <- jointSurroPenal(data = data.sim, int.method = 2,
nb.mc = 300, nb.gh = 20, adaptatif = 1)
# Results
summary(joint.surro.sim.MCGH)
summary(joint.surro.sim.MCPGH)
# Data from the advanced ovarian cancer randomized clinical trials.
# Joint surrogate model with \eqn{\zeta} fixed to 1, 8 nodes spline
# and the rescaled survival time.
data(dataOvarian)
# (Computation takes around 20 minutes)
joint.surro.ovar <- jointSurroPenal(data = dataOvarian, n.knots = 8,
init.kappa = c(2000,1000), indicator.alpha = 0, nb.mc = 200,
scale = 1/365)
# results
summary(joint.surro.ovar)
print(joint.surro.ovar)
# Mediation analysis on the adjuvant chemotherapy
# dataset where the surrogate is a time-to-relapse and the final endpoint is death.
# 4 knots are used to estimate the two baseline hazard functions.
# The function g(s) is estimated using cubic b-splines with 1 interior
# knot ('g.nkots=1'). The function R(t) is computed at 100 time points
# using 10.000 Monte Carlo simulations for integration over the random effects.
# To reduce computation time in the provided example only one fifth of the
# the original dataset is used and the confidence bands for the function
# R(t) are not computed as well as the power parameters associated with
# the random effects. Full example is commented thereafter.
# We first need to change the variable "statusS" which in the dataset
# encodes the indicator of a disease free survival event to an indicator
# of a time to relapse event (i.e., resurgence of cancer or
# onset of a second cancer) that excludes death as a composite event.
# Thus, the patients whose variables "timeS" and "timeT" are equal
# and whose variable "statusS" is equal to 1 will have
# "statuS" be set to 0. We do this because composite endpoint may not
# be appropriate in the setting of mediation analysis.
data(gastadj)
gastadj$timeS<-gastadj$timeS/365
gastadj$timeT<-gastadj$timeT/365
#here changing "statusS" to corresponds to a time to relapse event
gastadj[gastadj$timeS==gastadj$timeT & gastadj$statusS==1,c("statusS")]<-0
# select 20% of the original dataset
set.seed(1)
n<-nrow(gastadj)
subset<-gastadj[sort(sample(1:nrow(gastadj),round(n*0.2),replace = FALSE)),]
# Mediation model ('mediation=TRUE'). Computation takes around 17 minutes
mod.gast<-jointSurroPenal(subset,n.knots = 4,indicator.zeta = 0,
indicator.alpha = 0,mediation=TRUE,g.nknots=1,
rt.ntimes=30,rt.nmc=10000,rt.boot=FALSE)
summary(mod.gast)
plot(mod.gast)
# Example on the full dataset, including estimation of the power parameters
# mod.gast2<-jointSurroPenal(gastadj,n.knots = 4,mediation=TRUE,g.nknots=1,
# rt.ntimes=30,rt.nmc=10000,rt.boot=TRUE,
# rt.nboot=2000,rt.boot.nmc=10000)
# results
# plot(mod.gast2)
# summary(mod.gast2)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.