jointSurroPenal: Fit the one-step Joint surrogate model for evaluating a...

View source: R/jointSurroPenal.R

jointSurroPenalR Documentation

Fit the one-step Joint surrogate model for evaluating a canditate surrogate endpoint

Description

Joint Frailty Surrogate model definition

Fit the one-step Joint surrogate model for the evaluation of a canditate surrogate endpoint, with different integration methods on the random effects, using a semiparametric penalized likelihood estimation. This approach extends that of Burzykowski et al. (2001) by including in the same joint frailty model the individual-level and the trial-level random effects. This function can also be used for mediation analysis where a direct effect of the surrogate time S on the final endpoint T is allowed through a function g(S).

For the jth subject (j=1,...,ni) of the ith trial (i=1,...,G), the joint surrogate model is defined as follows:

surromodel1.png

where, \omegaij ~ N(0,\theta), ui ~ N(0,\gamma), \omegai ⊥ ui, ui ⊥ vSi, ui ⊥ vTi

and (vSi,vTi)T ~ N(0,\Sigmav)

with

surromodel2.png

In this model, \lambda0s(t) is the baseline hazard function associated with the surrogate endpoint and \betaS the fixed treatment effect (or log-hazard ratio); \lambda0T(t) is the baseline hazard function associated with the true endpoint and \betaT the fixed treatment effect. \omegaij is a shared individual-level frailty that serve to take into account the heterogeneity in the data at the individual level; ui is a shared frailty effect associated with the baseline hazard function that serve to take into account the heterogeneity between trials of the baseline hazard function, associated with the fact that we have several trials in this meta-analytical design. The power parameters \zeta and \alpha distinguish both individual and trial-level heterogeneities between the surrogate and the true endpoint. vSi and vTi are two correlated random effects treatment-by-trial interactions. Zij1 represents the treatment arm to which the patient has been randomized. In the mediation analysis setting, the hazard function for the true endpoint becomes:

jointsurromed.png

where the term I(Sij≤ t)g(Sij) allows for a direct effect of the surrogate time S on the risk of occurrence of the final endpoint T.

Surrogacy evaluation

We proposed new definitions of Kendall's \tau and coefficient of determination as individual-level and trial-level association measurements, to evaluate a candidate surrogate endpoint (Sofeu et al., 2018). For the surrogacy in the mediation analysis setting see the "Surrogacy through mediation" Section.

Individual-level surrogacy

To measure the strength of association between Sij and Tij after adjusting the marginal distributions for the trial and the treatment effects, as show in Sofeu et al.(2018), we use the Kendall's \tau define by :

surromodel3.png

where \theta, \zeta, \alpha and \gamma are estimated using the joint surrogate model defined previously. Kendall's \tau is the difference between the probability of concordance and the probability of discordance of two realizations of Sij and Tij. It belongs to the interval [-1,1] and assumes a zero value when Sij and Tij are independent. We estimate Kendall's \tau using Monte-Carlo or Gaussian Hermite quadrature integration methods. Its confidence interval is estimated using parametric bootstrap

Trial-level surrogacy

The key motivation for validating a surrogate endpoint is to be able to predict the effect of treatment on the true endpoint, based on the observed effect of treatment on the surrogate endpoint. As shown by Buyse et al. (2000), the coefficenient of determination obtains from the covariance matrix \Sigmav of the random effects treatment-by-trial interaction can be used to evaluate underlined prediction, and therefore as surrogacy evaluation measurement at trial-level. It is defined by:

surromodel4.png

The SEs of Rtrial2 is calculated using the Delta-method. We also propose Rtrial2 and 95% CI computed using the parametric bootstrap. The use of delta-method can lead to confidence limits violating the [0,1], as noted by (Burzykowski et al., 2001). However, using other methods would not significantly alter the findings of the surrogacy assessment

Surrogacy through mediation

In the mediation analysis setting, the surrogacy measure is the proportion of treatment effect on the final endpoint T that goes through its effect on the surrogate S. This measure is a time-dependent function R(t) defined as:

rt.png

where NIE and TE stand for "natual indirect effect" and "total effect" respectively. The numerator is the difference of the survival function of T for a subject whose treatment has been set to 1 (experimental arm) for both S and T versus a subject for which the treatment for T is still 1 but is set 0 for S. This corresponds to the indirect effect (in ther of survival probability) of the treatment on T through S. The denominator is the total effect of the treatment on T.

Usage

jointSurroPenal(data, maxit=50, indicator.zeta = 1,
   indicator.alpha = 1, frail.base = 1, n.knots = 6,
   LIMparam = 0.001, LIMlogl = 0.001, LIMderiv = 0.001,
   nb.mc = 300, nb.gh = 32, nb.gh2 = 20, adaptatif = 0,
   int.method = 2, nb.iterPGH = 5, nb.MC.kendall = 10000,
   nboot.kendall = 1000, true.init.val = 0,
   theta.init = 1, sigma.ss.init = 0.5, sigma.tt.init = 0.5,
   sigma.st.init = 0.48, gamma.init = 0.5, alpha.init = 1,
   zeta.init = 1, betas.init = 0.5, betat.init = 0.5, scale = 1,
   random.generator = 1, kappa.use = 4, random = 0,
   random.nb.sim = 0, seed = 0, init.kappa = NULL, ckappa = c(0,0),
   nb.decimal = 4, print.times = TRUE, print.iter=FALSE,mediation=FALSE,
   g.nknots=1,rt.times=NULL,rt.ntimes=NULL,rt.nmc=500,rt.boot=FALSE,
   rt.nboot=2000,rt.boot.nmc=500,rt.integ.type=2)

Arguments

data

A data.frame containing at least seven variables entitled:

  • patientID: A numeric, that represents the patient's identifier and must be unique;

  • trialID: A numeric, that represents the trial in which each patient was randomized;

  • timeS: The follow-up time associated with the surrogate endpoint;

  • statusS: The event indicator associated with the surrogate endpoint. Normally 0 = no event, 1 = event;

  • timeT: The follow-up time associated with the true endpoint;

  • statusT: The event indicator associated with the true endpoint. Normally 0 = no event, 1 = event;

  • trt: The treatment indicator for each patient, with 1 = treated, 0 = untreated.

maxit

maximum number of iterations for the Marquardt algorithm. The default being 40.

indicator.zeta

A binary, indicates whether the power's parameter \zeta should be estimated (1) or not (0). If 0, \zeta will be set to 1 during estimation. The default is 1. This parameter can be set to 0 in the event of convergence and identification issues.

indicator.alpha

A binary, indicating whether the power's parameter \alpha should be estimated (1) or not (0). If 0, \alpha will be set to 1 during estimation. The default is 1.

frail.base

A binary, indicating whether the heterogeneity between trial on the baseline risk is considered (1) or not (0), using the shared cluster specific frailties ui . The default is 1.

n.knots

integer giving the number of knots to use. Value required in the penalized likelihood estimation. It corresponds to the (n.knots+2) splines functions for the approximation of the hazard or the survival functions. We estimate I or M-splines of order 4. When the user set a number of knots equals to k (n.knots=k) then the number of interior knots is (k-2) and the number of splines is (k-2)+order. Number of knots must be between 4 and 20. (See frailtyPenal for more details).

LIMparam

Convergence threshold of the Marquardt algorithm for the parameters, 10-3 by default (See frailtyPenal for more details).

LIMlogl

Convergence threshold of the Marquardt algorithm for the log-likelihood, 10-3 by default (See frailtyPenal for more details).

LIMderiv

Convergence threshold of the Marquardt algorithm for the gradient, 10-3 by default (See frailtyPenal for more details).

nb.mc

Number of samples considered in the Monte-Carlo integration. Required in the event int.method is equals to 0, 2 or 4. A value between 100 and 300 most often gives good results. However, beyond 300, the program takes a lot of time to estimate the parameters. The default is 300.

nb.gh

Number of nodes for the Gaussian-Hermite quadrature. It can be chosen among 5, 7, 9, 12, 15, 20 and 32. The default is 32.

nb.gh2

Number of nodes for the Gauss-Hermite quadrature used to re-estimate the model, in the event of non-convergence, defined as previously. The default is 20.

adaptatif

A binary, indicates whether the pseudo adaptive Gaussian-Hermite quadrature (1) or the classical Gaussian-Hermite quadrature (0) is used. The default is 0.

int.method

A numeric, indicates the integration method: 0 for Monte carlo, 1 for Gaussian-Hermite quadrature, 2 for a combination of both Gaussian-Hermite quadrature to integrate over the individual-level random effects and Monte carlo to integrate over the trial-level random effects, 4 for a combination of both Monte carlo to integrate over the individual-level random effects and Gaussian-Hermite quadrature to integrate over the trial-level random effects. The default is 2.

nb.iterPGH

Number of iterations before the re-estimation of the posterior random effects, in the event of the two-steps pseudo-adaptive Gaussian-hermite quadrature. If set to 0 there is no re-estimation". The default is 5.

nb.MC.kendall

Number of generated points used with the Monte-Carlo to estimate integrals in the Kendall's \tau formulation. Beter to use at least 4000 points for stable reseults. The default is 10000.

nboot.kendall

Number of samples considered in the parametric bootstrap to estimate the confidence interval of the Kendall's \tau. The default is 1000.

true.init.val

Numerical value. Indicates if the given initial values to parameters (0) should be considered. If set to 2, \alpha and \gamma are initialised using two separed shared frailty model (see frailtyPenal for more details); \sigma2vS, \sigma2vT and \sigmavST are fixed by the user or the default values; \zeta, \theta, \betaS and \betaT are initialized using a classical joint frailty model, considering individual level random effects. If the joint frailty model is faced to convergence issues, \betaS and \betaT are initialized using two shared frailty models. In all other scenarios, if the simplified model does not converge, default given parameters values are used. Initial values for spline's associated parameters are fixed to 0.5. The default for this argument is 0.

theta.init

Initial values for \theta, required if true.init.val is set to 0 or 2. The default is 1.

sigma.ss.init

Initial values for \sigma2vS, required if true.init.val is set to 0 or 2. The default is 0.5.

sigma.tt.init

Initial values for \sigma2vT, required if true.init.val is set to 0 or 2. The default is 0.5.

sigma.st.init

Initial values for \sigmavST, required if true.init.val is set to 0 or 2. The default is 0.48.

gamma.init

Initial values for \gamma, required if true.init.val is set to 0 or 2. The default is 0.5.

alpha.init

Initial values for \alpha, required if true.init.val is set to 0 or 2. The default is 1.

zeta.init

Initial values for \zeta, required if true.init.val is set to 0 or 2. The default is 1.

betas.init

Initial values for \betaS, required if true.init.val is set to 0 or 2. The default is 0.5.

betat.init

Initial values for \betaT, required if true.init.val is set to 0 or 2. The default is 0.5.

scale

A numeric that allows to rescale (multiplication) the survival times, to avoid numerical problems in the event of some convergence issues. If no change is needed the argument is set to 1, the default value. eg: 1/365 aims to convert days to years ".

random.generator

Random number generator used by the Fortran compiler, 1 for the intrinsec subroutine Random_number and 2 for the subroutine uniran(). The default is 1. in the event of convergence problem with int.method set to 0, 2 or 4, that requires integration by Monte-Carlo, user could change the random numbers generator.

kappa.use

A numeric, that indicates how to manage the smoothing parameters k1 and k2 in the event of convergence issues. If it is set to 1, the given smoothing parameters or those obtained by cross-validation are used. If it is set to 3, the associated smoothing parameters are successively divided by 10, in the event of convergence issues until 5 times. If it is set to 4, the management of the smoothing parameter is as in the event 1, follows by the successive division as described in the event 3 and preceded by the changing of the number of nodes for the Gauss-Hermite quadrature. The default is 4.

random

A binary that says if we reset the random number generation with a different environment at each call (1) or not (0). If it is set to 1, we use the computer clock as seed. In the last case, it is not possible to reproduce the generated datasets. The default is 0. Required if random.generator is set to 1.

random.nb.sim

If random is set to 1, a binary that indicates the number of generations that will be made.

seed

The seed to use for data (or samples) generation. required if random is set to 0. The default is 0.

init.kappa

smoothing parameter used to penalized the log-likelihood. By default (init.kappa = NULL) the values used are obtain by cross-validation.

ckappa

Vector of two fixed values to add to the smoothing parameters. By default it is set to (0,0). this argument allows to well manage the smoothing parameters in the event of convergence issues.

nb.decimal

Number of decimal required for results presentation.

print.times

a logical parameter to print estimation time. Default is TRUE.

print.iter

a logical parameter to print iteration process. Default is FALSE.

mediation

a logical value indicating if the mediation analysis method is used. Default is FALSE.

g.nknots

In the case of a mediation analysis, indicates how many inner knots are used in the splines basis for estimating the function g(s). The value of g.nknots should be between 1 and 5. Default is 1.

rt.times

In the mediation analysis setting, a vector of times for which the funtion R(t) is evaluated. Specified time points must be in the range of the observed event times. The length of the vector should be less than 200.

rt.ntimes

In the mediation setting, if the argument rt.times is not specified the argument rt.ntimes allows the user to only specify a number of time points for which the function R(t) has to be computed. This argument is only to be used if rt.times is not specified. In that case the default value for rt.ntimes is 10. The value of rt.ntimes should be less than 200.

rt.nmc

An integer indicating how many Monte Carlo simulations are used to integrate over the random effects in the computation of the function R(t). in the mediation analysis setting. Default is 500.

rt.boot

A logical value indicating if bootstrapped confidence bands needs to be computed for the function R(t) in the mediation analysis setting. Default is FALSE.

rt.nboot

An integer indicating how many bootstrapped replicates of R(t) needs to be computed to derive confidence bands for R(t). Default is 2000.

rt.boot.nmc

If rt.boot is TRUE, indicates how many Monte Carlo simulations are used to integrate over the random effects in the bootstrapped functions R(t) in the mediation analysis setting. Default is 500

rt.integ.type

An integer indicating which type of integration over the distribution of S should be used in the computation of the function R(t). If set to 1, a simple trapezoidal rule is used with 300 integration points. If set to 2 a Gauss-Laguerre quadrature is used with 30 knots. Default is 2.

Details

The estimated parameter are obtained using the robust Marquardt algorithm (Marquardt, 1963) which is a combination between a Newton-Raphson algorithm and a steepest descent algorithm. The iterations are stopped when the difference between two consecutive log-likelihoods was small (< 10-3 ), the estimated coefficients were stable (consecutive values (< 10-3 )), and the gradient small enough (< 10-3 ), by default. Cubic M-splines of order 4 are used for the hazard function, and I-splines (integrated M-splines) are used for the cumulative hazard function.

The inverse of the Hessian matrix is the variance estimator and to deal with the positivity constraint of the variance component and the spline coefficients, a squared transformation is used and the standard errors are computed by the \Delta-method (Knight & Xekalaki, 2000). The smooth parameter can be chosen by maximizing a likelihood cross validation criterion (Joly and other, 1998).

We proposed based on the joint surrogate model a new definition of the Kendall's \tau. Moreover, distinct numerical integration methods are available to approximate the integrals in the marginal log-likelihood.

Non-convergence case management procedure

Special attention must be given to initializing model parameters, the choice of the number of spline knots, the smoothing parameters and the number of quadrature points to solve convergence issues. We first initialized parameters using the user's desired strategy, as specified by the option true.init.val. When numerical or convergence problems are encountered, with kappa.use set to 4, the model is fitted again using a combination of the following strategies: vary the number of quadrature point (nb.gh to nb.gh2 or nb.gh2 to nb.gh) in the event of the use of the Gaussian Hermite quadrature integration (see int.method); divided or multiplied the smoothing parameters ( k1, k2) by 10 or 100 according to their preceding values, or used parameter vectors obtained during the last iteration (with a modification of the number of quadrature points and smoothing parameters). Using this strategy, we usually obtained during simulation the rejection rate less than 3%. A sensitivity analysis was conducted without this strategy, and similar results were obtained on the converged samples, with about a 23% rejection rate.

Value

This function return an object of class jointSurroPenal or jointSurroMed in the mediation analysis setting with elements:

EPS

A vector containing the obtained convergence thresholds with the Marquardt algorithm, for the parameters, the log-likelihood and for the gradient;

b

A vector containing estimates for the splines parameter's; the power's parameter \zeta (if indicator.zeta is set to 1), the standard error of the shared individual-level frailty \omegaij (\theta),elements of the lower triangular matrix (L) from the Cholesky decomposition such that \Sigma = LLT, with \Sigma the covariance of the random effects (vSi,vTi); the coefficient \alpha (if indicator.alpha is set to 1); the satandard error of the random effect uiand the regression coefficients \betaS and \betaT;

varH

The variance matrix of all parameters in b (before positivity constraint transformation for the variance of the measurement error, for which the delta method is used);

varHIH

The robust estimation of the variance matrix of all parameters in b;

loglikPenal

The complete marginal penalized log-likelihood;

LCV

the approximated likelihood cross-validation criterion in the semiparametric case (with H minus the converged Hessian matrix, and l(.) the full log-likelihood). lcv.png

xS

vector of times for surrogate endpoint where both survirt.nmc.bootval and hazard function are estimated. By default seq(0,max(time),length=99), where time is the vector of survival times;

lamS

array (dim = 3) of hazard estimates and confidence bands, for surrogate endpoint;

survS

array (dim = 3) of baseline survival estimates and confidence bands, for surrogate endpoint;

xT

vector of times for true endpoint where both survival and hazard function are estimated. By default seq(0, max(time), length = 99), where time is the vector of survival times;

lamT

array (dim = 3) of hazard estimates and confidence bands, for true endpoint;

survT

array (dim = 3) of baseline survival estimates and confidence bands, for true endpoint;

n.iter

number of iterations needed to converge;

theta

Estimate for \theta;

gamma

Estimate for \gamma;

alpha

Estimate for \alpha;

zeta

Estimate for \zeta;

sigma.s

Estimate for \sigma2vS;

sigma.t

Estimate for \sigma2vT;

sigma.st

Estimate for \sigmavST;

beta.s

Estimate for \betaS;

beta.t

Estimate for \betaT;

ui

A binary, that indicates if the heterogeneity between trial on the baseline risk has been Considered (1), using the shared cluster specific frailties (ui), or not (0);

ktau

The Kendall's \tau with the correspondant 95 \% CI computed using the parametric bootstrap;

R2.boot

The R2trial with the correspondant 95 \% CI computed using the parametric bootstrap;

Coefficients

The estimates with the corresponding standard errors and the 95 \% CI

kappa

Positive smoothing parameters used for convergence. These values could be different to initial values if kappa.use is set to 3 or 4;

scale

The value used to rescale the survival times

data

The dataset used in the model

varcov.Sigma

covariance matrix of the estimates of (\sigma2vS,\sigma2vT, \sigmavST) obtained from the delta-method

parameter

list of all arguments used in the model

mediation

List returned in the case where the option mediation is set to TRUE which contains:

  • data.rt: A dataframe containing estimated values for the funtion R(t) and the natural effects for differents time points

  • g.knots: The vector of knots used in the spline basis for the function g.

  • g.order: The order of the spline basis used to estimate the function g.

  • g.coefficients: A vector containing the estimated coefficients associated with the splines in the estimation of the function g.

  • data.g: A dataframe containing the values of the estimated function g computed at several time points and the associated 95

  • Rt.ci: A dataframe containing the 95

  • TE.ci: A dataframe containing the 95

  • NDE.ci: A dataframe containing the 95

  • NIE.ci: A dataframe containing the 95

Author(s)

Casimir Ledoux Sofeu casimir.sofeu@u-bordeaux.fr, scl.ledoux@gmail.com, Quentin Le Coent quentin.le-coent@u-bordeaux.fr and Virginie Rondeau virginie.rondeau@inserm.fr,

References

Burzykowski, T., Molenberghs, G., Buyse, M., Geys, H., and Renard, D. (2001). Validation of surrogate end points in multiple randomized clinical trials with failure time end points. Journal of the Royal Statistical Society: Series C (Applied Statistics) 50, 405-422.

Buyse, M., Molenberghs, G., Burzykowski, T., Renard, D., and Geys, H. (2000). The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 1, 49-67

Sofeu, C. L., Emura, T., and Rondeau, V. (2019). One-step validation method for surrogate endpoints using data from multiple randomized cancer clinical trials with failure-time endpoints. Statistics in Medicine 38, 2928-2942.

Le Coent, Q., Legrand, C., Rondeau, V. (2021). Time-to-event surrogate endpoint validation using mediation and meta-analytic data. Article submitted.

See Also

jointSurrSimul, summary.jointSurroPenal, jointSurroPenalSimul

Examples



# Generation of data to use
data.sim <- jointSurrSimul(n.obs=600, n.trial = 30,cens.adm=549.24,
         alpha = 1.5, theta = 3.5, gamma = 2.5, zeta = 1, sigma.s = 0.7,
         sigma.t = 0.7, cor = 0.8, betas = -1.25, betat = -1.25,
         full.data = 0, random.generator = 1, seed = 0, nb.reject.data = 0)


#Surrogacy evaluation based on generated data with a combination of Monte Carlo
#and classical Gaussian Hermite integration.*
# (Computation takes around 5 minutes)

joint.surro.sim.MCGH <- jointSurroPenal(data = data.sim, int.method = 2,
                   nb.mc = 300, nb.gh = 20)

#Surrogacy evaluation based on generated data with a combination of Monte Carlo
# and Pseudo-adaptive Gaussian Hermite integration.
# (Computation takes around 4 minutes)

joint.surro.sim.MCPGH <- jointSurroPenal(data = data.sim, int.method = 2,
                   nb.mc = 300, nb.gh = 20, adaptatif = 1)

# Results
summary(joint.surro.sim.MCGH)
summary(joint.surro.sim.MCPGH)

# Data from the advanced ovarian cancer randomized clinical trials.
# Joint surrogate model with \eqn{\zeta} fixed to 1, 8 nodes spline
# and the rescaled survival time.

data(dataOvarian)
# (Computation takes around 20 minutes)

joint.surro.ovar <- jointSurroPenal(data = dataOvarian, n.knots = 8,
                init.kappa = c(2000,1000), indicator.alpha = 0, nb.mc = 200,
                scale = 1/365)
# results
summary(joint.surro.ovar)

print(joint.surro.ovar)


# Mediation analysis on the adjuvant chemotherapy
# dataset where the surrogate is a time-to-relapse and the final endpoint is death.
# 4 knots are used to estimate the two baseline hazard functions.
# The function g(s) is estimated using cubic b-splines with 1 interior
# knot ('g.nkots=1'). The function R(t) is computed at 100 time points
# using 10.000 Monte Carlo simulations for integration over the random effects.
# To reduce computation time in the provided example only one fifth of the
# the original dataset is used and the confidence bands for the function
# R(t) are not computed as well as the power parameters associated with
# the random effects. Full example is commented thereafter.

# We first need to change the variable "statusS" which in the dataset
# encodes the indicator of a disease free survival event to an indicator
# of a time to relapse event (i.e., resurgence of cancer or
# onset of a second cancer) that excludes death as a composite event.
# Thus, the patients whose variables "timeS" and "timeT" are equal
# and whose variable "statusS" is equal to 1 will have
# "statuS" be set to 0. We do this because composite endpoint may not
# be appropriate in the setting of mediation analysis.

data(gastadj)
gastadj$timeS<-gastadj$timeS/365
gastadj$timeT<-gastadj$timeT/365
#here changing "statusS" to corresponds to a time to relapse event
gastadj[gastadj$timeS==gastadj$timeT & gastadj$statusS==1,c("statusS")]<-0

# select 20% of the original dataset
set.seed(1)
n<-nrow(gastadj)
subset<-gastadj[sort(sample(1:nrow(gastadj),round(n*0.2),replace = F)),]

# Mediation model ('mediation=T'). Computation takes around 17 minutes
mod.gast<-jointSurroPenal(subset,n.knots = 4,indicator.zeta = 0,
                         indicator.alpha = 0,mediation=TRUE,g.nknots=1,
                         rt.ntimes=30,rt.nmc=10000,rt.boot=FALSE)

summary(mod.gast)
plot(mod.gast)

# Example on the full dataset, including estimation of the power parameters
# mod.gast2<-jointSurroPenal(gastadj,n.knots = 4,mediation=TRUE,g.nknots=1,
#                            rt.ntimes=30,rt.nmc=10000,rt.boot=T,
#                            rt.nboot=2000,rt.boot.nmc=10000)

# results
# plot(mod.gast2)
# summary(mod.gast2)


frailtypack documentation built on Nov. 25, 2023, 9:06 a.m.