# jointSurroCopPenal: Fit the one-step Joint frailty-copula model for evaluating a... In frailtypack: Shared, Joint (Generalized) Frailty Models; Surrogate Endpoints

## Description

Joint Frailty-Copula model for Surrogacy definition

Fit the one-step Joint frailty-copula surrogate model for the evaluation of a canditate surrogate endpoint, with different integration methods on the random effects, using a semiparametric penalized likelihood estimation. This approach extends that of Burzykowski `et al.` (2001) by including in the bivariate copula model the random effects treatment-by-trial interaction.

Assume Sij and Tij the failure times associated respectively with the surrogate and the true endpoints, for subject `j(j = 1,..., n`i`)` belonging to the trial `i (i = 1,..., G)`.

Let vi = (ui, vSi, vTi) be the vector of trial level random effects; ZS,ij = (ZSij1, ..., ZSijp)' and ZT,ij = (ZTij1, ..., ZTijp)' be covariates associated with Sij and Tij. The joint frailty-copula model is defined as follows:

where

and the conditional survival functions are given by

in which

ui ~ N(0,γ), ui ⊥ vSi, ui ⊥ vTi; (vSi, vTi)T ~ N(0,Σv)

with

In this model, λ0s(x) is the baseline hazard function associated with the surrogate endpoint and βS the fixed effects (or log-hazard ratio) corresponding to the covariates ZS,ij; λ0T(x) is the baseline hazard function associated with the true endpoint and βT the fixed treatment effects corresponding to the covariates ZT,ij. The copula model serves to consider dependence between the surrogate and true endpoints at the individual level. In the copula model, θ is the copula parameter used to quantify the strength of association. ui is a shared frailty effect associated with the baseline hazard function that serve to take into account the heterogeneity between trials of the baseline hazard function, associated with the fact that we have several trials in this meta-analytical design. The power parameter α distinguishes trial-level heterogeneity between the surrogate and the true endpoint. vSi and vTi are two correlated random effects treatment-by-trial interactions. ZSij1 or ZTij1 represents the treatment arm to which the patient has been randomized.

For simplicity, we focus on the Clayton and Gumbel-Hougaard copula functions. In Clayton's model, the copula function has the form

and in Gumbel's model, the copula function has the form

Surrogacy evaluation

We propose to base validation of a candidate surrogate endpoint on Kendall's τ at the individual level and coefficient of determination at the trial level, as in the classical approach (Burzykowski `et al.`, 2001). The formulations are given below.

Individual-level surrogacy

From the proposed model, according to the copula function, it can be shown that Kendall's τ is defined as :

where θ is the copula parameter. Kendall's τ is the difference between the probability of concordance and the probability of discordance of two realizations of Sij and Tij. It belongs to the interval [-1,1] and assumes a zero value when Sij and Tij are independent.

Trial-level surrogacy

The key motivation for validating a surrogate endpoint is to be able to predict the effect of treatment on the true endpoint, based on the observed effect of treatment on the surrogate endpoint. As shown by Buyse et al. (2000), the coefficenient of determination obtains from the covariance matrix Σv of the random effects treatment-by-trial interaction can be used to evaluate underlined prediction, and therefore as surrogacy evaluation measurement at trial-level. It is defined by:

The SEs of Rtrial2 and τ are calculated using the Delta-method. We also propose Rtrial2 and 95% CI computed using the parametric bootstrap. The use of delta-method can lead to confidence limits violating the [0,1], as noted by (Burzykowski et al., 2001). However, using other methods would not significantly alter the findings of the surrogacy assessment

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10``` ```jointSurroCopPenal(data, maxit = 40, indicator.alpha = 1, frail.base = 1, n.knots = 6, LIMparam = 0.001, LIMlogl = 0.001, LIMderiv = 0.001, nb.mc = 1000, nb.gh = 20, nb.gh2 = 32, adaptatif = 0, int.method = 0, nb.iterPGH = 5, true.init.val = 0, thetacopula.init = 1, sigma.ss.init = 0.5, sigma.tt.init = 0.5, sigma.st.init = 0.48, gamma.init = 0.5, alpha.init = 1, betas.init = 0.5, betat.init = 0.5, scale = 1, random.generator = 1, kappa.use = 4, random = 0, random.nb.sim = 0, seed = 0, init.kappa = NULL, ckappa = c(0,0), typecopula = 1, nb.decimal = 4, print.times = TRUE, print.iter = FALSE) ```

## Arguments

 `data` A `data.frame` containing at least seven variables entitled: `patientID:` A numeric, that represents the patient's identifier and must be unique; `trialID:` A numeric, that represents the trial in which each patient was randomized; `timeS:` The follow-up time associated with the surrogate endpoint; `statusS:` The event indicator associated with the surrogate endpoint. Normally 0 = no event, 1 = event; `timeT:` The follow-up time associated with the true endpoint; `statusT:` The event indicator associated with the true endpoint. Normally 0 = no event, 1 = event; `trt:` The treatment indicator for each patient, with 1 = treated, 0 = untreated. `maxit` maximum number of iterations for the Marquardt algorithm. The default being `40`. `indicator.alpha` A binary, indicating whether the power's parameter α should be estimated (1) or not (0). If `0`, α will be set to `1` during estimation. The default is 1. `frail.base` A binary, indicating whether the heterogeneity between trial on the baseline risk is considered (`1`) or not (`0`), using the shared cluster specific frailties ui . The default is `1`. `n.knots` integer giving the number of knots to use. Value required in the penalized likelihood estimation. It corresponds to the (n.knots+2) splines functions for the approximation of the hazard or the survival functions. We estimate I or M-splines of order 4. When the user set a number of knots equals to k (n.knots=k) then the number of interior knots is (k-2) and the number of splines is (k-2)+order. Number of knots must be between 4 and 20. (See `frailtyPenal` for more details). `LIMparam` Convergence threshold of the Marquardt algorithm for the parameters, 10-3 by default (See `frailtyPenal` for more details). `LIMlogl` Convergence threshold of the Marquardt algorithm for the log-likelihood, 10-3 by default (See `frailtyPenal` for more details). `LIMderiv` Convergence threshold of the Marquardt algorithm for the gradient, 10-3 by default (See `frailtyPenal` for more details). `nb.mc` Number of samples considered in the Monte-Carlo integration. Required in the event `int.method` is equals to `0`, `2` or `4`. A value between 500 and 1000 most often gives good results. The default is `1000`. `nb.gh` Number of nodes for the Gaussian-Hermite quadrature. It can be chosen among 5, 7, 9, 12, 15, 20 and 32. The default is `20`. A value greater than or equals to `15` allowed good results in simulation studies `nb.gh2` Number of nodes for the Gauss-Hermite quadrature used to re-estimate the model, in the event of non-convergence, defined as previously. The default is `32`. `adaptatif` A binary, indicates whether the pseudo adaptive Gaussian-Hermite quadrature `(1)` or the classical Gaussian-Hermite quadrature `(0)` is used. The default is `0`. `int.method` A numeric, indicates the integration method: `0` for Monte carlo, `1` for Gaussian-Hermite quadrature, `3` for Laplace approximation. The default is `0`. `nb.iterPGH` Number of iterations before the re-estimation of the posterior random effects, in the event of the two-steps pseudo-adaptive Gaussian-hermite quadrature. If set to `0` there is no re-estimation". The default is `5`. `true.init.val` Numerical value. Indicates if the given initial values to parameters `(0)` should be considered. If set to `2`, α and γ are initialised using two separed shared frailty model (see `frailtyPenal` for more details); σ2vS, σ2vT and σvST are fixed by the user or the default values; θ, βS and βT are initialized using a classical joint frailty model, considering individual level random effects, with θ the variance of individual level random effects. If the joint frailty model is faced to convergence issues, βS and βT are initialized using two shared frailty models. In all other scenarios, if the simplified model does not converge, default given parameters values are used. Initial values for spline's associated parameters are fixed to `0.5`. The default for this argument is `0`. `thetacopula.init` Initial values for the copula parameter (θ), required if `true.init.val` is set to `0` or `2`. The default is `1`. `sigma.ss.init` Initial values for σ2vS, required if `true.init.val` is set to `0` or `2`. The default is `0.5`. `sigma.tt.init` Initial values for σ2vT, required if `true.init.val` is set to `0` or `2`. The default is `0.5`. `sigma.st.init` Initial values for σvST, required if `true.init.val` is set to `0` or `2`. The default is `0.48`. `gamma.init` Initial values for γ, required if `true.init.val` is set to `0` or `2`. The default is `0.5`. `alpha.init` Initial values for α, required if `true.init.val` is set to `0` or `2`. The default is `1`. `betas.init` Initial values for βS, required if `true.init.val` is set to `0` or `2`. The default is `0.5`. `betat.init` Initial values for βT, required if `true.init.val` is set to `0` or `2`. The default is `0.5`. `scale` A numeric that allows to rescale (by multiplication) the survival times, to avoid numerical problems in the event of some convergence issues. If no change is needed the argument is set to 1, the default value. eg: `1/365` aims to convert days to years ". `random.generator` Random number generator used by the Fortran compiler, `1` for the intrinsec subroutine `Random_number` and `2` for the subroutine `uniran()`. The default is `1`. In the event of convergence problem with `int.method` set to `0`, `2` or `4`, that requires integration by Monte-Carlo, user could change the random numbers generator. `kappa.use` A numeric, that indicates how to manage the smoothing parameters k1 and k2 in the event of convergence issues. If it is set to `1`, the given smoothing parameters or those obtained by cross-validation are used. If it is set to `3`, the associated smoothing parameters are successively divided by 10, in the event of convergence issues until 5 times. If it is set to `4`, the management of the smoothing parameter is as in the event `1`, follows by the successive division as described in the event `3` and preceded by the changing of the number of nodes for the Gauss-Hermite quadrature. The default is `4`. `random` A binary that says if we reset the random number generation with a different environment at each call `(1)` or not `(0)`. If it is set to `1`, we use the computer clock as seed. In the last case, it is not possible to reproduce the generated datasets". The default is `0`. Required if `random.generator` is set to 1. `random.nb.sim` If `random` is set to `1`, a binary that indicates the number of generations that will be made. `seed` The seed to use for data (or samples) generation. required if `random` is set to `0`. The default is `0`. `init.kappa` smoothing parameter used to penalized the log-likelihood. By default (init.kappa = NULL) the values used are obtain by cross-validation. `ckappa` Vector of two fixed values to add to the smoothing parameters. By default it is set to (0,0). this argument allows to well manage the smoothing parameters in the event of convergence issues. `typecopula` The copula function used, can be 1 for clayton or 2 for Gumbel-Hougaard. The default is `1` `nb.decimal` Number of decimal required for results presentation. `print.times` a logical parameter to print estimation time. Default is TRUE. `print.iter` a logical parameter to print iteration process. Default is FALSE.

## Details

The estimated parameter are obtained using the robust Marquardt algorithm (Marquardt, 1963) which is a combination between a Newton-Raphson algorithm and a steepest descent algorithm. The iterations are stopped when the difference between two consecutive log-likelihoods was small (< 10-3 ), the estimated coefficients were stable (consecutive values (< 10-3 )), and the gradient small enough (< 10-3 ), by default. Cubic M-splines of order 4 are used for the hazard function, and I-splines (integrated M-splines) are used for the cumulative hazard function.

The inverse of the Hessian matrix is the variance estimator and to deal with the positivity constraint of the variance component and the spline coefficients, a squared transformation is used and the standard errors are computed by the Δ-method (Knight & Xekalaki, 2000). The smooth parameter can be chosen by maximizing a likelihood cross validation criterion (Joly and other, 1998).

We proposed based on the joint surrogate model a new definition of the Kendall's τ. Moreover, distinct numerical integration methods are available to approximate the integrals in the marginal log-likelihood.

Non-convergence case management procedure

Special attention must be given to initializing model parameters, the choice of the number of spline knots, the smoothing parameters and the number of quadrature points to solve convergence issues. We first initialized parameters using the user's desired strategy, as specified by the option `true.init.val`. When numerical or convergence problems are encountered, with `kappa.use` set to `4`, the model is fitted again using a combination of the following strategies: vary the number of quadrature point (`nb.gh` to `nb.gh2` or `nb.gh2` to `nb.gh`) in the event of the use of the Gaussian Hermite quadrature integration (see `int.method`); divided or multiplied the smoothing parameters ( k1, k2) by 10 or 100 according to their preceding values, or used parameter vectors obtained during the last iteration (with a modification of the number of quadrature points and smoothing parameters). Using this strategy, we usually obtained during simulation the rejection rate less than 3%. A sensitivity analysis was conducted without this strategy, and similar results were obtained on the converged samples, with about a 23% rejection rate.

## Value

This function return an object of class jointSurroPenal with elements :

 `EPS` A vector containing the obtained convergence thresholds with the Marquardt algorithm, for the parameters, the log-likelihood and for the gradient; `b` A vector containing estimates for the splines parameter's; elements of the lower triangular matrix (L) from the Cholesky decomposition such that Σ = LLT, with Σ the covariance of the random effects (vSi,vTi); the coefficient α (if `indicator.alpha` is set to `1`); the satandard error of the random effect `u`i; the logarithm of the copula parameter (θ) if the Clayton copula function is considered, or the squared root of θ if the Gumbel copula is considered. The last two parameters represent the regression coefficients βS and βT; `varH` The variance matrix of all parameters in `b` (before positivity constraint transformation for the variance of the measurement error, for which the delta method is used); `varHIH` The robust estimation of the variance matrix of all parameters in `b`; `loglikPenal` The complete marginal penalized log-likelihood; `LCV` the approximated likelihood cross-validation criterion in the semiparametric case (with `H` minus the converged Hessian matrix, and `l(.)` the full log-likelihood). `xS` vector of times for surrogate endpoint where both survival and hazard function are estimated. By default seq(0,max(time),length=99), where time is the vector of survival times; `lamS` array (dim = 3) of hazard estimates and confidence bands, for surrogate endpoint; `survS` array (dim = 3) of baseline survival estimates and confidence bands, for surrogate endpoint; `xT` vector of times for true endpoint where both survival and hazard function are estimated. By default seq(0, max(time), length = 99), where time is the vector of survival times; `lamT` array (dim = 3) of hazard estimates and confidence bands, for true endpoint; `survT` array (dim = 3) of baseline survival estimates and confidence bands, for true endpoint; `n.iter` number of iterations needed to converge; `theta` Estimate for θ; `gamma` Estimate for γ; `alpha` Estimate for α; `zeta` A value equals to `1`, no really use in this function; `sigma.s` Estimate for σ2vS; `sigma.t` Estimate for σ2vT; `sigma.st` Estimate for σvST; `beta.s` Estimate for βS; `beta.t` Estimate for βT; `ui` A binary, that indicates if the heterogeneity between trial on the baseline risk has been Considered (`1`), using the shared cluster specific frailties (`u`i), or not (`0`); `ktau` The Kendall's τ with the correspondant 95 \% CI obtained from the delta-method; `R2.boot` The `R`2trial with the correspondant 95 \% CI obtained from the parametric bootstrap; `Coefficients` The estimates with the corresponding standard errors and the 95 \% CI `kappa` Positive smoothing parameters used for convergence. These values could be different to initial values if `kappa.use` is set to `3` or `4`; `scale` The value used to rescale the survival times `data` The dataset used in the model `varcov.Sigma` Covariance matrix of the estimates of the estimates of (σ2vS,σ2vT, σvST) obtained from the delta-method `parameter` List of all arguments used in the model `type.joint` A code `3` that represents the joint frailty-copula model. This output is used in other functions

## Author(s)

Casimir Ledoux Sofeu casimir.sofeu@u-bordeaux.fr, scl.ledoux@gmail.com and Virginie Rondeau virginie.rondeau@inserm.fr

## References

Burzykowski, T., Molenberghs, G., Buyse, M., Geys, H., and Renard, D. (2001). Validation of surrogate end points in multiple randomized clinical trials with failure time end points. Journal of the Royal Statistical Society: Series C (Applied Statistics) 50, 405-422.

Buyse, M., Molenberghs, G., Burzykowski, T., Renard, D., and Geys, H. (2000). The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 1, 49-67

Sofeu, C. L., Emura, T., and Rondeau, V. (2019). One-step validation method for surrogate endpoints using data from multiple randomized cancer clinical trials with failure-time endpoints. Statistics in Medicine 38, 2928-2942.

R. B. Nelsen. An introduction to Copulas. Springer, 2006

Prenen, L., Braekers, R., and Duchateau, L. (2017). Extending the archimedean copula methodology to model multivariate survival data grouped in clusters of variable size. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 483-505.

Sofeu, C. L., Emura, T., and Rondeau, V. (2020). A joint frailty-copula model for meta-analytic validation of failure time surrogate endpoints in clinical trials. `Under review`

`jointSurrCopSimul`, `summary.jointSurroPenal`, `jointSurroPenal`, `jointSurroPenalSimul`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16``` ```## Not run: # Data from the advanced ovarian cancer randomized clinical trials. data(dataOvarian) joint.surro.Gumbel <- jointSurroCopPenal(data = dataOvarian, int.method = 0, n.knots = 8, maxit = 50, kappa.use = 4, nb.mc = 1000, typecopula = 2, print.iter = F, scale = 1/365) print(joint.surro.Gumbel) joint.surro.Clayton <- jointSurroCopPenal(data = dataOvarian, int.method = 0, n.knots = 8, maxit = 50, kappa.use = 4, nb.mc = 1000, typecopula = 1, print.iter = F, scale = 1/365) print(joint.surro.Clayton) ## End(Not run) ```