mexhaz: mexhaz function In mexhaz: Mixed Effect Excess Hazard Models

Description

Fit an (excess) hazard regression model using different shapes for the baseline hazard (Weibull, piecewise constant, exponential of a B-spline of degree 1 to 3, exponential of a restricted cubic spline), with the possibility to include time-dependent effects of variable(s) and a random effect defined at the cluster level. The function accepts right-censored and counting process input styles for the follow-up time. The latter allows the modelling of survival data with delayed entries. The time-dependent effect of a covariable is modelled by adding interaction terms between the covariable and a function of time of the same class as the one used for the baseline hazard (in particular, with the same knots for piecewise constant hazards; and with the same degree and the same knots for B-spline or restricted cubic spline functions). The random effect is assumed to be normally distributed with mean 0 and standard deviation sigma. The optimisation process uses adaptive Gaussian quadrature to calculate the cluster-specific marginal likelihoods. The logarithm of the full marginal likelihood, defined as the sum of the logarithms of the cluster-specific marginal likelihoods, is then maximised using an optimisation routine such as `nlm` (default) or `optim`.

Usage

 ```1 2 3 4 5``` ```mexhaz(formula, data, expected = NULL, base = c("weibull", "exp.bs", "exp.ns", "pw.cst"), degree = 3, knots = NULL, bound = NULL, n.gleg = 20, init = NULL, random = NULL, n.aghq = 10, fnoptim = c("nlm", "optim"), verbose = 100, method = "Nelder-Mead", iterlim=10000, numHess=FALSE, print.level=1,...) ```

Arguments

 `formula` a formula object, with the response on the left of the `~` operator, and the linear predictor on the right. The response must be of the form `Surv(time, event)` for right censored data or `Surv(time, time2, event)` for counting process style data. The linear predictor accepts a special instruction `nph()` for specifying variables for which a time-dependent effect should be modelled (if several variables are modelled with time-dependent effects, separate these variables inside the `nph()` instruction with a `+` sign). In case `time` takes the value 0 for some observations when data are entered in the right censored style, it is assumed that these observations refer to events/censoring that occurred on the first day of follow-up. Consequently, a value of 1/730.5 (half a day) is substituted in order to make computations possible. However, it should be stressed that this is just a convention and that it does not make much sense if the time scale is not expressed in years. We therefore advise the analyst to deal with 0 time values during the dataset preparation stage. `data` a `data.frame` containing the variables referred to in the `formula`, as well as in the `expected` and `random` arguments if these arguments are used. `expected` name of the variable (must be given in quotes) representing the population (i.e., expected) hazard. By default, `expected=NULL`, which means that the function estimates the overall hazard (and not the excess hazard). `base` functional form that should be used to model the baseline hazard. Selection can be made between the following options: `"weibull"` for a Weibull hazard, `"exp.bs"` for a hazard described by the exponential of a B-spline (only B-splines of degree 1, 2 or 3 are accepted), `"exp.ns"` for a hazard described by the exponential of a restricted cubic spline (also called 'natural spline'), `"pw.cst"` for a piecewise constant hazard. By default, `base="weibull"`. For the Weibull hazard model, the cumulative hazard is given by the following relationship: H(t,x,z) = lambda*exp(x'b)*t^(rho*exp(z'g)) where lambda and rho are the parameters of the Weibull baseline hazard, x represent variables with proportional effect (with corresponding regression coefficients 'b') and z represent variables with time-dependent effects (with corresponding regression coefficients 'g'). The `mexhaz()` function does not estimate directly lambda and rho but their logarithms (in the output of the function, these are named respectively 'logLambda' and 'logRho'). For the spline-based hazards, it should be noted that the B-spline and restricted cubic spline bases created internally in the `mexhaz()` function are identical to those obtained by the use of, respectively, the `bs()` and `ns()` functions from the `splines` package. `degree` if `base="exp.bs"`, `degree` represents the degree of the B-spline used. Only integer values between 1 and 3 are accepted, and 3 is the default. `knots` if `base="exp.bs"` or `"exp.ns"`, `knots` is the vector of interior knots of the spline. If `base="pw.cst"`, `knots` is the vector defining the endpoints of the time intervals on which the hazard is assumed to be constant. By default, `knots=NULL` (that is, it produces a B-spline with no interior knots if `base="exp.bs"`, a linear B-spline with no interior knots if `base="exp.ns"`, or a constant hazard over the whole follow-up period if `base="pw.cst"`). `bound` a vector of two numerical values corresponding to the boundary knots of the spline functions. If `base="exp.bs"` or `base="exp.ns"`, computation of the B-spline basis requires that boundary knots be given. The `bound` argument allows the user to specify these boundary knots. If `base="exp.bs"`, the interval defined by the boundary knots must at least include the interval `c(0,max(time))` (otherwise, there could be problems with ill-conditioned bases). If `base="exp.ns"`, the boundary knots correspond to the knots beyond which the spline is contrained to be linear (in that case, the boundary knots can be values contained in `c(0,max(time))`). By default, the boundary knots are set to `c(0,max(time))`. `n.gleg` if `base="exp.bs"` and degree is equal to 2 or 3, or if `base="exp.ns"`, the cumulative hazard is computed via Gauss-Legendre quadrature and `n.gleg` is the number of quadrature nodes to be used to compute the cumulative hazard. By default, `n.gleg=20`. `init` vector of initial values. By default `init=NULL` and the initial values are internally set to the following values: for the baseline hazard: if `base="weibull"`, the scale and shape parameters are set to log(0.1); if `base="exp.bs"`, the parameters of the B-spline are all set to -1; if `base="exp.ns"`, the parameters of the restricted cubic spline are all set to -1; if `base="pw.cst"`, the logarithm of the piecewise-constant hazards are set to -1; the parameters describing the effects of the covariables are all set to 0; the parameter representing the standard deviation of the random effect is set to 0.1. `random` name of the variable to be entered as a random effect (must be given between quotes), representing the cluster membership. By default, `random=NULL` which means that the function fits a fixed effects model. `n.aghq` number of quadrature points to be used for estimating the cluster-specific marginal likelihoods by adaptive Gauss-Hermite quadrature. By default, `n.aghq=10`. `fnoptim` name of the R optimisation procedure used to maximise the likelihood. Selection can be made between `"nlm"` (by default) and `"optim"`. `verbose` integer parameter representing the frequency at which the current state of the optimisation process is displayed. Internally, an 'evaluation' is defined as an estimation of the log-likelihood for a given vector of parameters. This means that the number of evaluations is increased each time the optimisation procedure updates the value of any of the parameters to be estimated. If `verbose=n` (with `n` an integer), the function will display the current values of the parameters, the log-likelihood and the time elapsed every `n` evaluations. If `verbose=0`, nothing is displayed. `method` if `fnoptim="optim"`, `method` represents the optimisation method to be used by `optim`. By default, `method="Nelder-Mead"`. This parameter is not used if `fnoptim="nlm"`. `iterlim` if `fnoptim="nlm"`, `iterlim` represents the maximum number of iterations before the `nlm` optimisation procedure is terminated. By default, `iterlim` is set to 10000. This parameter is not used if `fnoptim="optim"` (in this case, the maximum number of iterations must be given as part of a list of control parameters via the `control` argument: see the help page of `optim` for further details). `numHess` logical value allowing the user to choose between the Hessian returned by the optimization algorithm (default) or the Hessian estimated by the `hessian` function from the `numDeriv` package. The latter might be more accurate but its estimation is more time-consuming. We suggest to use the default Hessian estimation procedure during model building and estimate the `numDeriv`-based Hessian only on the final model. `print.level` this argument is only used if `fnoptim="nlm"`. It determines the level of printing during the optimisation process. The default value (for the `mexhaz` function) is set to '1' which means that details on the initial and final step of the optimisation procedure are printed (see the help page of `nlm` for further details). `...` represents additional parameters directly passed to `nlm` or `optim` to control the optimisation process.

Value

An object of class `mexhaz` containing the following elements:

 `dataset` name of the dataset used to fit the model. `call` function call on which the model is based. `formula` formula part of the call. `expected` name of the variable corresponding to the expected hazard (takes the value `NA` for standard, i.e., 'non-excess' hazard models). `xlevels` information concerning the levels of the categorical variables used in the model (used by `predict.mexhaz`). `n.obs.tot` total number of observations in the dataset. `n.obs` number of observations used to fit the model (after exclusion of missing values). `n.events` number of events (after exclusion of missing values). `n.clust` number of clusters. `n.time.0` number of observations for which the observed follow-up time was equal to 0 (only for right censored type data). `base` function used to model the baseline hazard. `max.time` maximal observed time in the dataset. `bounds` vector of boundary values used to define the B-spline (or natural spline) bases. `degree` degree of the B-spline used to model the logarithm of the baseline hazard. `knots` vector of interior knots used to define the B-spline (or natural spline) bases. `names.ph` names of the covariables with a proportional effect. `random` name of the variable defining cluster membership (set to `NA` in the case of a purely fixed effects model). `init` a vector containing the initial values of the parameters. `coefficients` a vector containing the parameter estimates. `std.errors` a vector containing the standard errors of the parameter estimates. `vcov` the variance-covariance matrix of the estimated parameters. `mu.hat` a `data.frame` containing the estimated cluster-specific random effects (shrinkage estimators). `var.mu.hat` the covariance matrix of the cluster-specific shrinkage estimators. `vcov.fix.mu.hat` a `matrix` containing the covariances between the fixed effect and the cluster-specific shrinkage estimators. More specifically, the i-th line of the matrix represents the covariances between the shrinkage estimator of the i-th cluster and the fixed effect estimates. This matrix is used by the function `predict.mexhaz` to make cluster-specific predictions. `n.par` number of estimated parameters. `n.gleg` number of Gauss-Legendre quadrature points used to calculate the cumulative (excess) hazard (only relevant if a B-spline of degree 2 or 3 or a cubic restricted spline was used to model the logarithm of the baseline hazard). `n.aghq` number of adaptive Gauss-Hermite quadrature points used to calculate the cluster-specific marginal likelihoods (only relevant if a multi-level model is fitted). `fnoptim` name of the R optimisation procedure used to maximise the likelihood. `method` optimisation method used by `optim`. `code` code (integer) indicating the status of the optimisation process (this code has a different meaning for `nlm` and for `optim`: see their respective help page for details). `loglik` value of the log-likelihood at the end of the optimisation procedure. `iter` number of iterations used in the optimisation process. `eval` number of evaluations used in the optimisation process. `time.elapsed` total time required to reach convergence.

References

Charvat H, Remontet L, Bossard N, Roche L, Dejardin O, Rachet B, Launoy G, Belot A; CENSUR Working Survival Group. A multilevel excess hazard model to estimate net survival on hierarchical data allowing for non-linear and non-proportional effects of covariates. Stat Med 2016;35:3066-3084 (doi: 10.1002/sim.6881)

`fixef.mexhaz`, `predict.mexhaz`, `print.mexhaz`, `ranef.mexhaz`, `summary.mexhaz`, `update.mexhaz`, `vcov.mexhaz`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30``` ```data(simdatn1) ## Fit of a mixed-effect excess hazard model, with the baseline hazard ## described by a Weibull distribution (without covariables) Mod_weib_mix <- mexhaz(formula=Surv(time=timesurv, event=vstat)~1, data=simdatn1, base="weibull", expected="popmrate", verbose=0, random="clust") ## More complex examples (not run) ## Fit of a mixed-effect excess hazard model, with the baseline hazard ## described by a cubic B-spline with two knots at 1 and 5 year and with ## effects of age (agecr), deprivation index (depindex) and sex (IsexH) # Mod_bs3_2mix_nph <- mexhaz(formula=Surv(time=timesurv, # event=vstat)~agecr+depindex+IsexH+nph(agecr), data=simdatn1, # base="exp.bs", degree=3, knots=c(1,5), expected="popmrate", # random="clust", verbose=1000) ## Fit of a mixed-effect excess hazard model, with the baseline hazard ## described by a restricted cubic spline with two knots at the ## tertiles of the distribution of follow-up times for individuals who ## presented the event and with effects of age (agecr) and sex (IsexH) # Mod_ns3_2mix_nph <- mexhaz(formula=Surv(time=timesurv, # event=vstat)~agecr+nph(agecr), data=simdatn1, base="exp.ns", degree=3, # knots=quantile(simdatn1[simdatn1\$vstat==1,]\$timesurv, probs=c(1:2/3)), # expected="popmrate", random="clust", verbose=1000) ```