mexhaz: mexhaz function

Description Usage Arguments Value Author(s) References See Also Examples

View source: R/mexhaz.R


Fit an (excess) hazard regression model using different shapes for the baseline hazard (Weibull, piecewise constant, exponential of a B-spline of degree 1 to 3, exponential of a restricted cubic spline), with the possibility to include time-dependent effects of variable(s) and a random effect defined at the cluster level. The function accepts right-censored and counting process input styles for the follow-up time. The latter allows the modelling of survival data with delayed entries. The time-dependent effect of a covariable is modelled by adding interaction terms between the covariable and a function of time of the same class as the one used for the baseline hazard (in particular, with the same knots for piecewise constant hazards; and with the same degree and the same knots for B-spline or restricted cubic spline functions). The random effect is assumed to be normally distributed with mean 0 and standard deviation sigma. The optimisation process uses adaptive Gaussian quadrature to calculate the cluster-specific marginal likelihoods. The logarithm of the full marginal likelihood, defined as the sum of the logarithms of the cluster-specific marginal likelihoods, is then maximised using an optimisation routine such as nlm (default) or optim.


mexhaz(formula, data, expected = NULL, base = c("weibull",
"", "exp.ns", "pw.cst"), degree = 3, knots = NULL, bound = NULL,
n.gleg = 20, init = NULL, random = NULL, n.aghq = 10,
fnoptim = c("nlm", "optim"), verbose = 100, method = "Nelder-Mead",
iterlim=10000, numHess=FALSE, print.level=1,...)



a formula object, with the response on the left of the ~ operator, and the linear predictor on the right. The response must be of the form Surv(time, event) for right censored data or Surv(time, time2, event) for counting process style data. The linear predictor accepts a special instruction nph() for specifying variables for which a time-dependent effect should be modelled (if several variables are modelled with time-dependent effects, separate these variables inside the nph() instruction with a + sign).

In case time takes the value 0 for some observations when data are entered in the right censored style, it is assumed that these observations refer to events/censoring that occurred on the first day of follow-up. Consequently, a value of 1/730.5 (half a day) is substituted in order to make computations possible. However, it should be stressed that this is just a convention and that it does not make much sense if the time scale is not expressed in years. We therefore advise the analyst to deal with 0 time values during the dataset preparation stage.


a data.frame containing the variables referred to in the formula, as well as in the expected and random arguments if these arguments are used.


name of the variable (must be given in quotes) representing the population (i.e., expected) hazard. By default, expected=NULL, which means that the function estimates the overall hazard (and not the excess hazard).


functional form that should be used to model the baseline hazard. Selection can be made between the following options: "weibull" for a Weibull hazard, "" for a hazard described by the exponential of a B-spline (only B-splines of degree 1, 2 or 3 are accepted), "exp.ns" for a hazard described by the exponential of a restricted cubic spline (also called 'natural spline'), "pw.cst" for a piecewise constant hazard. By default, base="weibull".

For the Weibull hazard model, the cumulative hazard is given by the following relationship:

H(t,x,z) = lambda*exp(x'b)*t^(rho*exp(z'g))

where lambda and rho are the parameters of the Weibull baseline hazard, x represent variables with proportional effect (with corresponding regression coefficients 'b') and z represent variables with time-dependent effects (with corresponding regression coefficients 'g'). The mexhaz() function does not estimate directly lambda and rho but their logarithms (in the output of the function, these are named respectively 'logLambda' and 'logRho').

For the spline-based hazards, it should be noted that the B-spline and restricted cubic spline bases created internally in the mexhaz() function are identical to those obtained by the use of, respectively, the bs() and ns() functions from the splines package.


if base="", degree represents the degree of the B-spline used. Only integer values between 1 and 3 are accepted, and 3 is the default.


if base="" or "exp.ns", knots is the vector of interior knots of the spline. If base="pw.cst", knots is the vector defining the endpoints of the time intervals on which the hazard is assumed to be constant. By default, knots=NULL (that is, it produces a B-spline with no interior knots if base="", a linear B-spline with no interior knots if base="exp.ns", or a constant hazard over the whole follow-up period if base="pw.cst").


a vector of two numerical values corresponding to the boundary knots of the spline functions. If base="" or base="exp.ns", computation of the B-spline basis requires that boundary knots be given. The bound argument allows the user to specify these boundary knots. If base="", the interval defined by the boundary knots must at least include the interval c(0,max(time)) (otherwise, there could be problems with ill-conditioned bases). If base="exp.ns", the boundary knots correspond to the knots beyond which the spline is contrained to be linear (in that case, the boundary knots can be values contained in c(0,max(time))). By default, the boundary knots are set to c(0,max(time)).


if base="" and degree is equal to 2 or 3, or if base="exp.ns", the cumulative hazard is computed via Gauss-Legendre quadrature and n.gleg is the number of quadrature nodes to be used to compute the cumulative hazard. By default, n.gleg=20.


vector of initial values. By default init=NULL and the initial values are internally set to the following values:

for the baseline hazard:

if base="weibull", the scale and shape parameters are set to log(0.1);

if base="", the parameters of the B-spline are all set to -1;

if base="exp.ns", the parameters of the restricted cubic spline are all set to -1;

if base="pw.cst", the logarithm of the piecewise-constant hazards are set to -1;

the parameters describing the effects of the covariables are all set to 0;

the parameter representing the standard deviation of the random effect is set to 0.1.


name of the variable to be entered as a random effect (must be given between quotes), representing the cluster membership. By default, random=NULL which means that the function fits a fixed effects model.


number of quadrature points to be used for estimating the cluster-specific marginal likelihoods by adaptive Gauss-Hermite quadrature. By default, n.aghq=10.


name of the R optimisation procedure used to maximise the likelihood. Selection can be made between "nlm" (by default) and "optim".


integer parameter representing the frequency at which the current state of the optimisation process is displayed. Internally, an 'evaluation' is defined as an estimation of the log-likelihood for a given vector of parameters. This means that the number of evaluations is increased each time the optimisation procedure updates the value of any of the parameters to be estimated. If verbose=n (with n an integer), the function will display the current values of the parameters, the log-likelihood and the time elapsed every n evaluations. If verbose=0, nothing is displayed.


if fnoptim="optim", method represents the optimisation method to be used by optim. By default, method="Nelder-Mead". This parameter is not used if fnoptim="nlm".


if fnoptim="nlm", iterlim represents the maximum number of iterations before the nlm optimisation procedure is terminated. By default, iterlim is set to 10000. This parameter is not used if fnoptim="optim" (in this case, the maximum number of iterations must be given as part of a list of control parameters via the control argument: see the help page of optim for further details).


logical value allowing the user to choose between the Hessian returned by the optimization algorithm (default) or the Hessian estimated by the hessian function from the numDeriv package. The latter might be more accurate but its estimation is more time-consuming. We suggest to use the default Hessian estimation procedure during model building and estimate the numDeriv-based Hessian only on the final model.


this argument is only used if fnoptim="nlm". It determines the level of printing during the optimisation process. The default value (for the mexhaz function) is set to '1' which means that details on the initial and final step of the optimisation procedure are printed (see the help page of nlm for further details).


represents additional parameters directly passed to nlm or optim to control the optimisation process.


An object of class mexhaz containing the following elements:


name of the dataset used to fit the model.


function call on which the model is based.


formula part of the call.


name of the variable corresponding to the expected hazard (takes the value NA for standard, i.e., 'non-excess' hazard models).


information concerning the levels of the categorical variables used in the model (used by predict.mexhaz).


total number of observations in the dataset.


number of observations used to fit the model (after exclusion of missing values).

number of events (after exclusion of missing values).


number of clusters.


number of observations for which the observed follow-up time was equal to 0 (only for right censored type data).


function used to model the baseline hazard.


maximal observed time in the dataset.


vector of boundary values used to define the B-spline (or natural spline) bases.


degree of the B-spline used to model the logarithm of the baseline hazard.


vector of interior knots used to define the B-spline (or natural spline) bases.

names of the covariables with a proportional effect.


name of the variable defining cluster membership (set to NA in the case of a purely fixed effects model).


a vector containing the initial values of the parameters.


a vector containing the parameter estimates.


a vector containing the standard errors of the parameter estimates.


the variance-covariance matrix of the estimated parameters.


a data.frame containing the estimated cluster-specific random effects (shrinkage estimators).

the covariance matrix of the cluster-specific shrinkage estimators.

a matrix containing the covariances between the fixed effect and the cluster-specific shrinkage estimators. More specifically, the i-th line of the matrix represents the covariances between the shrinkage estimator of the i-th cluster and the fixed effect estimates. This matrix is used by the function predict.mexhaz to make cluster-specific predictions.


number of estimated parameters.


number of Gauss-Legendre quadrature points used to calculate the cumulative (excess) hazard (only relevant if a B-spline of degree 2 or 3 or a cubic restricted spline was used to model the logarithm of the baseline hazard).


number of adaptive Gauss-Hermite quadrature points used to calculate the cluster-specific marginal likelihoods (only relevant if a multi-level model is fitted).


name of the R optimisation procedure used to maximise the likelihood.


optimisation method used by optim.


code (integer) indicating the status of the optimisation process (this code has a different meaning for nlm and for optim: see their respective help page for details).


value of the log-likelihood at the end of the optimisation procedure.


number of iterations used in the optimisation process.


number of evaluations used in the optimisation process.


total time required to reach convergence.


Hadrien Charvat, Aurelien Belot


Charvat H, Remontet L, Bossard N, Roche L, Dejardin O, Rachet B, Launoy G, Belot A; CENSUR Working Survival Group. A multilevel excess hazard model to estimate net survival on hierarchical data allowing for non-linear and non-proportional effects of covariates. Stat Med 2016;35:3066-3084 (doi: 10.1002/sim.6881)

See Also

fixef.mexhaz, predict.mexhaz, print.mexhaz, ranef.mexhaz, summary.mexhaz, update.mexhaz, vcov.mexhaz



## Fit of a mixed-effect excess hazard model, with the baseline hazard
## described by a Weibull distribution (without covariables)

Mod_weib_mix <- mexhaz(formula=Surv(time=timesurv,
event=vstat)~1, data=simdatn1, base="weibull",
expected="popmrate", verbose=0, random="clust")

## More complex examples (not run)

## Fit of a mixed-effect excess hazard model, with the baseline hazard
## described by a cubic B-spline with two knots at 1 and 5 year and with
## effects of age (agecr), deprivation index (depindex) and sex (IsexH)

# Mod_bs3_2mix_nph <- mexhaz(formula=Surv(time=timesurv,
# event=vstat)~agecr+depindex+IsexH+nph(agecr), data=simdatn1,
# base="", degree=3, knots=c(1,5), expected="popmrate",
# random="clust", verbose=1000)

## Fit of a mixed-effect excess hazard model, with the baseline hazard
## described by a restricted cubic spline with two knots at the
## tertiles of the distribution of follow-up times for individuals who
## presented the event and with effects of age (agecr) and sex (IsexH)

# Mod_ns3_2mix_nph <- mexhaz(formula=Surv(time=timesurv,
# event=vstat)~agecr+nph(agecr), data=simdatn1, base="exp.ns", degree=3,
# knots=quantile(simdatn1[simdatn1$vstat==1,]$timesurv, probs=c(1:2/3)),
# expected="popmrate", random="clust", verbose=1000)

mexhaz documentation built on April 16, 2018, 1:04 a.m.