Flexible parametric regression for time-to-event data

Share:

Description

Parametric regression for time-to-event data using the generalized F and other flexible distributions. Users may easily extend this function with their own survival distributions.

Usage

1
2
flexsurvreg(formula, data, weights, subset, na.action, dist, inits,
            fixedpars=NULL, cl=0.95,...)

Arguments

formula

A formula expression in conventional R linear modelling syntax. The response must be a survival object as returned by the Surv function, and any covariates are given on the right-hand side. For example,

Surv(time, dead) ~ age + sex

Only Surv objects of type="right" or type="counting", corresponding to right-censored and/or left-truncated observations, are supported.

If there are no covariates, specify 1 on the right hand side, for example Surv(time, dead) ~ 1.

By default, covariates are placed on the “location” parameter of the distribution, typically the "scale" or "rate" parameter, through a linear model, or a log-linear model if this parameter must be positive. This gives an accelerated failure time model or a proportional hazards model, depending on the distribution.

Covariates can be placed on other parameters by using the name of the parameter as a function in the formula. For example, in a Weibull model, the following expresses the scale parameter in terms of age and a treatment variable treat, and the shape parameter in terms of sex and treatment.

Surv(time, dead) ~ age + treat + shape(sex) + shape(treatment)

data

A data frame in which to find variables supplied in formula. If not given, the variables should be in the working environment.

weights

Optional vector of case weights.

subset

Vector of integer or logicals specifying the subset of the observations to be used in the fit.

na.action

a missing-data filter function, applied after any 'subset' argument has been used. Default is options()$na.action.

dist

Either one of the following strings identifying a built-in distribution:

"gengamma" Generalized gamma (stable parameterisation)
"gengamma.orig" Generalized gamma (original parameterisation)
"genf" Generalized F (stable parameterisation)
"genf.orig" Generalized F (original parameterisation)
"weibull" Weibull
"gamma" Gamma
"exp" Exponential
"lnorm" Log-normal
"gompertz" Gompertz

or a list specifying a custom distribution. See section “Custom distributions” below for how to construct this list.

The parameterisations of the built-in distributions used here are the same as in their built-in distribution functions: dgengamma, dgengamma.orig, dgenf, dgenf.orig, dweibull, dgamma, dexp, dlnorm, dgompertz, respectively. The functions in base R are used where available, otherwise, they are provided in this package.

Note that the Weibull parameterisation is different from that in survreg, instead it is consistent with dweibull. The "scale" reported by survreg is equivalent to 1/shape as defined by dweibull and hence flexsurvreg. The first coefficient (Intercept) reported by survreg is equivalent to log(scale) in dweibull and flexsurvreg.

Similarly in the exponential distribution, the rate, rather than the mean, is modelled on covariates.

inits

A numeric vector giving initial values for each unknown parameter. If not specified, default initial values are chosen from a simple summary of the uncensored survival time, for example the mean is often used to initialize scale parameters. See the object flexsurv.dists in the source for the exact methods used. If the likelihood surface may be uneven, it is advised to run the optimisation starting from various different initial values to ensure convergence to the true global maximum.

fixedpars

Vector of indices of parameters whose values will be fixed at their initial values during the optimisation. The indices are ordered with parameters of the baseline distribution coming first, followed by covariate effects. For example, in a stable generalized Gamma model with two covariates, to fix the third of three generalized gamma parameters (the shape Q, see the help for GenGamma) and the second covariate, specify fixedpars = c(3, 5)

cl

Width of symmetric confidence intervals for maximum likelihood estimates, by default 0.95.

...

Optional arguments to the general-purpose R optimisation routine optim. For example, the BFGS optimisation algorithm is the default in flexsurvreg, but this can be changed, for example to method="Nelder-Mead" which can be more robust to poor initial values. If the optimisation fails to converge, consider normalising the problem using, for example, control=list(fnscale = 2500), for example, replacing 2500 by a number of the order of magnitude of the likelihood. If 'false' convergence is reported with a non-positive-definite Hessian, then consider tightening the tolerance criteria for convergence. If the optimisation takes a long time, intermediate steps can be printed using the trace argument of the control list. See optim for details.

Details

Parameters are estimated by maximum likelihood using the algorithms available in the standard R optim function. Parameters defined to be positive are estimated on the log scale. Confidence intervals are estimated from the Hessian at the maximum, and transformed back to the original scale of the parameters.

The usage of flexsurvreg is intended to be as similar as possible to survreg in the survival package.

Value

A list of class "flexsurvreg" with the following elements.

call

A copy of the function call, for use in post-processing.

dlist

List defining the survival distribution used.

res

Matrix of maximum likelihood estimates and confidence limits, with parameters on their natural scales.

res.t

Matrix of maximum likelihood estimates and confidence limits, with parameters all transformed to the real line. The coef, vcov and confint methods for flexsurvreg objects work on this scale.

loglik

Log-likelihood

AIC

Akaike's information criterion (-2*log likelihood + 2*number of estimated parameters)

Custom distributions

flexsurvreg is intended to be easy to extend to handle new distributions. To define a new distribution for use in flexsurvreg, construct a list with the following elements:

name:

A string naming the distribution. If this is called "dist", for example, then there must be a function called ddist in the working environment which defines the probability density, and a function called pdist which defines the probability distribution or cumulative density. These functions may be in an add-on package (see below for an example) or may be user-written. Arguments other than parameters must be named in the conventional way – for example x for the first argument of the density function, as in dnorm(x, ...) and q for the first argument of the probability function.

pars:

Vector of strings naming the parameters of the distribution. These must be the same names as the arguments of the density and probability functions.

location:

Name of the parameter which can be modelled as a linear function of covariates, possibly after transformation.

transforms:

Vector of R functions which transform the range of values taken by each parameter onto the real line. For example, c(log, log) for a distribution with two positive parameters.

inv.transforms:

Vector of R functions defining the corresponding inverse transformations.

inits:

A function of the uncensored survival times t, which returns a vector of reasonable initial values for maximum likelihood estimation of each parameter. For example, function(t){ c(1, mean(t)) } will always initialize the first of two parameters at 1, and the second (a scale parameter, for instance) at the mean of t.

For example, suppose we want to use a log-logistic survival distribution. This is available in the CRAN package eha, which provides conventionally-defined density and probability functions called dllogis and pllogis. See the Examples below for the custom list in this case, and the subsequent command to fit the model.

Author(s)

Christopher Jackson <chris.jackson@mrc-bsu.cam.ac.uk>

References

Jackson, C. H. and Sharples, L. D. and Thompson, S. G. (2010) Survival models in health economic evaluations: balancing fit and parsimony to improve prediction. International Journal of Biostatistics 6(1):Article 34.

Cox, C. (2008) The generalized F distribution: An umbrella for parametric survival analysis. Statistics in Medicine 27:4301-4312.

Cox, C., Chu, H., Schneider, M. F. and Muñoz, A. (2007) Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Statistics in Medicine 26:4252-4374

See Also

flexsurvspline for flexible survival modelling using the spline model of Royston and Parmar.

plot.flexsurvreg and lines.flexsurvreg to plot fitted survival, hazards and cumulative hazards from models fitted by flexsurvreg and flexsurvspline.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
data(ovarian)
## Compare generalized gamma fit with Weibull
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="gengamma")
fitg
fitw <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="weibull")
fitw
plot(fitg)
lines(fitw, col="blue", lwd.ci=1, lty.ci=1)
## Identical AIC, probably not enough data in this simple example for a
## very flexible model to be worthwhile.

## Custom distribution
library(eha)  ## make "dllogis" and "pllogis" available to the working environment
custom.llogis <- list(name="llogis",
                      pars=c("shape","scale"),
                      location="scale",
                      transforms=c(log, log),
                      inv.transforms=c(exp, exp),
                      inits=function(t){ c(1, median(t)) })
fitl <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist=custom.llogis)
fitl
lines(fitl, col.fit="purple", col.ci="purple")