Flexible parametric regression for timetoevent data
Description
Parametric modelling or regression for timetoevent data. Several builtin distributions are available, and users may supply their own.
Usage
1 2 3 
Arguments
formula 
A formula expression in conventional R linear modelling
syntax. The response must be a survival object as returned by the
If there are no covariates, specify By default, covariates are placed on the “location” parameter of
the distribution, typically the "scale" or "rate" parameter, through
a linear model, or a loglinear model if this parameter must be
positive. This gives an accelerated failure time model or a
proportional hazards model (see Covariates can be placed on other (“ancillary”) parameters by using
the name of the parameter as a “function” in the formula. For
example, in a Weibull model, the following expresses the scale
parameter in terms of age and a treatment variable
However, if the names of the ancillary parameters clash with any
real functions that might be used in formulae (such as
 
anc 
An alternative and safer way to model covariates on ancillary parameters, that is, parameters other than the main location parameter of the distribution. This is a named list of formulae, with the name of each component giving the parameter to be modelled. The model above can also be defined as:
 
data 
A data frame in which to find variables supplied in  
weights 
Optional variable giving case weights.  
bhazard 
Optional variable giving expected hazards for relative survival models.  
subset 
Vector of integers or logicals specifying the subset of the observations to be used in the fit.  
na.action 
a missingdata filter function, applied after any
'subset' argument has been used. Default is  
dist 
Typically, one of the strings in the first column of the following table, identifying a builtin distribution. This table also identifies the location parameters, and whether covariates on these parameters represent a proportional hazards (PH) or accelerated failure time (AFT) model. In an accelerated failure time model, the covariate speeds up or slows down the passage of time. So if the coefficient (presented on the log scale) is log(2), then doubling the covariate value would give half the expected survival time.
Alternatively, Very flexible splinebased distributions can also be fitted with
The parameterisations of the builtin distributions used here are
the same as in their builtin distribution functions:
For the Weibull, exponential and lognormal distributions,
The Weibull parameterisation is
different from that in Similarly in the exponential distribution, the rate, rather than the mean, is modelled on covariates.  
inits 
An optional numeric vector giving initial values for each unknown parameter. These are numbered in the order: baseline parameters (in the order they appear in the distribution function, e.g. shape before scale in the Weibull), covariate effects on the location parameter, covariate effects on the remaining parameters. This is the same order as the printed estimates in the fitted model. If not specified, default initial values are chosen from a simple
summary of the survival or censoring times, for example the mean
is often used to initialize scale parameters. See the object
 
fixedpars 
Vector of indices of parameters whose values will be
fixed at their initial values during the optimisation. The indices
are ordered as in  
dfns 
An alternative way to define a custom survival distribution (see section
“Custom distributions” below).
A list whose components may include
If  
aux 
A named list of other arguments to pass to custom
distribution functions. This is used, for example, by
 
cl 
Width of symmetric confidence intervals for maximum likelihood estimates, by default 0.95.  
integ.opts 
List of named arguments to pass to
 
sr.control 
For the models which use  
... 
Optional arguments to the generalpurpose R
optimisation routine 
Details
Parameters are estimated by maximum likelihood using the
algorithms available in the standard R optim
function.
Parameters defined to be positive are estimated on the log scale.
Confidence intervals are estimated from the Hessian at the maximum,
and transformed back to the original scale of the parameters.
The usage of flexsurvreg
is intended to be similar to
survreg
in the survival package.
Value
A list of class "flexsurvreg"
containing information about the
fitted model. Components of interest to users may include:
call 
A copy of the function call, for use in postprocessing. 
dlist 
List defining the survival distribution used. 
res 
Matrix of maximum likelihood estimates and confidence limits, with parameters on their natural scales. 
res.t 
Matrix of maximum likelihood estimates and confidence
limits, with parameters all transformed to the real line. The 
loglik 
Loglikelihood 
logliki 
Vector of individual contributions to the loglikelihood 
AIC 
Akaike's information criterion (2*log likelihood + 2*number of estimated parameters) 
cov 
Covariance matrix of the parameters, on the realline scale
(e.g. log scale), which can be extracted with 
data 
Data used in the model fit. To extract this in the
standard R formats, use use

Custom distributions
flexsurvreg
is intended to be easy to extend to handle
new distributions. To define a new distribution for use in
flexsurvreg
, construct a list with the following
elements:
name
:A string naming the distribution. If this is called
"dist"
, for example, then there must be visible in the working environment, at least, eithera) a function called
ddist
which defines the probability density,or
b) a function called
hdist
which defines the hazard.Ideally, in case a) there should also be a function called
pdist
which defines the probability distribution or cumulative density, and in case b) there should be a function calledHdist
defining the cumulative hazard. If these additional functions are not provided, flexsurv attempts to automatically create them by numerically integrating the density or hazard function. However, model fitting will be much slower, or may not even work at all, if the analytic versions of these functions are not available.The functions must accept vector arguments (representing different times, or alternative values for each parameter) and return the results as a vector. The function
Vectorize
may be helpful for doing this: see the example below.These functions may be in an addon package (see below for an example) or may be userwritten. If they are userwritten they must be defined in the global environment, or supplied explicitly through the
dfns
argument toflexsurvreg
. The latter may be useful if the functions are created dynamically (as in the source offlexsurvspline
) and thus not visible through R's scoping rules.Arguments other than parameters must be named in the conventional way – for example
x
for the first argument of the density function or hazard, as indnorm(x, ...)
andq
for the first argument of the probability function. Density functions should also have an argumentlog
, after the parameters, which whenTRUE
, computes the log density, using a numerically stable additive formula if possible.Additional functions with names beginning with
"DLd"
and"DLS"
may be defined to calculate the derivatives of the log density and log survival probability, with respect to the parameters of the distribution. The parameters are expressed on the real line, for example after log transformation if they are defined as positive. The first argument must be namedt
, representing the time, and the remaining arguments must be named as the parameters of the density function. The function must return a matrix with rows corresponding to times, and columns corresponding to the parameters of the distribution. The derivatives are used, if available, to speed up the model fitting withoptim
.pars
:Vector of strings naming the parameters of the distribution. These must be the same names as the arguments of the density and probability functions.
location
:Name of the main parameter governing the mean of the distribution. This is the default parameter on which covariates are placed in the
formula
supplied toflexsurvreg
.transforms
:List of R functions which transform the range of values taken by each parameter onto the real line. For example,
c(log, log)
for a distribution with two positive parameters.inv.transforms
:List of R functions defining the corresponding inverse transformations. Note these must be lists, even for single parameter distributions they should be supplied as, e.g.
c(exp)
orlist(exp)
.inits
:A function of the observed survival times
t
(including rightcensoring times, and using the halfway point for intervalcensored times) which returns a vector of reasonable initial values for maximum likelihood estimation of each parameter. For example,function(t){ c(1, mean(t)) }
will always initialize the first of two parameters at 1, and the second (a scale parameter, for instance) at the mean oft
.
For example, suppose we want to use an extreme value survival
distribution. This is available in the CRAN package eha, which
provides conventionallydefined density and probability functions called
dEV
and pEV
. See the
Examples below for the custom list in this case, and the
subsequent command to fit the model.
Author(s)
Christopher Jackson <chris.jackson@mrcbsu.cam.ac.uk>
References
Jackson, C. (2016). flexsurv: A Platform for Parametric Survival Modeling in R. Journal of Statistical Software, 70(8), 133. doi:10.18637/jss.v070.i08
Cox, C. (2008) The generalized F distribution: An umbrella for parametric survival analysis. Statistics in Medicine 27:43014312.
Cox, C., Chu, H., Schneider, M. F. and Muñoz, A. (2007) Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Statistics in Medicine 26:42524374
Jackson, C. H. and Sharples, L. D. and Thompson, S. G. (2010) Survival models in health economic evaluations: balancing fit and parsimony to improve prediction. International Journal of Biostatistics 6(1):Article 34.
See Also
flexsurvspline
for flexible survival modelling using the
spline model of Royston and Parmar.
plot.flexsurvreg
and lines.flexsurvreg
to
plot fitted survival, hazards and cumulative hazards from models fitted
by flexsurvreg
and flexsurvspline
.
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37  data(ovarian)
## Compare generalized gamma fit with Weibull
fitg < flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="gengamma")
fitg
fitw < flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="weibull")
fitw
plot(fitg)
lines(fitw, col="blue", lwd.ci=1, lty.ci=1)
## Identical AIC, probably not enough data in this simple example for a
## very flexible model to be worthwhile.
## Custom distribution
## make "dEV" and "pEV" from eha package (if installed)
## available to the working environment
if (require("eha")) {
custom.ev < list(name="EV",
pars=c("shape","scale"),
location="scale",
transforms=c(log, log),
inv.transforms=c(exp, exp),
inits=function(t){ c(1, median(t)) })
fitev < flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian,
dist=custom.ev)
fitev
lines(fitev, col="purple", col.ci="purple")
}
## Custom distribution: supply the hazard function only
hexp2 < function(x, rate=1){ rate } # exponential distribution
hexp2 < Vectorize(hexp2)
custom.exp2 < list(name="exp2", pars=c("rate"), location="rate",
transforms=c(log), inv.transforms=c(exp),
inits=function(t)1/mean(t))
flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist=custom.exp2)
flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist="exp")
## should give same answer
