Parametric modelling or regression for timetoevent data. Several builtin distributions are available, and users may supply their own.
1 2 3 
formula 
A formula expression in conventional R linear modelling
syntax. The response must be a survival object as returned by the
If there are no covariates, specify By default, covariates are placed on the “location” parameter of
the distribution, typically the "scale" or "rate" parameter, through
a linear model, or a loglinear model if this parameter must be
positive. This gives an accelerated failure time model or a
proportional hazards model (see Covariates can be placed on other (“ancillary”) parameters by using
the name of the parameter as a “function” in the formula. For
example, in a Weibull model, the following expresses the scale
parameter in terms of age and a treatment variable
However, if the names of the ancillary parameters clash with any
real functions that might be used in formulae (such as
 
anc 
An alternative and safer way to model covariates on ancillary parameters, that is, parameters other than the main location parameter of the distribution. This is a named list of formulae, with the name of each component giving the parameter to be modelled. The model above can also be defined as:
 
data 
A data frame in which to find variables supplied in  
weights 
Optional variable giving case weights.  
bhazard 
Optional variable giving expected hazards for relative survival models.  
subset 
Vector of integers or logicals specifying the subset of the observations to be used in the fit.  
na.action 
a missingdata filter function, applied after any
'subset' argument has been used. Default is  
dist 
Typically, one of the strings in the first column of the following table, identifying a builtin distribution. This table also identifies the location parameters, and whether covariates on these parameters represent a proportional hazards (PH) or accelerated failure time (AFT) model. In an accelerated failure time model, the covariate speeds up or slows down the passage of time. So if the coefficient (presented on the log scale) is log(2), then doubling the covariate value would give half the expected survival time.
Alternatively, Very flexible splinebased distributions can also be fitted with
The parameterisations of the builtin distributions used here are
the same as in their builtin distribution functions:
For the Weibull, exponential and lognormal distributions,
The Weibull parameterisation is
different from that in Similarly in the exponential distribution, the rate, rather than the mean, is modelled on covariates.  
inits 
An optional numeric vector giving initial values for each unknown parameter. These are numbered in the order: baseline parameters (in the order they appear in the distribution function, e.g. shape before scale in the Weibull), covariate effects on the location parameter, covariate effects on the remaining parameters. This is the same order as the printed estimates in the fitted model. If not specified, default initial values are chosen from a simple
summary of the survival or censoring times, for example the mean
is often used to initialize scale parameters. See the object
 
fixedpars 
Vector of indices of parameters whose values will be
fixed at their initial values during the optimisation. The indices
are ordered as in  
dfns 
An alternative way to define a custom survival distribution (see section
“Custom distributions” below).
A list whose components may include
If  
aux 
A named list of other arguments to pass to custom
distribution functions. This is used, for example, by
 
cl 
Width of symmetric confidence intervals for maximum likelihood estimates, by default 0.95.  
integ.opts 
List of named arguments to pass to
 
sr.control 
For the models which use  
... 
Optional arguments to the generalpurpose R
optimisation routine 
Parameters are estimated by maximum likelihood using the
algorithms available in the standard R optim
function.
Parameters defined to be positive are estimated on the log scale.
Confidence intervals are estimated from the Hessian at the maximum,
and transformed back to the original scale of the parameters.
The usage of flexsurvreg
is intended to be similar to
survreg
in the survival package.
A list of class "flexsurvreg"
containing information about the
fitted model. Components of interest to users may include:
call 
A copy of the function call, for use in postprocessing. 
dlist 
List defining the survival distribution used. 
res 
Matrix of maximum likelihood estimates and confidence limits, with parameters on their natural scales. 
res.t 
Matrix of maximum likelihood estimates and confidence
limits, with parameters all transformed to the real line. The 
loglik 
Loglikelihood 
logliki 
Vector of individual contributions to the loglikelihood 
AIC 
Akaike's information criterion (2*log likelihood + 2*number of estimated parameters) 
cov 
Covariance matrix of the parameters, on the realline scale
(e.g. log scale), which can be extracted with 
data 
Data used in the model fit. To extract this in the
standard R formats, use use

flexsurvreg
is intended to be easy to extend to handle
new distributions. To define a new distribution for use in
flexsurvreg
, construct a list with the following
elements:
name
:A string naming the distribution.
If this is called "dist"
, for example, then there must be
visible in the working environment, at least, either
a) a function called ddist
which defines the probability
density,
or
b) a function called hdist
which defines the hazard.
Ideally, in case a) there should also be a function called
pdist
which defines the probability distribution or
cumulative density, and in case b) there should be a function
called Hdist
defining the cumulative hazard. If these
additional functions are not provided, flexsurv attempts to
automatically create them by numerically integrating the density
or hazard function. However, model fitting will be much slower,
or may not even work at all, if the analytic versions of these
functions are not available.
The functions must accept vector arguments (representing different
times, or alternative values for each parameter) and return the results
as a vector. The function Vectorize
may be helpful
for doing this: see the example below.
These functions may be in an addon package (see below for an
example) or may be userwritten. If they are userwritten they
must be defined in the global environment, or supplied explicitly
through the dfns
argument to flexsurvreg
.
The latter may be useful if the functions are created dynamically
(as in the source of flexsurvspline
) and thus not visible
through R's scoping rules.
Arguments other than parameters must be named in the conventional
way – for example x
for the first argument of the density
function or hazard, as in dnorm(x, ...)
and q
for the first argument of the probability function. Density
functions should also have an argument log
, after the
parameters, which when TRUE
, computes the log density,
using a numerically stable additive formula if possible.
Additional functions with names beginning with "DLd"
and "DLS"
may be
defined to calculate the derivatives of the
log density and log survival probability, with respect to the
parameters of the distribution. The parameters are expressed on
the real line, for example after log transformation if they are
defined as positive. The first argument must be named
t
, representing the time, and the remaining arguments must be
named as the parameters of the density function. The function must return a
matrix with rows corresponding to times, and columns
corresponding to the parameters of the distribution. The derivatives are used, if
available, to speed up the model fitting with optim
.
pars
:Vector of strings naming the parameters of the distribution. These must be the same names as the arguments of the density and probability functions.
location
:Name of the main parameter governing the mean of
the distribution. This is the default parameter on which
covariates are placed in the formula
supplied to
flexsurvreg
.
transforms
:List of R functions which transform the range of
values taken by each parameter onto the real line. For example,
c(log, log)
for a distribution with two positive parameters.
inv.transforms
:List of R functions defining the corresponding
inverse transformations. Note these must be lists, even for
single parameter distributions they should be supplied as,
e.g. c(exp)
or list(exp)
.
inits
:A function of the observed survival times t
(including rightcensoring times, and using the halfway point for
intervalcensored times)
which returns a vector of reasonable initial values for maximum
likelihood estimation of each parameter. For example,
function(t){ c(1, mean(t)) }
will always initialize the first
of two parameters at 1, and the second (a scale
parameter, for instance) at the mean of t
.
For example, suppose we want to use an extreme value survival
distribution. This is available in the CRAN package eha, which
provides conventionallydefined density and probability functions called
dEV
and pEV
. See the
Examples below for the custom list in this case, and the
subsequent command to fit the model.
Christopher Jackson <chris.jackson@mrcbsu.cam.ac.uk>
Jackson, C. (2016). flexsurv: A Platform for Parametric Survival Modeling in R. Journal of Statistical Software, 70(8), 133. doi:10.18637/jss.v070.i08
Cox, C. (2008) The generalized F distribution: An umbrella for parametric survival analysis. Statistics in Medicine 27:43014312.
Cox, C., Chu, H., Schneider, M. F. and Muñoz, A. (2007) Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Statistics in Medicine 26:42524374
Jackson, C. H. and Sharples, L. D. and Thompson, S. G. (2010) Survival models in health economic evaluations: balancing fit and parsimony to improve prediction. International Journal of Biostatistics 6(1):Article 34.
flexsurvspline
for flexible survival modelling using the
spline model of Royston and Parmar.
plot.flexsurvreg
and lines.flexsurvreg
to
plot fitted survival, hazards and cumulative hazards from models fitted
by flexsurvreg
and flexsurvspline
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37  data(ovarian)
## Compare generalized gamma fit with Weibull
fitg < flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="gengamma")
fitg
fitw < flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="weibull")
fitw
plot(fitg)
lines(fitw, col="blue", lwd.ci=1, lty.ci=1)
## Identical AIC, probably not enough data in this simple example for a
## very flexible model to be worthwhile.
## Custom distribution
## make "dEV" and "pEV" from eha package (if installed)
## available to the working environment
if (require("eha")) {
custom.ev < list(name="EV",
pars=c("shape","scale"),
location="scale",
transforms=c(log, log),
inv.transforms=c(exp, exp),
inits=function(t){ c(1, median(t)) })
fitev < flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian,
dist=custom.ev)
fitev
lines(fitev, col="purple", col.ci="purple")
}
## Custom distribution: supply the hazard function only
hexp2 < function(x, rate=1){ rate } # exponential distribution
hexp2 < Vectorize(hexp2)
custom.exp2 < list(name="exp2", pars=c("rate"), location="rate",
transforms=c(log), inv.transforms=c(exp),
inits=function(t)1/mean(t))
flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist=custom.exp2)
flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist="exp")
## should give same answer

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.
All documentation is copyright its authors; we didn't write any of that.