flexsurvreg  R Documentation 
Parametric modelling or regression for timetoevent data. Several builtin distributions are available, and users may supply their own.
flexsurvreg( formula, anc = NULL, data, weights, bhazard, rtrunc, subset, na.action, dist, inits, fixedpars = NULL, dfns = NULL, aux = NULL, cl = 0.95, integ.opts = NULL, sr.control = survreg.control(), hessian = TRUE, hess.control = NULL, ... )
formula 
A formula expression in conventional R linear modelling
syntax. The response must be a survival object as returned by the
If there are no covariates, specify If the right hand side is specified as By default, covariates are placed on the “location” parameter of the
distribution, typically the "scale" or "rate" parameter, through a linear
model, or a loglinear model if this parameter must be positive. This
gives an accelerated failure time model or a proportional hazards model
(see Covariates can be placed on other (“ancillary”) parameters by using the
name of the parameter as a “function” in the formula. For example, in a
Weibull model, the following expresses the scale parameter in terms of age
and a treatment variable
However, if the names of the ancillary parameters clash with any real
functions that might be used in formulae (such as
 
anc 
An alternative and safer way to model covariates on ancillary parameters, that is, parameters other than the main location parameter of the distribution. This is a named list of formulae, with the name of each component giving the parameter to be modelled. The model above can also be defined as:
 
data 
A data frame in which to find variables supplied in
 
weights 
Optional variable giving case weights.  
bhazard 
Optional variable giving expected hazards for relative survival models. The model is described by Nelson et al. (2007).
If For relative survival models, the loglikelihood returned by  
rtrunc 
Optional variable giving individualspecific righttruncation times. Used for analysing data with "retrospective ascertainment". For example, suppose we want to estimate the distribution of the time from onset of a disease to death, but have only observed cases known to have died by the current date. In this case, times from onset to death for individuals in the data are righttruncated by the current date minus the onset date. Predicted survival times for new cases can then be described by an untruncated version of the fitted distribution. These models can suffer from weakly identifiable parameters and
badlybehaved likelihood functions, and it is advised to compare
convergence for different initial values by supplying different
 
subset 
Vector of integers or logicals specifying the subset of the observations to be used in the fit.  
na.action 
a missingdata filter function, applied after any 'subset'
argument has been used. Default is  
dist 
Typically, one of the strings in the first column of the following table, identifying a builtin distribution. This table also identifies the location parameters, and whether covariates on these parameters represent a proportional hazards (PH) or accelerated failure time (AFT) model. In an accelerated failure time model, the covariate speeds up or slows down the passage of time. So if the coefficient (presented on the log scale) is log(2), then doubling the covariate value would give half the expected survival time.
Alternatively, Very flexible splinebased distributions can also be fitted with
The parameterisations of the builtin distributions used here are the same
as in their builtin distribution functions: A package vignette "Distributions reference" lists the survivor functions and covariate effect parameterisations used by each builtin distribution. For the Weibull, exponential and lognormal distributions,
The Weibull parameterisation is different from that in
Similarly in the exponential distribution, the rate, rather than the mean, is modelled on covariates. The object  
inits 
An optional numeric vector giving initial values for each unknown parameter. These are numbered in the order: baseline parameters (in the order they appear in the distribution function, e.g. shape before scale in the Weibull), covariate effects on the location parameter, covariate effects on the remaining parameters. This is the same order as the printed estimates in the fitted model. If not specified, default initial values are chosen from a simple summary
of the survival or censoring times, for example the mean is often used to
initialize scale parameters. See the object  
fixedpars 
Vector of indices of parameters whose values will be fixed
at their initial values during the optimisation. The indices are ordered
as in  
dfns 
An alternative way to define a custom survival distribution (see
section “Custom distributions” below). A list whose components may
include
If  
aux 
A named list of other arguments to pass to custom distribution
functions. This is used, for example, by  
cl 
Width of symmetric confidence intervals for maximum likelihood estimates, by default 0.95.  
integ.opts 
List of named arguments to pass to
 
sr.control 
For the models which use  
hessian 
Calculate the covariances and confidence intervals for the
parameters. Defaults to  
hess.control 
List of options to control inversion of the Hessian to
obtain a covariance matrix. Available options are The Hessian is positive definite, thus invertible, at the maximum
likelihood. If the Hessian computed after optimisation convergence can't
be inverted, this is either because the converged result is not the
maximum likelihood (e.g. it could be a "saddle point"), or because the
numerical methods used to obtain the Hessian were inaccurate. If you
suspect that the Hessian was computed wrongly enough that it is not
invertible, but not wrongly enough that the nearest valid inverse would be
an inaccurate estimate of the covariance matrix, then these tolerance
values can be modified (reducing  
... 
Optional arguments to the generalpurpose optimisation routine

Parameters are estimated by maximum likelihood using the algorithms
available in the standard R optim
function. Parameters
defined to be positive are estimated on the log scale. Confidence intervals
are estimated from the Hessian at the maximum, and transformed back to the
original scale of the parameters.
The usage of flexsurvreg
is intended to be similar to
survreg
in the survival package.
A list of class "flexsurvreg"
containing information about
the fitted model. Components of interest to users may include:
call 
A copy of the function call, for use in postprocessing. 
dlist 
List defining the survival distribution used. 
res 
Matrix of maximum likelihood estimates and confidence limits, with parameters on their natural scales. 
res.t 
Matrix of maximum
likelihood estimates and confidence limits, with parameters all
transformed to the real line (using a log transform for all builtin
models where this is necessary). The

coefficients 
The transformed maximum likelihood
estimates, as in 
loglik 
Loglikelihood. This will differ from Stata, where the sum of the log uncensored survival times is added to the loglikelihood in survival models, to remove dependency on the time scale. For relative survival models specified with 
logliki 
Vector of individual contributions to the loglikelihood 
AIC 
Akaike's information criterion (2*log likelihood + 2*number of estimated parameters) 
cov 
Covariance matrix of the parameters, on
the realline scale (e.g. log scale), which can be extracted with

data 
Data used in the model fit. To extract
this in the standard R formats, use use

flexsurvreg
is intended to be
easy to extend to handle new distributions. To define a new distribution
for use in flexsurvreg
, construct a list with the following
elements:
A string naming the distribution. If this
is called "dist"
, for example, then there must be visible in the
working environment, at least, either
a) a function called ddist
which defines the probability density,
or
b) a function called hdist
which defines the hazard.
Ideally, in case a) there should also be a function called pdist
which defines the probability distribution or cumulative density, and in
case b) there should be a function called Hdist
defining the
cumulative hazard. If these additional functions are not provided,
flexsurv attempts to automatically create them by numerically
integrating the density or hazard function. However, model fitting will
be much slower, or may not even work at all, if the analytic versions of
these functions are not available.
The functions must accept vector arguments (representing different times,
or alternative values for each parameter) and return the results as a
vector. The function Vectorize
may be helpful for doing
this: see the example below.
These functions may be in an addon package (see below for an example) or
may be userwritten. If they are userwritten they must be defined in the
global environment, or supplied explicitly through the dfns
argument
to flexsurvreg
. The latter may be useful if the functions are
created dynamically (as in the source of flexsurvspline
) and thus
not visible through R's scoping rules.
Arguments other than parameters must be named in the conventional way –
for example x
for the first argument of the density function or
hazard, as in dnorm(x, ...)
and q
for the first
argument of the probability function. Density functions should also have
an argument log
, after the parameters, which when TRUE
,
computes the log density, using a numerically stable additive formula if
possible.
Additional functions with names beginning with "DLd"
and
"DLS"
may be defined to calculate the derivatives of the log density
and log survival probability, with respect to the parameters of the
distribution. The parameters are expressed on the real line, for example
after log transformation if they are defined as positive. The first
argument must be named t
, representing the time, and the remaining
arguments must be named as the parameters of the density function. The
function must return a matrix with rows corresponding to times, and columns
corresponding to the parameters of the distribution. The derivatives are
used, if available, to speed up the model fitting with optim
.
Vector of strings naming the parameters of the distribution. These must be the same names as the arguments of the density and probability functions.
Name of the main parameter governing the mean of
the distribution. This is the default parameter on which covariates are
placed in the formula
supplied to flexsurvreg
.
List of R
functions which transform the range of values taken by each parameter onto
the real line. For example, c(log, log)
for a distribution with two
positive parameters.
List of R functions defining the
corresponding inverse transformations. Note these must be lists, even for
single parameter distributions they should be supplied as, e.g.
c(exp)
or list(exp)
.
A function of the
observed survival times t
(including rightcensoring times, and
using the halfway point for intervalcensored times) which returns a vector
of reasonable initial values for maximum likelihood estimation of each
parameter. For example, function(t){ c(1, mean(t)) }
will always
initialize the first of two parameters at 1, and the second (a scale
parameter, for instance) at the mean of t
.
For example, suppose we want to use an extreme value survival distribution.
This is available in the CRAN package eha, which provides
conventionallydefined density and probability functions called
eha::dEV
and eha::pEV
. See the Examples below
for the custom list in this case, and the subsequent command to fit the
model.
Christopher Jackson <chris.jackson@mrcbsu.cam.ac.uk>
Jackson, C. (2016). flexsurv: A Platform for Parametric Survival Modeling in R. Journal of Statistical Software, 70(8), 133. doi:10.18637/jss.v070.i08
Cox, C. (2008) The generalized F distribution: An umbrella for parametric survival analysis. Statistics in Medicine 27:43014312.
Cox, C., Chu, H., Schneider, M. F. and Muñoz, A. (2007) Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Statistics in Medicine 26:42524374
Jackson, C. H. and Sharples, L. D. and Thompson, S. G. (2010) Survival models in health economic evaluations: balancing fit and parsimony to improve prediction. International Journal of Biostatistics 6(1):Article 34.
Nelson, C. P., Lambert, P. C., Squire, I. B., & Jones, D. R. (2007). Flexible parametric models for relative survival, with application in coronary heart disease. Statistics in medicine, 26(30), 54865498.
flexsurvspline
for flexible survival modelling using
the spline model of Royston and Parmar.
plot.flexsurvreg
and lines.flexsurvreg
to plot
fitted survival, hazards and cumulative hazards from models fitted by
flexsurvreg
and flexsurvspline
.
## Compare generalized gamma fit with Weibull fitg < flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="gengamma") fitg fitw < flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="weibull") fitw plot(fitg) lines(fitw, col="blue", lwd.ci=1, lty.ci=1) ## Identical AIC, probably not enough data in this simple example for a ## very flexible model to be worthwhile. ## Custom distribution ## make "dEV" and "pEV" from eha package (if installed) ## available to the working environment if (require("eha")) { custom.ev < list(name="EV", pars=c("shape","scale"), location="scale", transforms=c(log, log), inv.transforms=c(exp, exp), inits=function(t){ c(1, median(t)) }) fitev < flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist=custom.ev) fitev lines(fitev, col="purple", col.ci="purple") } ## Custom distribution: supply the hazard function only hexp2 < function(x, rate=1){ rate } # exponential distribution hexp2 < Vectorize(hexp2) custom.exp2 < list(name="exp2", pars=c("rate"), location="rate", transforms=c(log), inv.transforms=c(exp), inits=function(t)1/mean(t)) flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist=custom.exp2) flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist="exp") ## should give same answer
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.