Penalised generalised survival model

Description

This implements the generalised survival model g(S(t|x)) = eta, where g is a link function, S is survival, t is time, x are covariates and eta is a linear predictor. The linear predictor can include penalised smoothers for the time effects, for time:covariate interactions and for covariate effects using the mgcv smoothers. The main model assumption is that the time effects in the linear predictor are smooth. This extends the class of flexible parametric survival models developed by Royston and colleagues. The model has been extended to include relative survival, Gamma frailties and normal random effects.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
pstpm2(formula, data, smooth.formula = NULL, smooth.args = NULL,
       logH.args = NULL, 
       tvc = NULL, 
       control = list(parscale = 1, maxit = 300), init = NULL,
       coxph.strata = NULL, coxph.formula = NULL,
       weights = NULL, robust = FALSE, 
       bhazard = NULL, timeVar = "", time0Var = "",
       sp=NULL, use.gr = TRUE, 
       criterion=c("GCV","BIC"), penalty = c("logH","h"),
       smoother.parameters = NULL,
       alpha=if (is.null(sp)) switch(criterion,GCV=1,BIC=1) else 1,
       sp.init=1, trace = 0,
       link.type=c("PH","PO","probit","AH","AO"), theta.AO=0,
       optimiser = c("BFGS", "NelderMead", "Nlm"),
       recurrent = FALSE, frailty=!is.null(cluster) & !robust,cluster = NULL,
       logtheta=-6, nodes=9,
       RandDist=c("Gamma","LogN"), adaptive = TRUE, maxkappa=1000, Z = ~1, 
       reltol = list(search = 1.0e-10, final = 1.0e-10, outer=1.0e-5),outer_optim=1,
       contrasts = NULL, subset = NULL, ...)

Arguments

formula

a formula object, with the response on the left of a ~ operator, and the parametric terms on the right. The response must be a survival object as returned by the Surv function. [required]

data

a data.frame in which to interpret the variables named in the formula argument.

smooth.formula

a mgcv::gam formula for describing the time effects and time-dependent effects and smoothed covariate effects on the linear predictor scale (default=NULL). The default model is equal to ~s(log(time),k=-1) where time is the time variable.

smooth.args

a list describing the arguments for the s function for modelling the baseline time effect on the linear predictor scale (default=NULL).

logH.args

as per smooth.args. Deprecated.

tvc

a list with the names of the time-varying coefficients (e.g. tvc=list(hormon), which is equivalent to smooth.formula=~...+s(log(time),by=hormon)).

control

control argument passed to optim.

init

init should either be FALSE, such that initial values will be determined using Cox regression, or a numeric vector of initial values.

coxph.strata

variable in the data argument for stratification of the coxph model fit for estimating initial values.

weights

an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector.

robust

Boolean used to determine whether to use a robust variance estimator.

bhazard

variable for the baseline hazard for relative survival

timeVar

variable defining the time variable. By default, this is determined from the survival object, however this may be ambiguous if two variables define the time

sp

fix the value of the smoothing parameters.

use.gr

in R, a Boolean to determine whether to use the gradient in the optimisation

criterion

in Rcpp, determine whether to use "GCV" or "BIC" for for the smoothing parameter selection.

penalty

use either the "logH" penalty, which is the default penalty from mgcv, or the "h" hazard penalty.

smoother.parameters

for the hazard penalty, a list with components which are lists with components var, transform and inverse.

alpha

an ad hoc tuning parameter for the smoothing parameter.

sp.init

initial values for the smoothing parameters.

trace

integer for trace reporting; 0 represents no additional reporting.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

coxph.formula

additional formula used to improve the fitting of initial values [optional and rarely used].

time0Var

string variable to determine the entry variable; useful for when more than one data variable is used in the entry time.

link.type

type of link function. For "PH" (generalised proportional hazards), g(S)=log(-log(S)); for "PO" (generalised proportional odds), g(S)=-logit(S); for "probit" (generalised probit), g(S)=-probit(S); for "AH" (generalised additive hazards), g(S)=-log(S); for "AO" (generalised Aranda-Ordaz), g(S)=log((S^(-theta.AO)-1)/theta.AO).

theta.AO

theta parameter for the Aranda-Ordaz link type.

optimiser

select which optimiser is used

recurrent

logical for whether clustered, left truncated data are recurrent or for first event (where the latter requires an adjustment for the frailties or random effects)

frailty

logical for whether to fit a shared frailty model

cluster

string for the data variable that determines the cluster for the frailty

logtheta

initial value for log-theta used in the gamma shared frailty model

nodes

number of integration points for Gaussian quadrature

RandDist

type of distribution for the random effect or frailty

adaptive

logical for whether to use adaptive or non-adaptive quadrature

maxkappa

double float value for the maximum value of the weight used in the constraint

Z

formula for the design matrix for the random effects

reltol

list with components for search and final relative tolerances.

outer_optim

Integer to indicate the algorithm for outer optimisation. If outer_optim=1, then use Neldear-Mead, otherwise use Nlm.

...

additional arguments to be passed to the mle2 .

Details

The implementation extends the mle2 object from the bbmle package.

The default smoother for time on the linear predictor scale is s(log(time)).

Value

A pstpm2-class object.

Author(s)

Mark Clements, Xing-Rong Liu.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
## Not run: 
data(brcancer)
## standard Kaplan-Meier curves by hormon
plot(survfit(Surv(rectime/365,censrec==1)~1,data=brcancer,subset=hormon==1),
  xlab="Recurrence free survival time (years)",
  ylab="Survival")
lines(survfit(Surv(rectime/365,censrec==1)~1,data=brcancer,subset=hormon==0),col=2,
  conf.int=TRUE)
legend("topright", legend=c("Hormonal therapy","No hormonal therapy"),lty=1,col=1:2,bty="n")

## now fit a penalised stpm2 model
fit <- pstpm2(Surv(rectime/365,censrec==1)~hormon,data=brcancer)
## no S4 generic lines() method: instead, use plot(..., add=TRUE)
plot(fit,newdata=data.frame(hormon=1),type="surv",add=TRUE,ci=FALSE,line.col="blue",lwd=2,
  rug=FALSE)
plot(fit,newdata=data.frame(hormon=0),type="surv",add=TRUE,ci=FALSE,line.col="green",lwd=2,
  rug=FALSE)

## plot showing proportional hazards
plot(fit,newdata=data.frame(hormon=1),type="hazard",line.col="blue",lwd=2,
  rug=FALSE,ylim=c(0,1e-3))
plot(fit,newdata=data.frame(hormon=0),type="hazard",add=TRUE,ci=FALSE,line.col="green",lwd=2,
  rug=FALSE)

## time-varying hazard ratios
fit.tvc <- pstpm2(Surv(rectime,censrec==1)~1,
  data=brcancer,
  smooth.formula=~s(log(rectime))+s(log(rectime),by=hormon))
plot(fit.tvc,newdata=data.frame(hormon=1),type="hazard",line.col="blue",lwd=2,
  rug=FALSE)
plot(fit.tvc,newdata=data.frame(hormon=0),type="hazard",line.col="red",lwd=2,
  add=TRUE)

## Smooth covariate effects
fit.smoothx <- pstpm2(Surv(rectime,censrec==1)~1,
  data=brcancer,
  smooth.formula=~s(log(rectime))+s(x1))
ages <- seq(21,80,length=301)
haz <- predict(fit.smoothx,newdata=data.frame(hormon=1,rectime=365,x1=ages),
               type="hazard",se.fit=TRUE)
matplot(ages,haz/haz[150,1],type="l",log="y",ylab="Hazard ratio")

## compare with df=5 from stpm2
fit.stpm2 <- stpm2(Surv(rectime/365,censrec==1)~hormon,data=brcancer,df=7)
plot(fit,newdata=data.frame(hormon=1),type="hazard",line.col="blue",lwd=2,
  rug=FALSE,ylim=c(0,1e-3))
plot(fit.stpm2,newdata=data.frame(hormon=1),type="hazard",line.col="orange",lwd=2,
  rug=FALSE,add=TRUE,ci=FALSE)

## time-varying coefficient
##summary(fit.tvc <- pstpm2(Surv(rectime,censrec==1)~hormon,data=brcancer,
##                     tvc=list(hormon=3)))
##anova(fit,fit.tvc) # compare with and without tvc (unclear whether this is valid)

## some more plots
## plot(fit.tvc,newdata=data.frame(hormon=0),type="hr",var="hormon")
                                        # no lines method: use add=TRUE
## plot(fit.tvc,newdata=data.frame(hormon=1),type="hr",var="hormon",
##     add=TRUE,ci=FALSE,line.col=2)

## plot(fit.tvc,newdata=data.frame(hormon=0),type="sdiff",var="hormon")

## plot(fit.tvc,newdata=data.frame(hormon=0),type="hdiff",var="hormon")

## plot(fit.tvc,newdata=data.frame(hormon=0),type="hazard")
## plot(fit.tvc,newdata=data.frame(hormon=1),type="hazard",line.col=2,ci=FALSE,add=TRUE)


## End(Not run)