gsm: Parametric and penalised generalised survival models

View source: R/pm2-3.R

gsmR Documentation

Parametric and penalised generalised survival models

Description

This implements the generalised survival model g(S(t|x)) = eta, where g is a link function, S is survival, t is time, x are covariates and eta is a linear predictor. The linear predictor can include either parametric or penalised smoothers for the time effects, for time:covariate interactions and for covariate effects. The main model assumption is that the time effects in the linear predictor are smooth. This extends the class of flexible parametric survival models developed by Royston and colleagues. The model has been extended to include relative survival (excess hazards), Gamma frailties and normal random effects.

Usage

gsm(formula, data, smooth.formula = NULL, smooth.args = NULL,
                df = 3, cure = FALSE,
                tvc = NULL, tvc.formula = NULL,
                control = list(), init = NULL,
                weights = NULL, robust = FALSE, baseoff = FALSE,
                timeVar = "", time0Var = "", use.gr = NULL,
                optimiser=NULL, log.time.transform=TRUE,
                reltol=NULL, trace = NULL,
                link.type=c("PH","PO","probit","AH","AO"), theta.AO=0,
                contrasts = NULL, subset = NULL,
                robust_initial=NULL,
                coxph.strata = NULL, coxph.formula = NULL,
                logH.formula = NULL, logH.args = NULL,
                bhazard = NULL, bhazinit=NULL, copula=FALSE,
                frailty = !is.null(cluster) & !robust & !copula,
                cluster = NULL, logtheta=NULL,
                nodes=NULL, RandDist=c("Gamma","LogN"), recurrent = FALSE,
                adaptive = NULL, maxkappa = NULL,
                sp=NULL, criterion=NULL, penalty=NULL,
                smoother.parameters=NULL, Z=~1, outer_optim=NULL,
                alpha=1, sp.init=1,
                penalised=FALSE,
                ...)
stpm2(formula, data, weights=NULL, subset=NULL, coxph.strata=NULL, ...)
pstpm2(formula, data, weights=NULL, subset=NULL, coxph.strata=NULL, ...)

Arguments

formula

a formula object, with the response on the left of a ~ operator, and the parametric terms on the right. The response must be a survival object as returned by the Surv function. Specials include cluster and bhazard. [required]

data

a data.frame in which to interpret the variables named in the formula argument.

smooth.formula

either a parametric formula or a penalised mgcv::gam formula for describing the time effects and time-dependent effects and smoothed covariate effects on the linear predictor scale (default=NULL). The default model is equal to ~s(log(time),k=-1) where time is the time variable.

df

an integer that describes the degrees of freedom for the ns function for modelling the baseline log-cumulative hazard (default=3). Parametric model only.

smooth.args

a list describing the arguments for the s function for modelling the baseline time effect on the linear predictor scale (default=NULL).

tvc

a list with the names of the time-varying coefficients. For a parametric model, this uses natural splines (e.g. tvc=list(hormon=3) is equivalent to smooth.formula=~...+as.numeric(hormon):nsx(log(time),df=3)), which by default does not include an intercept term, hence you should include a main effect. Note that this will convert a logical or factor variable to a numeric value, so the user should use indicators for factor terms. For a penalised model, this uses cubic splines (e.g. tvc=list(hormon=-1) is equivalent to smooth.formula=~...+s(log(time),by=hormon,k=-1)), which by default does include an intercept (or main effect) term (and this code will remove any main effect from formula).

tvc.formula

separate formula for the time-varying effects. This is combined with smooth.formula or the default smooth.formula.

baseoff

Boolean used to determine whether fully define the model using tvc.formula rather than combining logH.formula and tvc.formula

logH.args

as per smooth.args. Deprecated.

logH.formula

as per smooth.formula. Deprecated.

cure

logical for whether to estimate a cure model (parametric model only).

control

list of arguments passed to gsm.control.

init

init should either be NULL, such that initial values will be determined using Cox regression, or a numeric vector of initial values.

coxph.strata

variable in the data argument for stratification of the coxph model fit for estimating initial values.

weights

an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector.

robust

Boolean used to determine whether to use a robust variance estimator.

bhazard

variable for the baseline hazard for relative survival

bhazinit

scalar used to adjust the background cumulative hazards for calculating initial values. Default=0.1. Deprecated argument: use of the control argument is preferred.

copula

logical to indicate whether to use a copula model (experimental)

timeVar

variable defining the time variable. By default, this is determined from the survival object, however this may be ambiguous if two variables define the time

sp

fix the value of the smoothing parameters.

use.gr

in R, a Boolean to determine whether to use the gradient in the optimisation. Default=TRUE, Deprecated argument: use of the control argument is preferred.

criterion

in Rcpp, determine whether to use "GCV" or "BIC" for for the smoothing parameter selection.

penalty

use either the "logH" penalty, which is the default penalty from mgcv, or the "h" hazard penalty. Default="logH". Deprecated argument: use of the control argument is preferred.

smoother.parameters

for the hazard penalty, a list with components which are lists with components var, transform and inverse.

alpha

an ad hoc tuning parameter for the smoothing parameter.

sp.init

initial values for the smoothing parameters.

trace

integer for trace reporting; 0 represents no additional reporting. Default=0. Deprecated argument: use of the control argument is preferred.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

coxph.formula

additional formula used to improve the fitting of initial values [optional and rarely used].

time0Var

string variable to determine the entry variable; useful for when more than one data variable is used in the entry time.

link.type

type of link function. For "PH" (generalised proportional hazards), g(S)=log(-log(S)); for "PO" (generalised proportional odds), g(S)=-logit(S); for "probit" (generalised probit), g(S)=-probit(S); for "AH" (generalised additive hazards), g(S)=-log(S); for "AO" (generalised Aranda-Ordaz), g(S)=log((S^(-theta.AO)-1)/theta.AO).

theta.AO

theta parameter for the Aranda-Ordaz link type.

optimiser

select which optimiser is used. Default="BFGS". Deprecated argument: use of the control argument is preferred.

log.time.transform

should a log-transformation be used for calculating the derivative of the design matrix with respect to time? (default=TRUE)

recurrent

logical for whether clustered, left truncated data are recurrent or for first event (where the latter requires an adjustment for the frailties or random effects)

frailty

logical for whether to fit a shared frailty model

cluster

variable that determines the cluster for the frailty. This can be a vector, a string for the column, or a name. This can also be specified using a special.

logtheta

initial value for log-theta used in the gamma shared frailty model (defaults to value from a coxph model fit)

nodes

number of integration points for Gaussian quadrature. Default=9. Deprecated argument: use of the control argument is preferred.

RandDist

type of distribution for the random effect or frailty

adaptive

logical for whether to use adaptive or non-adaptive quadrature, Default=TRUE. Deprecated argument: use of the control argument is preferred.

maxkappa

double float value for the maximum value of the weight used in the constraint. Default=1000. Deprecated argument: use of the control argument is preferred.

Z

formula for the design matrix for the random effects

reltol

list with components for search and final relative tolerances. Default=list(search=1e-10, final=1e-10, outer=1e-5). Deprecated argument: use of the control argument with arguments reltol.search, reltol.final and reltol.outer is preferred.

outer_optim

Integer to indicate the algorithm for outer optimisation. If outer_optim=1 (default), then use Neldear-Mead, otherwise use Nlm.

robust_initial

logical for whether to use Nelder-Mead to find initial values (max 50 iterations). This is useful for ill-posed initial values. Default= FALSE. Deprecated argument: use of the control argument is preferred.

penalised

logical to show whether to use penalised models with pstpm (penalised=TRUE) or parametrics models with stpm2 (penalised=FALSE).

...

additional arguments to be passed to the mle2.

Details

The implementation extends the mle2 object from the bbmle package.

The default smoothers for time on the linear predictor scale are nsxs(log(time),df=3) for the parametric model and s(log(time)) for the penalised model.

A frequently asked question is: why does rstpm2 give different spline estimates to flexsurv and Stata's stpm2? The short answer is that rstpm2 uses a different natural spline basis compared with flexsurv and Stata's stpm2 and slightly different knot placement than Stata's stpm2. If the knot placement is the same, then the predictions and other coefficients are expected to be very similar. As a longer answer, the default smoother in rstpm2 is to use an extension of the splines::ns function (rstpm2::nsx), which uses a QR projection of B-splines for natural splines. In contrast, flexsurv and Stata's stpm2 use truncated power splines for the natural spline basis (also termed 'restricted cubic splines'). The B-splines are known to have good numerical properties, while Stata's stpm2 implementation defaults to using matrix orthogonalisation to account for any numerical instability in the truncated power basis. Furthermore, rstpm2 allows for any smooth parametric function to be used as a smoother in stpm2/gsm, which is an extension over flexsurv and Stata's stpm2. Finally, it may be difficult to get rstpm2 and Stata's stpm2 to return the same estimates: although nsx includes an argument stata.stpm2.compatible = FALSE (change to TRUE for compatibility), the design matrix for rstpm2 is based on individuals with events, while Stata's stpm2 determines the spline knots from the individuals with events and the design matrix is otherwise based on all individuals.

Value

Either a stpm2-class or pstpm2-class object.

Author(s)

Mark Clements, Xing-Rong Liu, Benjamin Christoffersen.

Examples

## Not run: 
    data(brcancer)
    summary(fit <- stpm2(Surv(rectime,censrec==1)~hormon,data=brcancer,df=3))
    
    ## some predictions
    head(predict(fit,se.fit=TRUE,type="surv"))
    head(predict(fit,se.fit=TRUE,type="hazard"))
    
    ## some plots
    plot(fit,newdata=data.frame(hormon=0),type="hazard")
    plot(fit,newdata=data.frame(hormon=0),type="surv")

    ## time-varying coefficient
    summary(fit.tvc <- stpm2(Surv(rectime,censrec==1)~hormon,data=brcancer,df=3,
                             tvc=list(hormon=3)))
    anova(fit,fit.tvc) # compare with and without tvc
    
    ## some more plots
    plot(fit.tvc,newdata=data.frame(hormon=0),type="hr",var="hormon", ylim=c(0,2))
    lines(fit.tvc,newdata=data.frame(hormon=1),type="hr",var="hormon",
          col=2)

    plot(fit.tvc,newdata=data.frame(hormon=0),type="sdiff",var="hormon")
    
    plot(fit.tvc,newdata=data.frame(hormon=0),type="hdiff",var="hormon")

    library(scales)
    cols <- c(alpha("red",alpha=0.2), alpha("blue",alpha=0.2))
    plot(fit.tvc,newdata=data.frame(hormon=0),type="hazard",ci.col=cols[1])
    lines(fit.tvc,newdata=data.frame(hormon=1),type="hazard",lty=2,ci.col=cols[2],
          ci=TRUE)
    legend("topright",legend=c("No hormonal treatment", "(95
           lty=c(1,1,2,1), lwd=c(1,10,1,10), col=c("black",cols[1],"black",cols[2]), bty="n")

    
    ## compare number of knots
    hormon0 <- data.frame(hormon=0)
    plot(fit,type="hazard",newdata=hormon0)
    AIC(fit)
    for (df in 4:6) {
        fit.new <- stpm2(Surv(rectime,censrec==1)~hormon,data=brcancer,df=df)
        plot(fit.new,type="hazard",newdata=hormon0,add=TRUE,ci=FALSE,line.col=df)
        print(AIC(fit.new))
    }

    ## compatibility with Stata's stpm2 using the smooth.formula argument (see Details)
    summary(stpm2(Surv(rectime,censrec==1)~hormon,data=brcancer,
                  smooth.formula=~nsx(log(rectime),df=3,stata.stpm2.compatible=TRUE)))
    summary(stpm2(Surv(rectime,censrec==1)~hormon,data=brcancer,
                  smooth.formula=~nsx(log(rectime),df=3,stata=TRUE)+
                  hormon:nsx(log(rectime),df=3,stata=TRUE)))

    
## End(Not run)

rstpm2 documentation built on Oct. 29, 2024, 5:06 p.m.