Biostatistics III in R

Exercise 28. Flexible parametric survival models in R

library('knitr')
read_chunk('../q28.R')
opts_chunk$set(cache=FALSE, fig.width=10, fig.height=6)


(a) ##

The stpm2 output can be seen below.


The hazard ratio, 95% confidence interval and statistical significance are very similar to those from the Cox model.


(b) ##

The predicted survival and hazard functions are shown below.



(c) ##

On the log scale there is a constant difference between the predicted hazard curves, because the predictions are from a proportional hazards model and a multiplicative effect becomes additive on the log scale.
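The reasoning can be written out in one line. This is a sketch in notation that is assumed here rather than taken from the exercise: $h_0(t)$ is the hazard in the reference period, $h_1(t)$ the hazard in the later period, and $\beta$ the log hazard ratio.

$$
h_1(t) = h_0(t)\exp(\beta)
\quad\Longrightarrow\quad
\log h_1(t) - \log h_0(t) = \beta \quad \text{for all } t .
$$

The vertical gap between the two log-hazard curves is therefore $\beta$ at every time point, which is what the plot shows.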


(d) ##

The log hazard ratios (and hence the hazard ratios) are similar from 2 df upwards and very similar from 3 df upwards. The main difference is for 1 df, which is equivalent to a Weibull model. The Weibull model enforces a monotonic hazard function and, as the hazard function in the melanoma data has a turning point, it is clearly inappropriate.

The lowest AIC is for the model with 5 df and the lowest BIC is for the model with 2 df. The penalty term in the AIC is twice the number of parameters ($2 \times k$), whereas in the BIC it is $\log(D) \times k$, where $D$ is the number of events. Since $\log(D) > 2$ here, the BIC penalizes extra parameters much more strongly than the AIC. Since we have a large data set and there are no disadvantages to including extra parameters, we would use 5 df for the baseline hazard.
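The two penalty terms can be compared directly. The following is a minimal base R sketch, not the code used for the exercise: the helper names `aic_penalty` and `bic_penalty` are hypothetical, and the number of events `D` is an illustrative value rather than the count in the melanoma data.

```r
# Penalty terms for a model with k parameters.
aic_penalty <- function(k) 2 * k
bic_penalty <- function(k, D) log(D) * k  # D = number of events

D <- 1000  # hypothetical number of events, for illustration only
k <- 1:6   # hypothetical parameter counts

penalties <- data.frame(k = k,
                        AIC = aic_penalty(k),
                        BIC = bic_penalty(k, D))
print(penalties)

# The BIC penalty per parameter exceeds the AIC penalty whenever
# log(D) > 2, that is, whenever D > exp(2).
exp(2)
```

Because the BIC charges more per parameter once there are more than about seven events, it tends to select a smaller number of df than the AIC, as seen above.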


(e) ##


The code for hazards is very similar:


With the exception of 1 df (the Weibull model), the survival and hazard functions show similar shapes, so as long as we have enough knots our conclusions would be very similar.

(f) ##


After adjusting for sex, calendar period and age group, and modelling time with five degrees of freedom, the adjusted hazard ratios were: 0.59 (95% CI: 0.54, 0.64) for females compared with males; 0.72 (95% CI: 0.66, 0.79) for diagnoses during the period 1985--1994 compared with 1975--1984; and 1.33 (95% CI: 1.17, 1.49), 1.86 (95% CI: 1.66, 2.07) and 3.40 (95% CI: 2.93, 3.92) for age groups 45--59, 60--74 and 75+, respectively, compared with ages 0--44 years.
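Hazard ratios and Wald confidence intervals of this kind are obtained by exponentiating a log hazard ratio and its interval. The base R sketch below illustrates the arithmetic only: `hr_ci` is a hypothetical helper, and the standard error of 0.043 is an assumed value back-calculated approximately from the reported interval for sex, not output from the fitted model.

```r
# Hazard ratio with a Wald confidence interval from a log hazard ratio
# (beta) and its standard error (se).
hr_ci <- function(beta, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(hr    = exp(beta),
    lower = exp(beta - z * se),
    upper = exp(beta + z * se))
}

# Illustration: log hazard ratio for females vs males taken as log(0.59),
# with an assumed standard error of 0.043 (approximate, see above).
hr_ci(beta = log(0.59), se = 0.043)
```

The interval is symmetric on the log scale and hence asymmetric around the hazard ratio itself.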

A likelihood ratio test gave strong evidence (p<1e-16) that age group contributed significantly to the model fit.
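The mechanics of the likelihood ratio test can be sketched in base R. This is an illustration under assumptions, not the exercise code: `lr_test` is a hypothetical helper and the log-likelihood values are made up. The 3 degrees of freedom follow from age group having four levels, so dropping it removes three parameters.

```r
# Likelihood ratio test comparing a reduced model (nested) with a full model.
lr_test <- function(loglik_reduced, loglik_full, df) {
  stat <- 2 * (loglik_full - loglik_reduced)
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  c(statistic = stat, df = df, p.value = p)
}

# Hypothetical log-likelihoods, for illustration only.
lr_test(loglik_reduced = -1010, loglik_full = -1000, df = 3)
```

With fitted models, the two log-likelihoods would come from the model with and without the age group terms.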

(g) ##


The estimates are so similar because very similar models are being fitted with exactly the same covariates. The two models differ only in the manner in which they account for the baseline hazard. In the Cox model it is assumed arbitrary and not directly estimated. In the flexible parametric model the baseline hazard is modelled using splines. The 5 df spline allows sufficient flexibility to make the model estimates virtually identical.
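The difference between the two models can be written out as follows. This is a sketch in assumed notation, and it assumes the flexible parametric model is fitted on the log cumulative hazard scale with a natural cubic spline in log time, which is the usual form of this model.

$$
\text{Cox:}\quad h(t \mid x) = h_0(t)\exp(x^\top\beta), \qquad h_0(t) \text{ left unspecified}
$$

$$
\text{Flexible parametric:}\quad \log H(t \mid x) = s\{\log(t);\gamma\} + x^\top\beta
$$

Here $H$ is the cumulative hazard and $s\{\log(t);\gamma\}$ is a spline with 5 df. Both models share the linear predictor $x^\top\beta$ and both are proportional hazards models, so a sufficiently flexible spline yields nearly the same estimates of $\beta$.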

(h) ##


There is strong evidence of a non-proportional effect of age.

We could also investigate the non-proportional effect of age with penalized models, where sp gives the optimal smoothing parameters estimated from models fitted without the sp argument:


(i) ##

The baseline hazard is shown below. This baseline is for males in the youngest age group diagnosed in 1975–1984, i.e., when all the covariates are equal to zero.


(j) ##



The hazard ratios decrease as a function of follow-up time. The hazard ratio is so high during the early years of follow-up because the hazard in the reference group is close to zero. The hazard ratio for the oldest age group with 95% confidence intervals is also shown.

(k) ##


The hazard difference is small early on, despite the hazard ratio being large, because the underlying hazard is so low.
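The relationship between the two measures makes this explicit. In assumed notation, with $h_0(t)$ the hazard in the reference group, $h_1(t)$ the hazard in the comparison group and $\mathrm{HR}(t) = h_1(t)/h_0(t)$:

$$
h_1(t) - h_0(t) = h_0(t)\,\{\mathrm{HR}(t) - 1\}.
$$

When $h_0(t)$ is close to zero, the difference is small even if $\mathrm{HR}(t)$ is large.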

(l) ##


(m) ##




biostat3 documentation built on Oct. 29, 2024, 5:07 p.m.