library(knitr) opts_chunk$set( fig.align = "center", cache = TRUE, message = FALSE, fig.height = 5, fig.width = 5 )
In this tutorial we present basic examples for the usage of this package and
compare results to the ones obtained with classical approaches, namely using
the coxph
function from the survival
package.
In the following two sections, we first describe the classical piece-wise exponential model (PEM) and after that the extension to piece-wise exponential additive models (PAM).
The strength of the PEM is that analysis of time-to-event data can be performed using algorithms designed to fit Generalized Linear Models. This approach yields equivalent coefficient estimates as Cox Proportional Hazards models if
(1) there are no ties (i.e., simultaneous events) in the data and
(2) all unique event (and censoring) times are used as interval cut points in
order to transform the data into the format suitable for PEMs
(see ?split_data
and vignette("data-transformation", package = "pam")
).
Estimates between the two approaches can differ due to a more crude handling of ties in the PEM approach [@whitehead1980]. In practice these differences are not very large (see examples below).
We first demonstrate the equivalence using a subset of the Veterans'
data [@Kalbfleisch1980] provided in the survival
package:
library(survival) library(mgcv) library(pam) library(dplyr) data("veteran", package = "survival") # remove ties to illustrate equivalence with Cox approach vetu <- filter(veteran, !duplicated(time)) vetu.ped <- split_data(Surv(time, status)~., cut = unique(vetu$time), data = vetu, id = "id") pem.age <- glm(ped_status ~ interval - 1 + age, data = vetu.ped, family = poisson(), offset = offset) ## cox model for comparison cph.age <- coxph(Surv(time, status) ~ age, data = vetu) ## compare coefficients cbind( pem = coef(pem.age)["age"], cox = coef(cph.age))
In this case both models yield equivalent estimates.
## Using the full data set (with ties) yields slightly different results # when comparing PEM to Cox-PH vet.ped <- split_data(Surv(time, status)~., cut = unique(veteran$time), data = veteran, id = "id") pem2.age <- glm(ped_status ~ interval - 1 + age, data = vet.ped, family = poisson(), offset = offset) cph2.age <- coxph(Surv(time, status) ~ age, data = veteran) ## compare coefficient estimate to Cox-PH estimate cbind( pem = coef(pem2.age)["age"], cox = coef(cph2.age))
PAMs have two main advantages over PEMs:
PAM estimation scales better when the number of intervals becomes large: PEMs need to estimate one parameter per interval for the baseline hazard, while the number of parameters in PAMs only depends on the number of basis functions used for the spline estimate of the baseline hazard.
In PEMs, baseline hazard estimates for each interval can vary a lot between neighboring intervals, especially when intervals only contain few events. For PAMs on the other hand estimates of the baseline hazard in neighboring intervals are similar due to penalization unless the data provides very strong evidence for large changes between neighboring intervals.
Note that coefficient estimates in PAMs are no longer equivalent to those from Cox PH models, since the estimation of the baseline hazard is performed semi-parametrically. In our experience the differences are very small.
Using the example above we get:
## compare to PAM pam.age <- gam(ped_status ~ s(tend) + age, data = vetu.ped, family = "poisson", offset = offset) cbind( pam = coef(pam.age)["age"], pem = coef(pem.age)["age"], cox = coef(cph.age)["age"])
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.