library(knitr)
opts_chunk$set(
  fig.align = "center",
  crop      = TRUE)
library(ggplot2)
theme_set(theme_bw())
library(dplyr)
library(survival)
library(mgcv)
library(pammtools)
Set1 <- RColorBrewer::brewer.pal(9, "Set1")

Here we briefly demonstrate how to fit and visualize a simple baseline model using the pammtools package. We illustrate the procedure using a subset of the tumor data from the pammtools package:

data("tumor")
head(tumor)
tumor <- tumor[1:200,]

The below graph depicts the estimated cumulative hazard using the Nelson-Aalen estimator:

base_df <- basehaz(coxph(Surv(days, status)~1, data = tumor)) %>%
  rename(nelson_aalen = hazard)
ggplot(base_df, aes(x = time, y = nelson_aalen)) +
  geom_stephazard() +
  ylab(expression(hat(Lambda)(t))) + xlab("t") +
  ggtitle("Nelson-Aalen estimate of the cumulative hazard")

Data transformation

To fit a PAM, we first we need to bring the data in a suitable format (see vignette on data transformation).

# Use unique event times as interval break points
ped <- tumor %>% as_ped(Surv(days, status)~., id = "id")
head(ped[, 1:10])

PAMs

PAMs estimate the baseline log-hazard rate semi-parametrically as a smooth, non-linear function evaluated at the end-points tend of the intervals defined for our model.

Note that the estimated log-hazard value at time-points tend gives the value of the log-hazard rate for the entire previous interval as PAMs estimate hazard rates that are constant in each interval.

Estimating the log hazard rate as a smooth function evaluated at tend - instead of using an unpenalized estimator without such a smoothness assumption - ensures that the hazard rate does not change too rapidly from interval to interval unless there is sufficient evidence for such changes in the data.

pam <- gam(ped_status ~ s(tend), data = ped, offset = offset, family = poisson())
summary(pam)

Graphical comparison

In the figure below we compare the previous baseline estimates of the Cox model with the PAM estimates.

Expand here for R-Code

# Create new data set with one row per unique interval
# and add information about the cumulative hazard estimate
int_df <- make_newdata(ped, tend = unique(tend)) %>%
  add_cumu_hazard(pam)

gg_baseline <- ggplot(int_df, aes(x = tend)) +
  geom_line(aes(y = cumu_hazard, col = "PAM")) +
  geom_stephazard(data = base_df, aes(x=time, y = nelson_aalen, col = "Nelson-Aalen")) +
  scale_color_manual(
    name   = "Method",
    values = c("PAM" = "black", "Nelson-Aalen" = Set1[1])) +
  theme(legend.position = "bottom") +
  ylab(expression(hat(Lambda)(t))) + xlab("t") +
  ggtitle(paste0("Comparison of cumulative hazards estimated by\n",
    "Cox-PH (Nelson-Aalen) vs. PAM"))

gg_baseline

Both models are in good agreement.



adibender/pammtools documentation built on Feb. 27, 2024, 8:40 a.m.