library(knitr)
opts_chunk$set(
  fig.align  = "center",
  fig.width  = 4,
  fig.height = 4,
  crop       = TRUE)
library(ggplot2)
theme_set(theme_bw())
library(magrittr)
library(dplyr)
library(survival)
library(mgcv)
library(pamm)
Set1 <- RColorBrewer::brewer.pal(9, "Set1")

Here we briefly demonstrate how to fit and visualize a simple baseline model using the pamm package. We illustrate the procedure using the veteran data from the survival package:

data("veteran", package="survival")
veteran %<>%
  mutate(
    trt   = 1*(trt == 2),
    prior = 1*(prior != 10)) %>%
  filter(time < 400)

The below graph depicts the estimated cumulative hazard using the Nelson-Aalen estimator:

base_df <- basehaz(coxph(Surv(time, status)~1, data = veteran))
ggplot(base_df, aes(x = time, y = hazard)) +
  geom_step() +
  ylab(expression(hat(Lambda)(t))) + xlab("t") +
  ggtitle("Nelson-Aalen estimate of the cumulative hazard")

Data transformation

First we need to bring the data in a suitable format (see vignette("data-transformation", package = "pam")).

# Use unique event times as interval break points
ped <- split_data(Surv(time, status)~., data = veteran, id = "id")
head(ped)

Fit the baseline piece-wise exponential model (PEM)

pem <- glm(ped_status ~ interval, data = ped, offset = offset, family = poisson())

Extract cumulative baseline estimate:

int_df <- int_info(ped)
head(int_df)
int_df %<>% add_cumhazard(pem) %>%
  left_join(base_df, by = c("tend" = "time"))
head(int_df)

Visualize the PEM estimate:

ggplot(int_df, aes(x = tend)) +
  geom_step(aes(y = hazard, col = "Nelson-Aalen")) +
  geom_line(aes(y = cumhazard, col = "PEM")) +
  scale_color_manual(name = "Method", values=c(Set1[1:2])) +
  theme(legend.position = "bottom") +
  ylab(expression(hat(Lambda)(t))) + xlab("t") +
  ggtitle("Comparison of cumulative hazards using \n Cox-PH (Nelson-Aalen) vs. PEM")
all.equal(int_df$cumhazard, int_df$hazard)

Fit the baseline using Piece-wise exponential additive model (PAM)

Alternatively, we could use PAMs. This means estimating the baseline log-hazard rate semi-parametrically as a smooth, non-linear function evaluated at the end-points tend of the intervals defined for our model.

Note that the estimated log-hazard value at time-points tend gives the value of the log-hazard rate for the entire previous interval as PAMs use hazard rates that are constant in each interval - that's where the piece-wise in the name of the method comes from.

Estimating the log hazard rate as a smooth function evaluated at tend - instead of using an unpenalized estimator without such a smoothness assumption - ensures that the hazard rate does not change too rapidly from interval to interval unless there is very strong evidence for such changes in the data.

pam <- gam(ped_status ~ s(tend), data = ped, offset = offset, family = poisson())
summary(pam)

Graphical comparison

In the figure below we compare the previous baseline estimates with the PAM estimates. All models are in good agreement.

Expand here for R-Code

int_df$pamhaz <- predict(pam, newdata = int_df, type = "response")
int_df %<>% mutate(pamch = cumsum(pamhaz * intlen))

gg_baseline <- ggplot(int_df, aes(x = tend)) +
  # geom_ribbon(aes(ymin=cumlower, ymax=cumupper), alpha=0.2) +
  geom_step(aes(y = hazard,    col = "Nelson-Aalen"), lty = 2) +
  geom_line(aes(y = cumhazard, col = "PEM")) +
  geom_line(aes(y = pamch,     col = "PAM")) +
  scale_color_manual(
    name   = "Method",
    values = c("PEM" = Set1[2],"PAM" = 1,"Nelson-Aalen" = Set1[1])) +
  theme(legend.position = "bottom") +
  ylab(expression(hat(Lambda)(t))) + xlab("t") +
  ggtitle(paste0("Comparison of cumulative hazards using\n",
    "Cox-PH (Nelson-Aalen) vs. PEM vs. PAM"))

gg_baseline


adibender/pamm documentation built on May 14, 2019, 5:22 p.m.