library(knitr) opts_chunk$set( fig.align = "center", crop = TRUE)
library(ggplot2) theme_set(theme_bw()) library(dplyr) library(survival) library(mgcv) library(pammtools) Set1 <- RColorBrewer::brewer.pal(9, "Set1")
Here we briefly demonstrate how to fit and visualize a simple
baseline model using the pammtools
package.
We illustrate the procedure using a subset of the tumor
data from the pammtools
package:
data("tumor") head(tumor) tumor <- tumor[1:200,]
The below graph depicts the estimated cumulative hazard using the Nelson-Aalen estimator:
base_df <- basehaz(coxph(Surv(days, status)~1, data = tumor)) %>% rename(nelson_aalen = hazard) ggplot(base_df, aes(x = time, y = nelson_aalen)) + geom_stephazard() + ylab(expression(hat(Lambda)(t))) + xlab("t") + ggtitle("Nelson-Aalen estimate of the cumulative hazard")
To fit a PAM, we first we need to bring the data in a suitable format (see vignette on data transformation).
# Use unique event times as interval break points ped <- tumor %>% as_ped(Surv(days, status)~., id = "id") head(ped[, 1:10])
PAMs estimate the baseline log-hazard rate semi-parametrically as a smooth, non-linear function evaluated at the end-points tend
of the intervals defined
for our model.
Note that the estimated log-hazard value at time-points tend
gives the value
of the log-hazard rate for the entire previous interval as PAMs estimate
hazard rates that are constant in each interval.
Estimating the log hazard rate as a smooth function evaluated at tend
-
instead of using an unpenalized estimator without such a smoothness assumption -
ensures that the hazard rate does not change too rapidly from interval to
interval unless there is sufficient evidence for such changes in the data.
pam <- gam(ped_status ~ s(tend), data = ped, offset = offset, family = poisson()) summary(pam)
In the figure below we compare the previous baseline estimates of the Cox model with the PAM estimates.
Expand here for R-Code
# Create new data set with one row per unique interval # and add information about the cumulative hazard estimate int_df <- make_newdata(ped, tend = unique(tend)) %>% add_cumu_hazard(pam) gg_baseline <- ggplot(int_df, aes(x = tend)) + geom_line(aes(y = cumu_hazard, col = "PAM")) + geom_stephazard(data = base_df, aes(x=time, y = nelson_aalen, col = "Nelson-Aalen")) + scale_color_manual( name = "Method", values = c("PAM" = "black", "Nelson-Aalen" = Set1[1])) + theme(legend.position = "bottom") + ylab(expression(hat(Lambda)(t))) + xlab("t") + ggtitle(paste0("Comparison of cumulative hazards estimated by\n", "Cox-PH (Nelson-Aalen) vs. PAM"))
gg_baseline
Both models are in good agreement.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.