library("learnr") library("survival") library("lattice") library("splines") data("lung", package = "survival") lung$sex <- factor(lung$sex, labels = c("male", "female")) lung <- with(lung, lung[complete.cases(time, status, sex, age, ph.karno), ]) knitr::opts_chunk$set(echo = FALSE) options(tutorial.exercise.timelimit = 1200)
The following questions test your knowledge in the extensions of Cox proportional hazards models, presented in Chapter 5.
quiz( question("Why do we need the Breslow estimator to calculate survival probabilities from a Cox model (more than one correct possible)?", answer("Because the Cox model does not estimate the baseline hazard.", correct = TRUE), answer("Because the Cox model does not make an assumption for the distribution of the survival times.", correct = TRUE), answer("Because the Cox model makes the proportional hazards assumption."), answer("Because the Cox model does not make an assumption for the form of the baseline hazard.", correct = TRUE), allow_retry = TRUE, random_answer_order = TRUE ) )
quiz( question("Which of the following statements are reasons to use a stratified Cox model (more than one correct possible)?", answer("To account for categorical covariates that do not satisfy PH.", correct = TRUE), answer("To account for continuous covariates that do not satisfy PH."), answer("To account for categorical covariates that are time-varying."), answer("To account for continuous covariates that are time-varying"), answer("To account for heterogeneity caused by categorical covariates.", correct = TRUE), answer("To account for heterogeneity caused by continuous covariates."), allow_retry = TRUE, random_answer_order = TRUE ) )
quiz( question("Which of the following examples are endogenous time-varying covariates (more than one correct possible)?", answer("Prostate-specific antigen levels for prostate cancer patients.", correct = TRUE), answer("Type of nurse that treats the patients."), answer("Type of hospital (university or not) visited by the patient."), answer("Aortic gradient values of patient who received an aortic valve transplantation.", correct = TRUE), allow_retry = TRUE, random_answer_order = TRUE ) )
A randomized study has been performed to assess the efficacy and safety of a new drug for AIDS. Patients were randomized either to the new drug or to the standard treatment, and the primary outcome was time to death. We have collected data from 55 different centers and we would like to test whether the new treatment significantly prolongs survival over all the centers (i.e., we want the pooled effect). Relatively few deaths have been recorded during follow-up, and in some centers we did not have any event.
quiz( question("Which of the following types of analysis would you follow?", answer("A Cox model that includes as covariate the treatment indicator."), answer("A Cox model that includes as covariate the treatment indicator, but with corrected standard errors taking into account that we have data for different centers.", correct = TRUE), answer("A Cox model that includes as covariate the treatment indicator, and a frailty term for center."), answer("A Cox model that includes as covariates the treatment indicator and the center indicator."), allow_retry = TRUE, random_answer_order = TRUE ) )
In a study on HIV infected patients, we are interested in the time until they develop AIDS. However, some patients die before an AIDS diagnosis for reasons related to their disease. We would like to estimate the survival function for the time to AIDS, while accounting for these deaths.
quiz( question("Which of the following options would you choose?", answer("A non-parametric estimator of the cumulative incidence functions for the time to AIDS accounting for competing risks.", correct = TRUE), answer("The Kaplan-Meier estimator of the survival function for the time to AIDS, treating patient who died as censored."), answer("Fit a Cox model for the time to AIDS, treating patients who died as censored. Use the Breslow estimator to derive an estimate of the survival function for the time to AIDS."), answer("Fit an AFT model for the time to AIDS, treating patients who died as censored. Use the Breslow estimator to derive an estimate of the survival function for the time to AIDS."), allow_retry = TRUE, random_answer_order = TRUE ) )
The purpose of this exercise is to illustrate how to perform a representative Cox PH regression analysis, including the extensions seen in the last sections of Chapter 4 and in Chapter 5.
The following questions are based on the Lung data set. This data set is available as
the object lung
from package survival, which is already loaded in this session.
From this data set we will use the following variables:
time
: the observed time-to-death in days.
status
: the event indicator; '1' denotes censored and '2' denotes death.
age
: the age in years.
ph.karno
: Karnofsky performance score rated by the physician.
sex
: the sex indicator with values 'male' and 'female'.
ph.ecog
: ECOG performance score (0=good 5=dead).
For the exercises below it will be useful to check the corresponding sections of the Survival Analysis in R Companion that are given in the hints.
If you decide to work directly in R and not in this online tutorial, then you need to first run the following lines of code:
library("survival") library("lattice") library("splines") data("lung", package = "survival") lung$sex <- factor(lung$sex, labels = c("male", "female")) lung <- with(lung, lung[complete.cases(time, status, sex, age, ph.karno), ])
Our initial hypothesis is that the time-to-death is affected by sex
, age
andph.karno
. Also, the physicians believe that the effect of ph.karno
and age
may be nonlinear in the log-hazard scale. Moreover, the (possibly nonlinear -- model using natural cubic splines with 3 degrees of freedom) effects of age
and ph.karno
on the log-hazard scale are not the same for males and females. Transform this initial hypothesis
into a suitable Cox PH model. Name the fitted model fit_1_full
.
The aim here is to do a realistic analysis of a survival dataset with a Cox PH model. This involves the following steps:
a. We first translate our initial hypothesis into a full model that contains all terms of interest. This includes all covariates we are interested in and also possibly nonlinear and interaction terms.
b. We then first test the important assumption behind the model. In the case of the Cox model that is the proportional hazards assumption. (In the case of an AFT model that is the distribution of the error terms). We need to do that first and rectify any problems with these assumptions before proceeding to simplify the model using hypothesis testing.
c. Then we continue by performing an omnibus test for all interaction terms in the model and see if we can drop them. Typically, using a p-value threshold higher than 5%, e.g., we can use 15%. This is to ensure that we do not miss any potentially interesting interactions. If the test suggests that some interactions may seem to improve the fit of the model, then we can proceed to see which interaction terms specifically achieve that. We test then each interaction separately, and at the end we can correct the p-values for multiple testing. Hence, at the final stage of this step we will know which interaction terms we will keep in the model.
d. We do the same for the nonlinear terms. Namely, first, we start by the omnibus test, and if the p-value is smaller than 15%, we are going to see which nonlinear terms we need. Hence, at the final stage of this step, we will know our final model. Note, that unless the aim is to do prediction, it is not advisable to remove non statistically significant covariates from the final model.
e. Finally, we interpret the results using the table of coefficients and effect plots if necessary.
# Check the examples in slides 191 & 262-263, and # Section 4.1, Survival Analysis in R Companion # Fit the Cox PH model with nonlinear and interaction effects
# The code is (this code only fits the model and does not produce any output): fit_full <- coxph(Surv(time, status) ~ sex * (ns(age, 3) + ns(ph.karno, 3)), data = lung)
# Check the PH assumption using function cox.zph() - check the code in slide 240, and # Section 4.4, Survival Analysis in R Companion # Produce the plot separately per term. We want to see if the line is horizontal. # Check Section 4.4, Survival Analysis in R Companion
# The code is check_PH <- cox.zph(fit_full) check_PH plot(check_PH)
# We did not observe any serious violations of PH. Hence, we proceed with the # omnibus test for all interaction terms. # Fit the Cox PH model without the interaction effects, and do the LRT # using anova()
# The code is fit_noInt <- coxph(Surv(time, status) ~ sex + ns(age, 3) + ns(ph.karno, 3), data = lung) anova(fit_noInt, fit_full)
# The p-value is smaller than 15%. Hence, we will proceed to see which interaction # terms seem to play a role. # We start by testing the interaction between sex and nonlinear age. To do that we need # to fit the model that excludes this interaction term and compare it with an LRT with # the full model, i.e., fit_noInt_Age <- coxph(Surv(time, status) ~ sex * ns(ph.karno, 3) + ns(age, 3), data = lung) aov_Int_Age <- anova(fit_noInt_Age, fit_full) aov_Int_Age
# We observe that the interaction term between sex and age does not seem to improve the # fit of the model. We proceed to see if the interaction term between sex and # ph.karno does, i.e., fit_noInt_Karno <- coxph(Surv(time, status) ~ sex * ns(age, 3) + ns(ph.karno, 3), data = lung) aov_Int_Karno <- anova(fit_noInt_Karno, fit_full) aov_Int_Karno
# The p-value is significant at 5% level, but we have not corrected it yet for # multiple testing. We do that now using the p.adjust() function: pvals <- c(aov_Int_Age$`P(>|Chi|)`[2], aov_Int_Karno$`P(>|Chi|)`[2]) p.adjust(pvals)
# Hence, from the two interaction we only keep the one between sex and ph.karno # We proceed to do exactly the same for nonlinear terms. We first start by the # omnibus test for all nonlinear terms, i.e., fit_noSplines <- coxph(Surv(time, status) ~ sex * (age + ph.karno), data = lung) anova(fit_noSplines, fit_full)
# The p-value is smaller than 15%. Hence, we will proceed to see which nonlinear # terms seem to play a role. # We start by testing the nonlinear for age. To do that we need to fit the model that # assumes that the effect of age is linear, compare it with an LRT with # the full model, i.e., fit_noSplines_Age <- coxph(Surv(time, status) ~ sex * (age + ns(ph.karno, 3)), data = lung) aov_Spl_Age <- anova(fit_noSplines_Age, fit_full) aov_Spl_Age
# We observe that the nonlinear terms for age do not seem to improve the # fit of the model. We proceed to see if the nonlinear terms for ph.karno do, i.e., fit_noSplines_Karno <- coxph(Surv(time, status) ~ sex * (ns(age, 3) + ph.karno), data = lung) aov_Spl_Karno <- anova(fit_noSplines_Karno, fit_full) aov_Spl_Karno
# The p-value is significant at 5% level, but we have not corrected it yet for # multiple testing. We do that now using the p.adjust() function: pvals <- c(aov_Spl_Age$`P(>|Chi|)`[2], aov_Spl_Karno$`P(>|Chi|)`[2]) p.adjust(pvals)
# We observe that the nonlinear terms for ph.karno are no longer significant at 5%. # But we still opt to include them in the model to explore in more detail how # ph.karno affects the risk. We will see from the effect plot better how the # relationship looks like. # Hence, our final model is: fit_final <- coxph(Surv(time, status) ~ sex * ns(ph.karno, 3) + age, data = lung)
# The summary of the final model gives: summary(fit_final)
# From the table of coefficients we cannot interpret the ones which contain # nonlinear terms. But we can interpret the rest, e.g., the coefficient of age # To depict how the sex and ph.karno affect the risk of death, we use an effect plot: ND <- with(lung, expand.grid(sex = levels(sex), ph.karno = seq(60, 100, length = 25), age = median(age) )) prs <- predict(fit_final, newdata = ND, type = "lp", se.fit = TRUE) ND$pred <- prs[[1]] ND$se <- prs[[2]] ND$lo <- ND$pred - 1.96 * ND$se ND$up <- ND$pred + 1.96 * ND$se xyplot(exp(pred) + exp(lo) + exp(up) ~ ph.karno | sex, data = ND, type = "l", lty = c(1, 2, 2), lwd = 2, col = "black", abline = list(h = 1, lty = 2, lwd = 2, col = "red"), xlab = "PH Karno", ylab = "Hazard Ratio")
We are interested in estimating survival probabilities for males and females with the median age, and with the average Karnofsky score.
a. Which are the median survival times and their 95% confidence limits for males and females with median age and average Karnofsky score?
b. Plot the corresponding survival curves.
c. What are the corresponding survival probabilities for 200, 400, 600 and 800 days?
fit_final <- coxph(Surv(time, status) ~ sex * ns(ph.karno, 3) + age, data = lung)
# First we need to define the dataset that contains the information of the type of # patients for which we want to calculate the survival probabilities. # Create this dataset using as source (i.e., to calculate the median age and mean # Karnofsky score) the original 'lung' dataset. # Use expand.grid() - check the code in slides 292-293, and # Section 5.1, Survival Analysis in R Companion
# The code is (this code only creates the dataset and does not produce any output): DD <- with(lung, expand.grid(sex = levels(sex), age = median(age), ph.karno = mean(ph.karno)))
# Calculate the survival probabilities using this dataset 'DD' and the final Cox # model 'fit_final' using the survfit() function.
# The code is: sfit <- survfit(fit_final, newdata = DD) sfit plot(sfit, col = 1:2, lwd = 2) legend("topright", levels(DD$sex), lty = 1, col = 1:2, lwd = 2, bty = "n")
# obtain survival probabilies at specific follow-up times we use the summary() # method for 'sfit', and we specify the 'times' argument. # See Section 5.1, Survival Analysis in R Companion
# The code is: summary(sfit, times = c(200, 400, 600, 800))
For the rest of the questions we consider the additive Cox PH model with sex
, age
and
ph.karno
fitted in the original lung
database (i.e., not the two databases before and
after 170 days). It is believed that the baseline hazard of death has a completely
different shape for patients with ECOG score greater than 0 compared to patients with ECOG
equal to 0, i.e., the hazard functions of the two groups is not analogous. First, from the
ph.ecog
variable that takes values from 0 to 3, construct the variable ph.ecog2
that
groups together the values 1-3. Then, fit an appropriate Cox model that takes the feature
described above into account, and then interpret the results. Name the fitted model
fit6
.
# First, we need to construct the 'ph.ecog2'. Start by setting the variable as a copy of # the original 'ph.ecog'. Then set the values of 'ph.ecog2' which are greater than 0 # equal to 1.
# The code is (this code only constructs the variable and does not produce any output): lung$ph.ecog2 <- lung$ph.ecog lung$ph.ecog2[lung$ph.ecog2 > 0] <- 1
# To account for the different baseline hazard for the levels of `ph.ecog2` # we need to stratify with this variable. # See Section 5.2, Survival Analysis in R Companion
# The code is: fit_addStr <- coxph(Surv(time, status) ~ sex + age + ph.karno + strata(ph.ecog2), data = lung) summary(fit_addStr)
The team of physicians of the North Central Cancer Treatment Group (who are responsible
for the Lung study) believe that the effects of sex
, age
and ph.karno
in the risk
of death are different for the two ECOG groups. Extend the model fit_addStr
of Question 3 accordingly,
and test whether this hypothesis is supported by the data for each of the two predictors.
lung$ph.ecog2 <- lung$ph.ecog lung$ph.ecog2[lung$ph.ecog2 > 0] <- 1 fit_addStr <- coxph(Surv(time, status) ~ sex + age + ph.karno + strata(ph.ecog2), data = lung)
# Check slides 299 & 302, and Section 5.2, Survival Analysis in R Companion # You will need to include the interaction between the stratifying factor and the three # covariates, and do the LRT to compare with the previous model.
# The code is: fit_intStr <- coxph(Surv(time, status) ~ (sex + age + ph.karno) * strata(ph.ecog2), data = lung) anova(fit_addStr, fit_intStr)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.