In drizopoulos/EP03survival: Software Tutorials for the Survival Part of the Biostatistical Methods II Course

library("learnr")
library("survival")
library("lattice")
lung$sex <- factor(lung$sex, labels = c("male", "female"))
lung <- with(lung, lung[complete.cases(time, status, sex, age, ph.karno), ])
knitr::opts_chunk$set(echo = FALSE)
options(tutorial.exercise.timelimit = 1200)

Quiz

The following questions test your knowledge on the framework of accelerated failure time models presented in Chapter 4.

Question 1

quiz(
  question("In AFT models the regression parameters in their original scale (i.e., no transformation) quantify changes for which quantity?",
    answer("The log failure time."),
    answer("The failure time."),
    answer("The expected log failure time.", correct = TRUE),
    answer("The expected failure time."),
    answer("The log hazard function."),
    answer("The hazard function."),
    answer("None of the above."),
    allow_retry = TRUE, random_answer_order = FALSE
  )
)

Question 2

quiz(
  question("With regard to the distribution of the error terms in AFT models, which of the following statements is correct?",
    answer("The choice of the distribution of the error terms will not affect the coefficients' estimates."),
    answer("The choice of the distribution of the error terms minimally affects the coefficients' estimates."),
    answer("Without fitting the corresponding AFT models, we cannot tell how much the choice of the distribution will affect the coefficients' estimates.", correct = TRUE),
    answer("None of the above."),
    allow_retry = TRUE, random_answer_order = FALSE
  )
)

Question 3

quiz(
  question("Which of the following statements is correct with regard to effect plots?",
    answer("Effect plots are most useful for additive models without interaction terms."),
    answer("Effect plots are most useful for models with interaction terms.", correct = TRUE),
    answer("Effect plots are most useful for models without categorical covariates."),
    answer("Effect plots are most useful for models without continuous covariates."),
    answer("None of the above."),
    allow_retry = TRUE, random_answer_order = TRUE
  )
)

Question 4

quiz(
  question("Which of the following are requirements in order to compare two AFT models with a likelihood ratio test (more than one correct possible)?",
    answer("The models need to be nested (one is a special case of the other).", correct = TRUE),
    answer("The models need to be fitted with the same distribution for the error terms.", correct = TRUE),
    answer("The models need to be fitted on the same dataset.", correct = TRUE),
    answer("The models need to contain only categorical covariates."),
    answer("The models need to contain only continuous covariates."),
    answer("The models need to contain both categorical and continuous covariates."),
    allow_retry = TRUE, random_answer_order = TRUE
  )
)

Question 5

In a study investigating potential risk factors for the expected survival time of prostate cancer patients, a team of researchers has fitted an accelerated failure time model assuming a normal distribution for the error terms. The team members have some conflicting ideas on how to check the normality assumption, and they have come to you for advice. Below you find some of their ideas to check this assumption.

quiz(
  question("Which one is correct?",
    answer("Check the shape of the histogram of the residuals."),
    answer("Compute the Shapiro-Wilk test for the residuals."),
    answer("Check whether the Kaplan-Meier estimate of the residuals agrees with the survival function of the normal distribution.", correct = TRUE),
    answer("Use the Q-Q plot of the residuals."),
    allow_retry = TRUE, random_answer_order = TRUE
  )
)

Exercises

The purpose of these exercises is to illustrate how Accelerated Failure Time model can be fitted in R.

The following questions are based on the Lung dataset. This dataset is available as the object lung from package survival, which is already loaded in this session. From this data set we will use the following variables:

time: the observed time-to-death in days.
status: the event indicator; '1' denotes censored and '2' denotes death.
age: the age in years.
ph.karno: Karnofsky performance score rated by the physician.
sex: the sex indicator with values 'male' and 'female'.

For the exercises below it will be useful to check the corresponding sections of the Survival Analysis in R Companion that are given in the hints.

If you decide to work directly in R and not in this online tutorial, then you need to first run the following lines of code:

library("survival")
library("lattice")
data("lung", package = "survival")
lung$sex <- factor(lung$sex, labels = c("male", "female"))
lung <- with(lung, lung[complete.cases(time, status, sex, age, ph.karno), ])

Question 1

Our initial hypothesis is that the time-to-death is affected by sex, age and ph.karno. In addition, we also believe that the effects of age and ph.karno on the expected failure time are not the same for males and females. Transform this initial hypothesis into a suitable AFT model. For the error terms assume the extreme value distribution, which as we have seen, corresponds to the Weibull distribution for the time-to-death. Name the fitted model fit1.

# Check the example in slide 133, and Section 3.1, Survival Analysis in R Companion

# Fit the AFT model
fit1 <- survreg(Surv(time, status) ~ sex * (age + ph.karno), data = lung,
                dist = "weibull")
summary(fit1)

Question 2

We would like to test whether some aspects of our initial hypothesis are supported by the data. In particular, we are interested in testing: 1. whether sex has at all an effect in the time-to-death, and 2. whether the effects of age and ph.karno are equal for males and females.

Based on the results of these two hypotheses, simplify the model appropriately. Name the final fitted model fit3.

fit1 <- survreg(Surv(time, status) ~ sex * (age + ph.karno), data = lung,
                dist = "weibull")

# Check the example in slide 160, and Section 3.3, Survival Analysis in R Companion

# We want to test the overall effect of sex => From the previous model remove sex;
# then use the anova() function to do the LRT.

# The code is
fit2 <- survreg(Surv(time, status) ~ age + ph.karno, data = lung)

anova(fit2, fit1)

# For the 2nd question we want to test the interaction terms => Return in the full 
# model (i.e., the model under the alternative hypothesis) and remove the interactions;

# then use the anova() function to compute the corresponding LRT.

# The code is
fit3 <- survreg(Surv(time, status) ~ sex + age + ph.karno, data = lung)

anova(fit3, fit1)

Question 3

For the final model fit3 obtained in Question 2 create an effect plot depicting how the average failure time changes with increasing values of ph.karno, for the median age.

fit3 <- survreg(Surv(time, status) ~ sex + age + ph.karno, data = lung)

# Check the example in slides 145-148, and Section 3.2, Survival Analysis in R Companion

# First calculate what is the median age, and what is the observed range of 'ph.karno' in
# our data (for the latter use function range())

# the code is
with(lung, median(age))

with(lung, range(ph.karno))

# Hence, the median age is 63, and the observed range of ph.karno is from 50 to 100.

# We then create the database based on which predictions will be made. We need three 
# variables/columns: 'sex' (both 'male' and 'female'), 'age' (equal to 63), and 
# 'ph.karno' increase from 50 to 100.

# Use function expand.grid()

# The code is (this code just creates the dataset and does not create any output)
ND <- expand.grid(sex = c("male", "female"), age = 63, ph.karno = seq(50, 100, length.out = 50))

# Calculate the predictions using the same code as in slide 147, using model 'fit3'

# The code is
prs <- predict(fit3, ND, se.fit = TRUE, type = "lp")
ND$pred <- prs[[1]]
ND$se <- prs[[2]]
ND$lo <- exp(ND$pred - 1.96 * ND$se)
ND$up <- exp(ND$pred + 1.96 * ND$se)
ND$pred <- exp(ND$pred)

# The first lines of ND are
head(ND)

# Produce the plot using the same code as in slide 148.
# Note, you will need to adapt a bit this code, namely,
# you will need to use 'ph.karno' instead of 'age', and 'sex' instead of 'drug'

# The code is
xyplot(pred + lo + up ~ ph.karno | sex, data = ND, type = "l", 
    lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 3, 
    xlab = "Karnofsky Performance Score", ylab = "Survival Time")

Question 4

Check whether the assumption of the extreme value distribution for the error terms is violated for your final model fit3 using the AFT residuals. What is your conclusion?

fit3 <- survreg(Surv(time, status) ~ sex + age + ph.karno, data = lung)

# Check the example in slides 168-169, and Section 3.4, Survival Analysis in R Companion

# For model 'fit3' that we have selected in the previous question, 
# first extract the fitted values, then calculate the residuals, and then 
# calculate the Kaplan-Meier estimator of these residuals.

# The code is (this code just calculates the residuals and does not create any output):
fits <- fit3$linear.predictors
lung$resids <- (log(fit3$y[, 1]) - fits) / fit3$scale
resKM <- survfit(Surv(resids, status) ~ 1, data = lung)

# Then produce the plot of the Kaplan-Meier and add the red line by 
# using the code as in slide 169.

plot(resKM, mark.time = FALSE, xlab = "AFT Residuals", 
    ylab = "Survival Probability", main = "Lung Data Set")
xx <- seq(min(lung$resids), max(lung$resids), length.out = 35)
yy <- exp(- exp(xx))
lines(xx, yy, col = "red", lwd = 2)