In drizopoulos/EP03survival: Software Tutorials for the Survival Part of the Biostatistical Methods II Course

library("learnr")
library("survival")
load("Data.RData")
knitr::opts_chunk$set(echo = FALSE)
options(tutorial.exercise.timelimit = 1200)

Quiz

The following questions test your knowledge of Chapters 1-3.

Question 1

In a clinical study, interest lies in the survival of HIV-infected patients after seroconversion. The Kaplan-Meier estimate at year 1 equals 0.88.

quiz(
  question("Which of the following statements are correct (more than one may be correct)?",
    answer("Given that a patient is alive at year 1, the instantaneous risk of death just after year 1 is 0.88."),
    answer("For the target population and on average we expect 88% of the patients to live more than 1 year.", correct = TRUE),
    answer("The cumulative risk of death at year 1 equals 0.88."),
    answer("The estimated survival function at year 1 equals 0.88, meaning that we expect 12% of the patients to die within 1 year.", correct = TRUE),
    answer("The estimated survival function at year 1 equals 0.12, meaning that we expect 12% of the patients to live more than 1 year."),
    answer("The cumulative distribution function at year 1 equals 0.12.", correct = TRUE),
    answer("The cumulative distribution function at year 1 equals 0.88."),
    allow_retry = TRUE, random_answer_order = TRUE
  )
)
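
The quantities named in the answers above are linked by two identities: the cumulative distribution function of the event time is F(t) = 1 - S(t), and the cumulative hazard is H(t) = -log S(t). A minimal sketch using the 0.88 from the question (the variable names are illustrative, and reading "cumulative risk" as the cumulative hazard is an assumption):

```r
# Kaplan-Meier estimate of S(1), taken from the question text
surv_1yr <- 0.88

# F(t) = P(T <= t) = 1 - S(t): probability of dying within 1 year
cdf_1yr <- 1 - surv_1yr

# Cumulative hazard H(t) = -log S(t); note that it differs from S(t)
cumhaz_1yr <- -log(surv_1yr)

round(c(S = surv_1yr, F = cdf_1yr, H = cumhaz_1yr), 3)
```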

Question 2

A study has been designed to investigate whether a new therapy improves the survival rates of advanced cancer patients. You have at hand the survival times of the two groups of patients, namely, the patients taking the new treatment and the patients with the standard treatment.

quiz(
  question("Which of the following types of analysis would you follow to investigate if the new treatment works?",
    answer("Perform a two-sample t-test for the two groups of patients to test if the mean survival time in the new treatment group is greater than the mean survival time in the standard treatment group."),
    answer("First check if the data are normally distributed; if yes, perform a two-sample t-test, otherwise perform a two-sample Wilcoxon test to test differences in the medians between the two treatment groups."),
    answer("Perform a log-rank test to compare the survival distributions of the two groups.", correct = TRUE),
    answer("Perform a paired t-test for the two groups of patients."),
    allow_retry = TRUE, random_answer_order = TRUE
  )
)
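
Survival times are usually censored, which is why a t-test on the raw times is not appropriate here. The sketch below runs a log-rank test with survdiff() on a small made-up data set; the data frame toy and its values are hypothetical and serve only to show the call.

```r
library(survival)

# Hypothetical toy data: follow-up time in months,
# status 1 = death observed, 0 = censored
toy <- data.frame(
  time   = c(5, 8, 12, 14, 20, 3, 4, 7, 9, 11),
  status = c(1, 0, 1, 0, 0, 1, 1, 1, 0, 1),
  arm    = rep(c("new", "standard"), each = 5)
)

# A t-test on 'time' would treat censored follow-up times as if they were
# observed deaths; the log-rank test uses the event indicator instead
lr <- survdiff(Surv(time, status) ~ arm, data = toy)
lr
```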

Question 3

quiz(
  question("Which of the following types of analysis would you follow to investigate if the new treatment works?",
    answer("Perform a log-rank test to compare the survival distributions of the two groups."),
    answer("Perform a Peto and Peto Gehan-Wilcoxon test to compare the survival distributions of the two groups."),
    answer("Check graphically if the proportional hazards assumption is satisfied. If it seems to be satisfied, then use the log rank test.", correct = TRUE),
    answer("Check graphically if the proportional hazards assumption is satisfied. If it seems to be satisfied, then use the Peto and Peto Gehan-Wilcoxon test."),
    allow_retry = TRUE, random_answer_order = TRUE
  )
)

Exercises

The purpose of this practical is to illustrate how standard statistical analysis of survival data can be performed in R.

The following questions are based on the AIDS dataset. This dataset is available as the object aids.id and is already loaded in this session. From this dataset we will use the following variables:

Time: the observed time-to-death in months.
death: the event indicator; '1' denotes death and '0' denotes a censored observation.
drug: the treatment indicator with values 'ddC' and 'ddI'.
gender: the sex indicator with values 'male' and 'female'.

For the exercises below it will be useful to check the corresponding sections of the Survival Analysis in R Companion that are mentioned in the hints.

Question 1

Calculate and plot the Kaplan-Meier estimator of the survival function based on all the data. What is the median survival time and its 95% confidence interval?

# Check the example in slides 73-74, and Section 2.1, Survival Analysis in R Companion

# Calculate the Kaplan-Meier estimator and check the output
fitKM <- survfit(Surv(Time, death) ~ 1, data = aids.id)
fitKM

# Plot the Kaplan-Meier estimator
plot(fitKM)
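
The printed survfit object already reports the median and its confidence limits. As a hedged sketch, the same numbers can also be extracted with quantile(). To keep the example self-contained it uses the lung data shipped with the survival package as a stand-in (an assumption; in the tutorial session you would use aids.id with Time and death instead).

```r
library(survival)

# Stand-in data (assumption): 'lung' ships with the survival package;
# its status variable is coded 1 = censored, 2 = dead
fit <- survfit(Surv(time, status == 2) ~ 1, data = lung)

# Median survival time with its 95% confidence limits
med <- quantile(fit, probs = 0.5)
c(median = unname(med$quantile),
  lower  = unname(med$lower),
  upper  = unname(med$upper))

# Mark the 50% survival level on the Kaplan-Meier plot
plot(fit, xlab = "Time (days)", ylab = "Survival probability")
abline(h = 0.5, lty = 2)
```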

Question 2

Calculate and plot the Breslow estimator of the survival functions for ddC and ddI, separately. Calculate also the estimates of the 50%, 60% and 70% percentiles of the survival distribution with their 95% confidence intervals. Name the Breslow estimator object fitB.

# Check the example in slides 86 & 80, and Section 2.1, Survival Analysis in R Companion

# Calculate the Breslow estimator and check the output
fitB <- survfit(Surv(Time, death) ~ drug, data = aids.id, type = "fleming-harrington")
fitB

# Plot the Breslow estimator
plot(fitB, lty = 1:2, col = 1:2)

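# Use the quantile() function; its probabilities refer to F(t) = 1 - S(t),
# so 1 - c(0.5, 0.6, 0.7) gives the times at which the estimated
# survival probability drops to 0.5, 0.6 and 0.7
quantile(fitB, 1 - c(0.5, 0.6, 0.7))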
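
The Breslow estimator is the exponential of minus the Nelson-Aalen cumulative hazard, S(t) = exp(-H(t)). The sketch below checks this by hand for a single curve. It uses the lung data from the survival package as a stand-in (an assumption; the exercise itself uses aids.id). In recent versions of survival, type = "fleming-harrington" is documented as equivalent to stype = 2, ctype = 1.

```r
library(survival)

# Stand-in data (assumption): 'lung' from the survival package
fit_b <- survfit(Surv(time, status == 2) ~ 1, data = lung,
                 type = "fleming-harrington")

# Nelson-Aalen cumulative hazard: running sum of deaths / number at risk
na_cumhaz <- cumsum(fit_b$n.event / fit_b$n.risk)

# The Breslow estimate of S(t) is exp(-H(t)); compare with survfit's output
all.equal(exp(-na_cumhaz), fit_b$surv)
```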

Question 3

Using the Breslow estimator fitB from the previous question, calculate the 8- and 10-month survival probabilities with their corresponding 95% confidence intervals.

fitB <- survfit(Surv(Time, death) ~ drug, data = aids.id, type = "fleming-harrington")

# you will need to use function summary() and its argument 'times'
# Check Section 2.1, Survival Analysis in R Companion

# The code is:
summary(fitB, times = c(8, 10))
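
The object returned by summary() can also be turned into a compact table, which is convenient when only a few time points are of interest. A sketch under the same stand-in assumption as before (the lung data from the survival package, with time measured in days, so the time points 180 and 365 are illustrative):

```r
library(survival)

# Stand-in data (assumption): 'lung' from the survival package, by sex
fit <- survfit(Surv(time, status == 2) ~ sex, data = lung,
               type = "fleming-harrington")

# Survival estimates at the requested time points, one row per group and time
s <- summary(fit, times = c(180, 365))
tab <- data.frame(strata = s$strata, time = s$time,
                  surv = s$surv, lower = s$lower, upper = s$upper)
tab
```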

Question 4

Use the log-rank and the Peto & Peto modified Gehan-Wilcoxon tests to assess whether the survival curves of the two treatment groups differ statistically significantly. Before doing the analysis, which of the two tests do you expect to yield the smaller p-value, and why?

# Check the example in slides 101 & 109, and Section 2.2, Survival Analysis in R Companion

# log-rank test
survdiff(Surv(Time, death) ~ drug, data = aids.id)

# Peto & Peto modified Gehan-Wilcoxon test (rho = 1)
survdiff(Surv(Time, death) ~ drug, data = aids.id, rho = 1)
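
In survdiff(), rho = 0 (the default) gives the log-rank test and rho = 1 gives the Peto & Peto modification of the Gehan-Wilcoxon test, which puts more weight on early differences. To compare the two p-values side by side, they can be computed from the chi-squared statistic stored in the result. A minimal sketch with the lung data as a stand-in (an assumption); p_value() is a hypothetical helper defined here, not part of the survival package.

```r
library(survival)

# Stand-in data (assumption): 'lung' from the survival package, groups by sex
lr <- survdiff(Surv(time, status == 2) ~ sex, data = lung, rho = 0)  # log-rank
pp <- survdiff(Surv(time, status == 2) ~ sex, data = lung, rho = 1)  # Peto & Peto

# Hypothetical helper: p-value from the chi-squared statistic,
# with degrees of freedom = number of groups - 1
p_value <- function(x) {
  pchisq(x$chisq, df = length(x$n) - 1, lower.tail = FALSE)
}

c(log_rank = p_value(lr), peto_peto = p_value(pp))
```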

Question 5

Do the same for gender, i.e., calculate the Kaplan-Meier estimator of the survival functions for males and females, and compare the results from the log-rank and Peto & Peto modified Gehan-Wilcoxon tests. Which test should you trust more in this case, and why?

# first calculate the Kaplan-Meier estimator and do the graph

# The code is:
fitKM_gender <- survfit(Surv(Time, death) ~ gender, data = aids.id)
fitKM_gender

# Use different line types and colours so the two groups can be told apart
plot(fitKM_gender, lty = 1:2, col = 1:2)

# Use survdiff() as in Question 4

# log-rank test
survdiff(Surv(Time, death) ~ gender, data = aids.id)

# Peto & Peto modified Gehan-Wilcoxon test (rho = 1)
survdiff(Surv(Time, death) ~ gender, data = aids.id, rho = 1)

drizopoulos/EP03survival documentation built on Oct. 12, 2020, 10:45 a.m.
