knitr::opts_chunk$set(fig.width=7)
Data originally downloaded from the Statistical Software Repository at the University of Massassachusetts Amherst (Dataset is labeled "AIDS clinical trial"). We reprocessed the data to make it more readable (see file actg_reprocess.R).
The data come from a population of HIV-positive patients in a double-blind, placebo-controlled trial that compared four
drug regimens. (Hammer et al., 1997). For the purposes of this demonstration we will compare patients on a drug regimen with indinavir (IDV) to those on a regimen without it. The outcome of interest is time to AIDS defining event or death (called AIDSorDeath
in the dataset we use). Because efficacy results met a pre-specified level of significance at an interim analysis, the trial was stopped early.
library(QSurvival) library(mgcv) library(ggplot2) library(dplyr) adata = readRDS("AIDSdata/AIDSdata.rds") parallelCluster <- NULL ## can't run paralle code in vignettes #if(requireNamespace("parallel",quietly=TRUE)) { # parallelCluster <- parallel::makeCluster(4) #}
Note that the model shows a decrease in risk for patients on a regiment that includes IDV, a decreased risk for patients with higher CD4 count at the beginning of the study (as expected), an increase in risk for patients who exhibited more symptoms at the beginning of the study, and a slight increase for older patients.
# define the variables of interest. vars = setdiff(colnames(adata), c("time", "id", "time_to_death", "AIDSorDeath", "death", "treatment_gp")) # build the training set from the original data. adataX = buildQuasiObsForTraining(adata, adata$time, ifelse(adata$AIDSorDeath==1, adata$time, NA), "rowid", "timestep", "eventhappened", parallelCluster=parallelCluster) # fit a model using GAM with time splined. formula = paste("eventhappened ~ s(timestep) +", paste(vars, collapse="+")) model = gam(as.formula(formula), data=adataX, family=binomial) summary(model)
When we compare the estimated survival curves of the two treatment groups (treatment_idv = TRUE/FALSE
), we see that the patients with a regiment including IDV tend to survive longer compared to those not treated with IDV. However note that we cannot well estimate the true survival curves or patient lifetimes, since the study was short (and truncated) and most patients survived past the time of the study.
# re-prepare adata for evaluation adataT = buildQuasiObsForComparison(adata, 365, adata$time, ifelse(adata$AIDSorDeath==1, adata$time, NA), "rowid", "timestep", "eventhappened", parallelCluster=parallelCluster) # get the hazard functions adataT$hazard = as.numeric(predict(model, newdata=adataT, type="response")) # build the individual level hazard and survival curves summaries = summarizeHazard(adataT, "rowid", "timestep", "hazard", survivalColumnNames = "survival", deathIntensityColumnNames='deathIntensity', parallelCluster=parallelCluster) # frame of curves curves = summaries$details[, c("rowid", "timestep", "survival", "hazard","deathIntensity")] # join the treatment information to the curves curves = dplyr::inner_join(curves, adataT[, c('rowid', 'timestep', 'treatment_idv')], by=c("rowid", "timestep")) # get the overall curves, grouped by treatment condition meanCurves = dplyr::summarize(dplyr::group_by(curves, timestep, treatment_idv), hazard = mean(hazard), survival=mean(survival), deathIntensity = mean(deathIntensity)) # compare survival curves ggplot(meanCurves, aes(x=timestep, y=survival, color=treatment_idv)) + geom_line()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.