The goal of PiecewiseChangepoint is to estimate the number and locations of change-points in piecewise exponential models.
You can install the released version of PiecewiseChangepoint from GitHub with:
devtools::install_github("Anon19820/PiecewiseChangepoint")
In order to run some of the functions JAGS and Stan are required along with RTools??
First we load the package and simulate some piecewise exponential data.
library("survminer", lib.loc = "~/R-packages/") library("survival") library("xlsx") library("dplyr") library("flexsurv") library("PiecewiseChangepoint", lib.loc = "~/R-packages/") pathway <- "~/PhD/KM_Piecewise_Major_Review_final/Readme Update/" #Need Rtools for Stan #Need Jags
library("PiecewiseChangepoint") ## simulated example set.seed(123) n_obs =20 n_events_req=20 max_time = 24 # months rate = c(0.75,0.25)/12 # we want to report on months t_change =12 # change-point at 12 months df <- gen_piece_df(n_obs = n_obs,n_events_req = n_events_req, num.breaks = length(t_change),rate = rate , t_change = t_change, max_time = max_time)
#pathway <- "C:/Users/phili/OneDrive/PhD/R packages/PiecewiseChangepoint/PiecewiseChangepoint/" #Collapsing_Model<- readRDS(file = paste0(pathway,"Examples/Collapsing_Model.rds")) df<- readRDS(file = paste0("Examples/df.rds"))
We see the output of this dataframe below:
head(df)
For this simulated dataset; time_event represents the time the event would occur at in the absence of censoring, while time is minimum of the censoring time and the event time. status is an indicator variable if the event occurred at the corresponding time or if it was censored. Plotting this survival function we see a potential change in the hazard at around year 1.
fit <- survfit(Surv(time, status) ~ 1, data = df) # Drawing curves ggsurvplot(fit, palette = "#2E9FDF")
ggsave("Examples/Surv_plot.png", height = 6, width = 10)
As noted in [@Bagust.2014], constant hazards are linear with respect to the cumulative hazard function, therefore, the change in hazards at approximately 12 months can be seen more clearly in this plot.
ggsurvplot(fit, palette = "#2E9FDF", fun = "cumhaz")
ggsave("Examples/CumHaz_plot.png", height = 6, width = 10)
Next we fit the model noting that only the time and status columns are required. The timescale argument changes the prior for the hazards $\lambda$ so that it is appropriate for the timescale. For example if the timescale is years then the a vague prior centered around 1 is appropriate (i.e. $36\%$ of population having the event each year), while if the timescale is in months the equivalent prior should have an expected value of 1/12 (and days 1/365).
Collapsing_Model <- collapsing.model(df, n.iter = 20750, burn_in = 750, n.chains = 2, timescale = "months")
As we would expect the one change-point model has the highest posterior probability.
print(Collapsing_Model)
Collapsing_Model<- readRDS(file = paste0("Examples/Collapsing_Model.rds"))
print(Collapsing_Model)
Simulations from the posterior distribution for the change-point locations and associated hazards can be extracted from the returned object (highlighted below).
Collapsing_Model$changepoint Collapsing_Model$lambda
Once we are satisfied that there is good mixing and that we have run the model for long enough (20,000 simulations over 2 chains should be more than enough), we may want to look at a plot of the survivor function. In health economics we are typically interested in long term survival of our parametric models. In this situation we want a plot of the first 5 years which we can do using the max_predict argument (in this case 60 months). The red lines show the individual posterior simulations and are a natural representation of the parameter uncertainty.
plot(Collapsing_Model, max_predict = 60, chng.num = 1)+xlab("Time (Months)")
ggsave("Examples/Surv-1-chng.png", height = 6, width = 10)
Similarly we may also want to look at the hazard function. In this situation we only present the hazard up to the maximum time observed in the data. This is because by definition the hazard from the final interval will be the one which is extrapolated throughout the time horizon.
plot(Collapsing_Model, type = "hazard")+xlab("Time (Months)")+ylab("Hazards")+ylim(c(0,.1))
By default the plot methods described above use all the posterior simulations. If for example, we were only interested in the 2 change-point model, we can specify this using the chng.num argument. The green points indicate the mean location of the change-points. When plotting "all" of the simulations there is no sensible mean location of the change-points as there are different numbers of change-points.
plot(Collapsing_Model, max_predict = 60, chng.num = 2)+xlab("Time (Months)")
ggsave("Examples/Surv-2-chng.png", height = 6, width = 10)
In practical health economic modelling, we require evaluation of the survival function. Assuming time $t$ is within the $j^{\text{th}}$ interval we have calculate the cumulative hazard for this interval as $\lambda_j (t - \tau_{j-1})$ with $\tau_{j-1}$ being the ${j-1}^{\text{th}}$ change-point and $\lambda_j$ the $j^{\text{th}}$ hazard. We also require the cumulative hazard for all the previous intervals which is $\sum_{g=1}^{j-1} \lambda_g(\tau_g - \tau_{g-1})$. The survival probability is the exponential of the negative of the total cumulative hazard $S(t) = \exp(-H(t))$ and is written fully as:
$$S(t) = \exp\bigg{- \bigg[\lambda_j (t - \tau_{j-1}) + \sum_{g=1}^{j-1} \lambda_g(\tau_g - \tau_{g-1}) \bigg] \bigg}.$$
It is programmatically straightforward to calculate this and the the function get_Surv
evaluates the survival at a vector of user specified times. The user can specific an additional argument chng.num
if they require survival probabilities from a particular change-point number.
St_all <- get_Surv(Collapsing_Model, time = c(0:10)) St_all <- get_Surv(Collapsing_Model, time = c(0:10), chng.num = 1)
Given that the primary
In health economics we are typically interested in picking between one of a number of alternative parametric models, although it is also possible to combine all models using model averaging [@Jackson.2010]. Model fit statistics can provide an assessment of fit to the observed data, although, they do not guarantee the best fitting model will be appropriate for extrapolation. Nevertheless, we can compare our fitted model with 6 commonly used parametric models along with Royston-Parmar spline models. We fit the models using the JAGS [@Plummer.2003] and Stan [@RStan.2023] and compare the model fit using Widely Applicable Information Criterion (WAIC) [@Watanabe.2010].
Including General Population Mortality (GPM) is required to ensure that the extrapolated hazards are consistent with the increasing hazards associated with advanced ageing. Adjustments for GPM is typically done within the cost-effectiveness model, however, we can include them directly at the analysis stage so that we see their impact on the extrapolated survival.
In this example we consider GPM from a UK data source which provides mortality rates, defined as "the probability of that a person aged exactly $x$ will die before reaching $x+1$. Therefore, this data source provides the conditional probability of death within a year at each age.
Assuming our population is $50%$ male and female and the age at baseline is 55 years we have the following conditional probabilities of death at each age:
age_baseline_example <- 55 prop_male <- 0.5 time_horizon <- 100 Conditional_Death_df <- read.xlsx(paste0(pathway, "Examples/Conditional_Death_UK.xlsx"), 1) %>% filter(age >=age_baseline_example) head(Conditional_Death_df)
Our timescale is months and we need to convert this annual probability to a monthly rate which is done using the following formula (assuming a constant rate of mortality) [@Fleurence.2007]:
$$r = \frac{1}{t}\ln(1-p).$$ Because there are 12 months in a year $t = 12$ and $p$ is the specific (in our case annual) probability of death. With the below R code we now have the monthly rate of death for ages 55 (our assumed starting age of the cohort) up to 100 years of age, adjusted for distribution of males and females.
time_factor <- 12 df_temp <- Conditional_Death_df df_temp[, "mix_prob"] <- df_temp[,2]*prop_male + df_temp[,3]*(1-prop_male) df_temp <- df_temp %>% filter(age >= age_baseline_example & age <= time_horizon) df_temp$mix_haz <- -log(1-df_temp$mix_prob)/time_factor gmp_haz_vec_example = rep(df_temp$mix_haz,each = time_factor) #We now have the hazard at each timepoint gmp_haz_df_example <- data.frame(time = 1:length(gmp_haz_vec_example), hazard = gmp_haz_vec_example)
Within the compare.surv.mods
function the cumulative hazard of death (associated with GPM) and cumulative hazard of an event (from the parametric model) is added to obtain the overall cumulative hazard $H(t)$. The cumulative hazard is the sum (in the case of discrete hazards as above) of the individual hazards and the integral of the parametric hazards. As noted in the previous section survival probabilities are obtained through the relation $S(t) = \exp(-H(t))$. By default the compare.surv.mods
function only implements GPM hazards after follow-up as we observe survival from all causes up until then (although GPM hazards can be added from start of follow-up by using the gpm_post_data = FALSE
).
We see in the plot below that including the GPM hazard ensures that the extrapolated hazard exhibits the characteristic increasing hazards associated with ageing.
knitr::include_graphics(paste0("Examples/GPM hazards.png"))
gmp_haz_df_example_plt <- gmp_haz_df_example %>% filter(time > 24) #Extrapolated Hazard #Find final constant hazard used for extrapolation extrapolated_haz <- colMeans(Collapsing_Model$lambda[apply(Collapsing_Model$lambda,1, function(x){length(na.omit(x))==2}),])[2] # png(paste0("~/PhD/KM_Piecewise_Major_Review_final/Readme Update/Examples/","GPM hazards.png"), # width = 10, height = 6, units = 'in', res = 300) plot(gmp_haz_df_example_plt$time/12 + age_baseline_example, y = gmp_haz_df_example_plt$hazard +extrapolated_haz, col = "red", ylim= c(0, max(gmp_haz_df_example_plt$hazard +extrapolated_haz)*1.1), type = "l", xlab = "Age", ylab = "(Monthly) Hazard") lines(gmp_haz_df_example_plt$time/12 + age_baseline_example, y = gmp_haz_df_example_plt$hazard, col = "blue") #GMP legend("topleft", legend=c("Extrapolated hazard - Constant Disease-specific + GPM hazard", "GPM hazard"), col=c("red", "blue"), lty=1:2, cex=0.8) # dev.off()
The fitting of other parametric models is accomplished by the compare.surv.mods
and general population mortality is adjusted for by including a gmp_haz_df
as described above.
Fitted models include:
Model fit to the observed data and a plot of the extrapolated survival are available from within the mod_comp
object along with the posterior samples from all of the fitted models.
#This can take a number of minutes set.seed(123) mod_comp <- compare.surv.mods(Collapsing_Model, max_predict = 100, #100 months n.iter.jags = 5000, #Run JAGS/Stan for 5000 samples n.thin.jags = 1, n.burnin.jags = 500, chng.num = 1, #Using results from 1 change-point PEM gmp_haz_df =gmp_haz_df_example) #GPM dataset mod_comp$mod.comp[,c(1,3)]
mod_comp<- readRDS(file = paste0("Examples/mod_comp2.rds"))
#Returns a dataframe with the model fit results mod_comp$mod.comp[,c(1,3)] %>% arrange(WAIC) mod_comp$plot_Surv_all
knitr::include_graphics(paste0("Examples/SimSurv_data.png"))
Information used for the identification of the relevant Technology Appraisals (TAs) from the review by [@Gorrod.2019] along with the relevant information extracted from them can be found in the Excel file located here.
The first worksheet of this file lists the TAs investigated (Figure \@ref(fig:Gorrod-1)).
knitr::include_graphics(paste0("Examples/Gorrod Overview.png"))
For each of the TAs listed in the first worksheet a separate worksheet provides further information relating to whether or not it used the Bagust and Beale (B&B) approach along with the relevant location in the TA and an associated screengrab of the relevant information. In situations where B&B was confirmed to have been used, further information including the Kaplan-Meier survival curves and location of the assumed change-point are recorded along with the respective locations within the TA are recorded. Kaplan-Meier curves for any survival data made available after the original TA is along presented along with a link to the relevant data-source. For an example of some of the data extracted from TA268 [@TA268].
knitr::include_graphics(paste0("Examples/TA268 Data-source.png"))
All results from the manuscript can be replicated using the folder located here.
The folder contains a R script titled Digitizing_R_code_Final_Share.R
which will produce relevant plots and tables in the folder named pub-plots-tabs
, using the R functions described previously.
A number of sub-folders are also contained within the main folder and provide pseudo-patient data created from the Kaplan-Meier curves presented in the TAs (and in publications providing later datacuts). These are named as follows TA_Treatment_Outcome_Datacut
and are read in and digitized
from the survival.txt and nrisk.txt files in these folders.
knitr::include_graphics(paste0("Examples/File Structure.png"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.