In iain-t-bennett-roche/flexsurvPlus: Provides Functions to Perform Survival Analysis for Economic models

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 6,
  message = FALSE
)

Introduction

Parametric survival models are often the preferred method of extrapolating survival data for use in economic models. The National Institute for Health and Care Excellence (NICE) Decision Support Unit (DSU) technical support document (TSD) 14 recommends that the Exponential, Weibull, Gompertz, log-logistic, log normal and Generalized Gamma parametric models should all be considered.[@latimer2011nice] More recently, NICE also discusses more flexible models in NICE DSU TSD 21, however, more these models are not in the scope of this package.[@rutherford2020nice] The Canadian Agency for Drugs and Technologies in Health (CADTH) additionally specifies that the Gamma distribution must also be considered. This document therefore details the characteristics of each of these distributions and demonstrates how the parameters from each distribution, outputted using the flexsurvPlus package, can be implemented within an economic model.[@cadth2020] The Generalized F distribution is not commonly used, however it has been included in this package in case it is required.

The flexsurvPlus package allows the inclusion of a treatment effect in the following four ways:

Separate models - Models fitted to each treatment arm separately
Independent shape models - Models fitted to both/all treatment arms including a treatment covariate to model the effect of treatment on both the scale and shape parameter(s) of the distribution
Common shape models - Models fitted to both/all treatment arms including a treatment covariate to model the effect of treatment on the scale parameter of the distribution. The shape parameter(s) of the distribution is common across treatments which reflects an assumption of proportional hazards or accelerated failure time between treatments depending on the distribution
One arm models - Models fitted to the entire dataset (no treatment strata)

This document details how to use the flexsurvPlus package to perform these models. A separate vignette; "Parametric survival analysis using the flexsurvPlus package: understanding the theory" details the theory behind the models.

Set up packages and data

Install packages

The following packages are required to run this example:

# Libraries
library(flexsurvPlus)
library(tibble)
library(dplyr)
library(boot)
library(ggplot2)

# Non-scientific notation
options(scipen=999) 

# Colours for plots
blue = rgb(69, 185, 209, max=255)
pink = rgb(211,78,147,max=255)
Dyellow = rgb(214, 200, 16, max=255)
orange<-rgb(247,139,21,max=255)

Read in the data

To perform survival analyses, patient level data is required for the survival endpoints.

This example uses a standard simulated data set (adtte). There is no standard naming that is needed for this package however, there are some set variables that are needed:

Time - a numeric variable
Event - a binary variable (event=1, censor=0)
Treatment - a character variable with the name of the intervention treatment

The data must be in "wide" format such that there is one row per patient and columns for each endpoint separately. In this example, we analyze progression-free survival (PFS).

adtte <- sim_adtte(seed = 2020, rho = 0.6)
head(adtte)

# subset PFS data and rename
PFS_data <- adtte %>%
  filter(PARAMCD=="PFS") %>%
  transmute(USUBJID,
            ARMCD,
            PFS_days = AVAL,
            PFS_event = 1- CNSR
  )
head(PFS_data)

Exploratory analysis

Before performing any statistical analysis, it is important to explore the data. Most importantly is a Kaplan-Meier plot. A simple Kaplan-Meier plot has been produced here:

# Create survfit object
km.est.PFS <- survfit(Surv(PFS_days, PFS_event) ~ ARMCD, 
                      data = PFS_data, 
                      conf.type = 'plain')

# Plot Kaplan-Meier
plot(km.est.PFS, 
     col = c(blue, pink), # plot colours
     lty = c(1:2), # line type
     xlab = "Time (Days)", # x-axis label
     ylab = "Progression-free survival", # y-axis label
     xlim = c(0, 800)) 

legend(x = 500, 
       y = .9, 
       legend = c("Arm A", "Arm B"), 
       lty = c(1:2), 
       col = c(blue, pink))

Fitting the models

The runPSM function fits parametric survival models for multiple distributions using the flexsurv package, manipulates the flexsurv objects to get the parameter estimates and AIC and BIC value (using the flexsurvPlus function get_params) and rearranges the parameter estimates such that they can easily be output to excel to calculate survival for both the intervention and reference treatment in an economic model.

These functions can be used to estimate 3 types of model:

Separate models - Models fitted to each treatment arm separately
- These models are fitted using the runPSM function using the model.type="Separate" argument.
Independent shape models - Models fitted to both/all treatment arms including a treatment covariate to model the effect of treatment on both the scale and shape parameter(s) of the distribution.
- They have been fit using the runPSM function using the model.type="Independent shape" argument.
Common shape models - Models fitted to both/all treatment arms including a treatment covariate to model the effect of treatment on the scale parameter of the distribution. The shape parameter(s) of the distribution is common across treatments which reflects an assumption of proportional hazards or accelerated failure time between treatments depending on the distribution
- These models have been fit using the runPSM function using the model.type="Common shape" argument.
One arm models - Models fitted to the entire dataset (no treatment strata)
- These models are fitted using the runPSM function using the model.type="One arm" argument.

The inputs to the runPSM function are:

data - A data frame containing individual patient data for the relevant time to event outcomes
time_var - Name of time variable in 'data'. Variable must be numerical and >0.
event_var - Name of event variable in 'data'. Variable must be numerical and contain 1's to indicate an event and 0 to indicate a censor.
model.type - Character vector indicating the types of model formula provided. Permitted values are 'Common shape', 'Independent shape' or 'Separate' as per the models explained above.
distr - Each type of model can be fitted with multiple distributions. The distributions available for this package are:
- Exponential ('exp')
- Weibull ('weibull')
- Gompertz ('gompertz')
- Log-normal ('lnorm')
- Log-logistic ('llogis')
- Generalized gamma ('gengamma')
- Gamma ('gamma')
- Generalized F ('genf')
strata_var - Name of stratification variable in "data". This is usually the treatment variable and must be categorical. Not required when model.type='One arm'.
int_name - Character to indicate the name of the treatment of interest, must be a level of the "strata_var" column in "data", used for labelling the parameters.
ref_name - Character to indicate the name of the reference treatment, must be a level of the "strata_var" column in "data", used for labelling the parameters. Not required when model.type='One arm'.

More information about each function can be used by running the code ?runPSM.

Example code to fit all two arm models is presented below.

psm_PFS_all <- runPSM(data=PFS_data,
                     time_var="PFS_days",
                     event_var="PFS_event",
                     model.type= c("Common shape", 
                                   "Independent shape", 
                                   "Separate"),
                     distr = c('exp',
                               'weibull',
                               'gompertz',
                               'lnorm',
                               'llogis',
                               'gengamma',
                               'gamma',
                               'genf'),
                     strata_var = "ARMCD",
                     int_name="A",
                     ref_name = "B")
psm_PFS_all

Estimate survival from the models and plot the curves

Survival at a given time, t, is estimated as follows:

$$ S(t) = P({T>t}) = 1 - F(t) $$

Where F(t) is the cumulative distribution function.

To cross check survival estimates in Excel models, the following functions in R can be used to estimate the cumulative distribution function at given time points for each distribution explored in this package (the estimates from the cumulative distribution function can then be subtracted from 1 to estimate the survival probability):

Exponential: pexp()
Weibull: pweibull()
Gompertz: pgompertz()
Log-normal: plnorm()
Log-logistic: pllogis()
Generalized gamma: pgengamma()
Gamma: pgamma()
Generalized F: pgenf()

The parameters outputted from each of the fitted models are used as inputs to these functions. The code below gives some examples.

# Landmark survival

# vector of times to estimate survival (days)
landmark_times <- c(0, 100, 200, 300)


# Example 1: intervention arm, Weibull distribution, common shape model
surv_comshp_weibull_int <- 1 - pweibull(landmark_times,
                                        shape = psm_PFS_all$parameters_vector["comshp.weibull.shape.int"],
                                        scale = psm_PFS_all$parameters_vector["comshp.weibull.scale.int"])
surv_comshp_weibull_int


# Example 2: intervention arm, log-normal distribution, separate model
surv_sep_lnorm_int <- 1 - plnorm(landmark_times,
                                 meanlog = psm_PFS_all$parameters_vector["sep.lnorm.meanlog.int"],
                                 sdlog = psm_PFS_all$parameters_vector["sep.lnorm.sdlog.int"])
surv_sep_lnorm_int


# Example 3: reference arm, Generalized Gamma distribution, independent shape model
surv_indshp_gengamma_ref <- 1 - pgengamma(landmark_times,
                                          mu = psm_PFS_all$parameters_vector["indshp.gengamma.mu.ref"],
                                          sigma = psm_PFS_all$parameters_vector["indshp.gengamma.sigma.ref"],
                                          Q = psm_PFS_all$parameters_vector["indshp.gengamma.Q.ref"])
surv_indshp_gengamma_ref

To simplify these actions a helper function is included in the flexsurvPlus package that will extract these values directly. This will calculate the same values as above.

# repeat the prior example for landmark times

landmarks_df <- summaryPSM(x = psm_PFS_all,
                        types = "survival",
                        t = landmark_times
                        )

landmarks_df %>%
  filter(Model == "Common shape", Dist == "Weibull")

The same functions can be used to generate the data required to plot the survival curves, overlaid on top of the KM plot. The time argument should reflect how long you want to the extrapolate for and the unit of time is the same as the input data (in this example, days).

# Plot the common shape models (Weibull distribution) with the Kaplan-Meier

# vector of times to estimate survival (days)
times <- c(seq(from = 0, to = 1000, by = 0.1))


# Survival probabilities: intervention arm, Weibull distribution, common shape model
surv_comshp_weibull_int <- 1 - pweibull(times,
                                  shape = psm_PFS_all$parameters_vector["comshp.weibull.shape.int"],
                                  scale = psm_PFS_all$parameters_vector["comshp.weibull.scale.int"])

# Survival probabilities: reference arm, Weibull distribution, common shape model
surv_comshp_weibull_ref <- 1 - pweibull(times,
                                  shape = psm_PFS_all$parameters_vector["comshp.weibull.shape.ref"],
                                  scale = psm_PFS_all$parameters_vector["comshp.weibull.scale.ref"])

# Create two data frames that include the survival probablaities and times
surv_comshp_weibull_int_times <- data.frame(Time = times,
                                            Surv = surv_comshp_weibull_int,
                                            Trt = "Intervention")
surv_comshp_weibull_ref_times <- data.frame(Time = times,
                                            Surv = surv_comshp_weibull_ref,
                                            Trt = "Reference")

# Plot Kaplan-Meier
plot(km.est.PFS, 
     col = c(blue, pink), # plot colours
     lty = c(1:2), # line type
     xlab = "Time (Days)", # x-axis label
     ylab = "Progression-free survival", # y-axis label
     xlim = c(0, 1000)) 

# Add legend
legend(x = 500, 
       y = .9, 
       legend = c("Arm A", "Arm B"), 
       lty = c(1:2), 
       col = c(blue, pink))

# Add model estimates
lines(x = surv_comshp_weibull_int_times$Time, y = surv_comshp_weibull_int_times$Surv, col = blue)
lines(x = surv_comshp_weibull_ref_times$Time, y = surv_comshp_weibull_ref_times$Surv, col = pink)

The same helper function can also be used to generate plots.

# repeat the prior example for plot data

plot_esurv_df <- summaryPSM(x = psm_PFS_all,
                            types = "survival",
                            t = times
                            )

# a similar function will estimate the values needed for KM estimates

plot_km_df <- summaryKM(data = PFS_data,
                        time_var="PFS_days",
                        event_var="PFS_event",
                        strata_var = "ARMCD",
                        int_name="A",
                        ref_name = "B",
                        types = "survival"
                        )


# can then combine these to plot

plot_esurv_df %>%
  dplyr::filter(Model == "Common shape",
                          Dist == "weibull") %>%
  dplyr::bind_rows(plot_km_df) %>%
  dplyr::filter(type == "survival", variable == "est") %>%
  dplyr::mutate(Model_Dist = paste(Model, Dist, sep = " - ")) %>%
  ggplot(aes(x = time, y = value, color = StrataName, linetype = Model_Dist)) +
  geom_step(data = function(x){dplyr::filter(x, Model == "Kaplan Meier") }) +
  geom_line(data = function(x){dplyr::filter(x, Model != "Kaplan Meier") })

Adressing uncertainty - Bootstrapping

Bootstrapping has been used to estimate the uncertainty of the parameters from the survival models. Boostrapping is used for two reasons motivated by intent of this package to support further modeling in excel. 1. To simplify and accelerate calculations in excel while maintaining correlations between parameters (as is commonly done for NMA) 2. To maintain correlations across multiple endpoints (see separate vignette for details)

Bootstrapping involves:

Sampling, with replacement, from all patients
Estimating all specified parametric survival models

This procedure is repeated multiple times to obtain a distribution of parameters. For this example, bootstrap estimates of the parameters were calculated using the boot package. An argument for the boot function is statistic which is a function which when applied to data returns a vector containing the statistic(s) of interest. The bootPSM function in the flexsurvPlus package can be used for this purpose.

The inputs for the bootPSM function are identical to the runPSM function, however there is one additional argument:

i - Index used to select a sample within boot

As the parameters are stored in the config object returned by runPSM it is possible to use do.call to simplify these calls assuming that models have already been fit using runPSM.

# illustrative example using original analysis models
# only create 2 replicates for illustration
set.seed(2358)
boot_psm_PFS_all <- do.call(boot, args = c(psm_PFS_all$config, R = 2, statistic = bootPSM))

# is the same as
set.seed(2358)
boot_psm_PFS_demo <- boot(
  R = 2, # number of bootstrap samples
  statistic = bootPSM, # bootstrap function
  data=PFS_data,
  time_var="PFS_days",
  event_var="PFS_event",
  model.type= c("Common shape",
                "Independent shape", 
                "Separate"),
  distr = c('exp',
            'weibull',
            'gompertz',
            'lnorm',
            'llogis',
            'gengamma',
            'gamma',
            'genf'),
  strata_var = "ARMCD",
  int_name="A",
  ref_name = "B"
)


all(boot_psm_PFS_all$t==boot_psm_PFS_demo$t, na.rm = TRUE)

For speed and to examine how this can be used we will repeat this selecting only 4 models.

set.seed(2358)
# To minimize vignette computation time only 100 bootstrap samples are taken. In general more samples should be used.
n.sim <- 100 

psm_PFS_selected <- runPSM(
  data=PFS_data,
  time_var="PFS_days",
  event_var="PFS_event",
  model.type = c("Common shape", "Separate"),
  distr = c('weibull', 'gamma'),
  strata_var = "ARMCD",
  int_name = "B",
  ref_name = "A"
)

PSM_bootstraps_PFS <- do.call(boot, args = c(psm_PFS_selected$config,
                                             statistic = bootPSM, # bootstrap function
                                             R=n.sim # number of bootstrap samples
                                             )
                              )

To use the result of these samples it is helpful to do some post processing to make the resulting samples easier to interpret.

# first extract the bootstrapped parameters into a tibble
PFS_bootsamples <- as_tibble(PSM_bootstraps_PFS$t)
# then add column names so can identify model and parameter more easily
colnames(PFS_bootsamples) <- names(PSM_bootstraps_PFS$t0)

# show the first 3 samples
PFS_bootsamples[1:3,]

Estimating quantities from the sample

As the flexsurv parameterisations are used any quantity of interest can be simply calculated for all models and samples through use of the flexsurv functions such as the extrapolated means via flexsurv::mean_weibull. For this example we assume we have decided that separate models for reference and intervention are most appropriate and that for the reference arm a gamma model is preferred while for the intervention arm a weibull model is best.

# we can now calculate the mean PFS for the selected models for each bootstrap sample
# using weibull for the reference
PFS_ref_mean <- with(PFS_bootsamples, flexsurv::mean_gamma(shape = sep.gamma.shape.ref, rate = sep.gamma.rate.ref))
# using gamma for the intervention
PFS_int_mean <- with(PFS_bootsamples, flexsurv::mean_weibull(shape = sep.weibull.shape.int, scale = sep.weibull.scale.int))

# we could also calculate these values also for the original data for a deterministic estimate 
PFS_means <- summaryPSM(psm_PFS_selected,
                        type = "mean") 

PFS_means %>%
  dplyr::transmute(Model, Dist, Strata,StrataName, value) %>%
  dplyr::filter((Dist == "Gamma" & Model == "Separate - Reference") |
                  (Dist == "Weibull" & Model == "Separate - Intervention") )


# we can then calculate the incremental mean PFS from these
PFS_delta = PFS_int_mean - PFS_ref_mean 


# if we are interested we can then estimate quantiles from this
quantile(PFS_delta, probs = c(0.025,0.975))

# or plot the density of this derived quantity

density <- density(PFS_delta)
plot(density, 
     lwd = 2, 
     main = "Density")

Outputing parameters to excel

The primary use of the bootstrap samples is to be used in probabilistic sensitivity analyses in economic models.

Once all the models have been fit and bootstrap samples estimated they can be output to excel. By selecting the "Main Estimates" the estimates for the original data are returned. To run PSA in excel only a random number between 1 and the number of samples needs to be generated and the associated Bootstrap sample selected.

# combine the estimates from the main analysis with the bootstrap samples
# and add meta data to include details of analysis

# first we can get combine the main estimates for the models with those that were bootstrapped
parameters_PFS <- rbind(PSM_bootstraps_PFS$t0, as_tibble(PSM_bootstraps_PFS$t))

# we can now add names
colnames(parameters_PFS) <- names(PSM_bootstraps_PFS$t0)

# we can then label the samples and add some metadata
metadata_PFS <- tibble(Estimate = c("Main Estimates", paste("Bootstrap Sample",1:n.sim)),
                       Study_name = "Study ABC",
                        Datacut = "Final",
                        Population = "ITT",
                        Endpoint = "PFS",
                        RefArmName = PSM_bootstraps_PFS$call$ref_name,
                        IntArmName = PSM_bootstraps_PFS$call$int_name)

pfs_for_export <- cbind(metadata_PFS, parameters_PFS)

# not run
#write.csv(parameters_PFS, "params_for_model.csv")

References

iain-t-bennett-roche/flexsurvPlus documentation built on Aug. 1, 2022, 10:10 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

iain-t-bennett-roche/flexsurvPlus
Provides Functions to Perform Survival Analysis for Economic models

In iain-t-bennett-roche/flexsurvPlus: Provides Functions to Perform Survival Analysis for Economic models

Introduction

Set up packages and data

Install packages

Read in the data

Exploratory analysis

Fitting the models

Estimate survival from the models and plot the curves

Adressing uncertainty - Bootstrapping

Estimating quantities from the sample

Outputing parameters to excel

References

R Package Documentation

Browse R Packages

We want your feedback!

iain-t-bennett-roche/flexsurvPlus Provides Functions to Perform Survival Analysis for Economic models

In iain-t-bennett-roche/flexsurvPlus: Provides Functions to Perform Survival Analysis for Economic models

Introduction

Set up packages and data

Install packages

Read in the data

Exploratory analysis

Fitting the models

Estimate survival from the models and plot the curves

Adressing uncertainty - Bootstrapping

Estimating quantities from the sample

Outputing parameters to excel

References

R Package Documentation

Browse R Packages

We want your feedback!

iain-t-bennett-roche/flexsurvPlus
Provides Functions to Perform Survival Analysis for Economic models