Analysing data with psborrow
In psborrow: Bayesian Dynamic Borrowing with Propensity Score

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This vignette provides additional details of the analysis model that is fitted as well as a demonstration of how to use the package to fit said models to an existing dataset. For examples of how to fit the analysis model to the simulated datasets generated by the package itself, please see the user-guide vignette.

The "Analysis Model" section of this vignette is a modified excerpt from Lu, Y. et al. (2021).

Analysis Model

The package uses Markov chain Monte Carlo (MCMC) through JAGS to obtain the posterior distribution for the hazard ratio between treatment and control arms. The package relies on the Bayesian commensurate prior approach (Lewis, et al., 2019). The probability density function for the survival time of patient $i$ following a Weibull model is as below:

$$ t_i \sim Weibull( \nu, \lambda_i) $$

for $i = 1, 2, \dots,n$, where:

$\nu$ is the shape parameter
$\lambda$ is the rate,
$n$ is the total sample size of the analysis population.

The probability density function is:

$$ f(t_i, x_i) = \nu\lambda_i t^{\nu-1} e^{-\lambda_it^\nu} $$

with the rate taking the format:

$$ \lambda_i = exp(\alpha_i + \beta_{trt} \times I_{trt} + \beta^T \times X_{i}) $$

Where:

$\alpha_i = \alpha_{cc}$ if patient $i$ is in the concurrent trial, and $\alpha_i = \alpha_{ec}$ if patient $i$ is in the external trial. Because the impact of therapy is captured by $I_{trt}$ below, $\alpha_{cc}$ and $\alpha_{ec}$ effectively correspond to the log of baseline hazard rate for concurrent control arm and external control arm, respectively.
$\beta_n$ is an n-length vector of regression coefficients
$X_{ni}$ is an n-length covariate vector for patient $i$, including all additional parameters to adjust for (e.g. propensity score).
$I_{trt}$ is the treatment indicator (value equals to 0 or 1).

A hierarchical model is defined in which the log hazard rate for the concurrent controls is centered on the external controls, i.e. $\alpha_{cc} | \alpha_{ec} \sim N(\alpha_{ec}, \tau)$. The hyperprior $\tau$, also referred to as the commensurability parameter, determines the extent of borrowing (see Lewis, et al., 2019).

Four borrowing methods have been implemented:

Not borrowing any external control information
Fully incorporating external control arm patients to the concurrent control group
Dynamic borrowing with the commensurability hyperparameter $\tau$ following a Gamma distribution
Dynamic borrowing with the commensurability hyperparameter $\tau$ following a half-Cauchy distribution

For methods (1) and (2), a conventional Bayesian model without dynamic borrowing is deployed with all relevant parameters assigned a non-informative prior. For method (1) no external control arm information is used and as such $\alpha_{cc} = \alpha \sim N(0, 0.0001)$ For method (2) the external control arm is treated as part of the concurrent control and as such $\alpha_{cc} = \alpha_{ec} = \alpha \sim N(0, 0.0001)$.

In comparison, methods (3) and (4) have the commensurability hyper-parameter $\tau$, which acts as the precision variable assessing the similarity between the concurrent and external control groups. The prior of $tau$ follows a Gamma distribution (default: shape = 1, rate = 0.001) and a half-Cauchy distribution (default: location = 0, scale = 0.2), respectively, in methods 3 and 4. Users can update the default values for the parameter if they wish to customize the hyper-prior.

After users state the prior choices and the simulated data is ready for use, the Markov chain Monte Carlo (MCMC) algorithm is deployed to obtain samples from the posterior distribution.

Example of use

To demonstrate how to use the package to fit the analysis model we first generate a dataset where the time to event is drawn from the following distribution:

$$ t_i \sim Weibull(0.95, \lambda_i) $$ Where:

$$ \lambda_i = \frac{1}{150}I_{ec} \times \frac{1}{200}I_{cc} \times \frac{1}{400}I_{trt} \times exp(0.2\times x_{age} - 0.1\times I_{female}) $$

$I_{ec}$ is 1 if the patient is in the external trial else 0
$I_{cc}$ is 1 if the patient is in the concurrent trial else 0
$I_{trt}$ is 1 if the patient is in the treatment arm else 0
$I_{female}$ is 1 if the patient is female else 0
$x_{age}$ is the patient's age

Patients are then censored by drawing a censoring time from the exponential distribution with a rate parameter of $\lambda = 1/400$

suppressPackageStartupMessages({
    library(psborrow)
    library(dplyr)
    library(flexsurv)
})

set.seed(401)

n <- 1000

grp_lvl <- c("CC", "TRT", "EC")
sex_lvl <- c("Male", "Female")

log_hr_grp <- c(
    "TRT" = log(1 / 400),
    "CC" = log(1 / 200),
    "EC" = log(1 / 150)
)

log_hr_sex <- c(
    "Male" = 0,
    "Female" = -0.1
)

log_hr_age <- 0.2

Shape <- 0.95

dat <- tibble(
    pt = 1:n,
    grp = factor(
        sample(grp_lvl, size = n, replace = TRUE, prob = c(0.3, 0.3, 0.4)),
        levels = grp_lvl
    ),
    sex = factor(
        sample(sex_lvl, size = n, replace = TRUE, prob = c(0.4, 0.6)),
        levels = sex_lvl
    ),
    age = rnorm(n, mean = 0, sd = 1),
    lambda = exp(
        log_hr_age * age +
        log_hr_sex[as.character(sex)] +
        log_hr_grp[as.character(grp)]
    ),
    time = rweibullPH(n, scale = lambda, shape = Shape),
    centime = rexp(n, 1 / 400)
) %>%
    mutate(event = ifelse(time <= centime, 1, 0)) %>%
    mutate(time = ifelse(time <= centime, time, centime)) %>% 
    select(pt, grp, age, sex, time, event)

dat

To conduct an analysis we need to use the apply_mcmc() function. However to use this function we first need to convert our data into the required format. Full details of the required format can be found in help(apply_mcmc) however an example of this is shown below:

dat2 <- dat %>%
    mutate(trt = ifelse(grp == "TRT", 1, 0)) %>%
    mutate(ext = ifelse(grp == "EC", 1, 0)) %>%
    mutate(cnsr = 1 - event) %>%
    select(pt, trt, ext, time, cnsr, sex, age)

dat2

We can now pass our data to apply_mcmc() to produce our posterior samples. Prior distributions are specified using the set_prior() function, details for which can be found in help(set_prior) as well as the user-guide vignette. Covariates for the analysis model need to be specified by a 1-sided formula provided to the formula_cov argument. Note that the treatment indicator is automatically added to the model and should not be specified in the covariate formula; if you require treatment interactions these should be specified explicitly via trt:cov rather than by trt*cov. The intercept term must always be included in this formula. You are advised to leave the pred parameter as either "all" or "ps" depending on your requirements. If you do wish to restrict the covariates included in the model then this should be done by omitting them from the formula_cov rather than the pred argument in set_prior().

postsamp <- apply_mcmc(
    dt = dat2,
    formula_cov = ~ 1 + age + sex,
    priorObj = set_prior(pred = "all", prior = "gamma", r0 = 0.85, alpha = c(0, 0)),
    n.chains = 1,
    n.adapt = 100,
    n.burn = 100,
    n.iter = 200,
    seed = 47
)

From here we can extract our posterior samples as follows:

head(extract_samples(postsamp))

For reference the returned parameters are:

HR_cc_hc - The hazard ratio between the historical control arm and the concurrent control arm. This can be be thought of as the ratio of the scale parameter between the baseline external control distribution and the baseline trial distribution. This is equivalent to exp(alpha[2] - alpha[1])
HR_trt_cc - The hazard ratio between the treatment arm and the concurrent control arm. This is equivalent to exp(beta_trt)
alpha[1] - The shape parameter for the trial's baseline distribution
alpha[2] - The shape parameter for the historical control's baseline distribution
beta_trt - The log-hazard ratio for the treatment effect. This is equivalent to log(HR_trt_cc)
beta_age - The log-hazard ratio for the covariate age
beta_sexFemale - The log-hazard ratio for the covariate sexFemale
r0 - The scale parameter for the baseline distribution of both the trial and the historical control
tau/sigma - The precision/variance for alpha[1] i.e. controls how much information is borrowed from the historical control arm

If prior = "no_ext" or prior = "full_ext" then alpha[1] and alpha[2] will be replaced with a single parameter alpha representing the shape parameter for the single baseline distribution for the trail.

Additionally general summary statistics of our samples can be generated by the summary() function:

summary(postsamp)

References

Lu, Y., Lin, A., Pang, H., Zhu, J. (2021). PSBORROW: BAYESIAN DYNAMIC BORROWING R TOOL FOR COMPLEX INNOVATIVE TRIAL DESIGNS. ASA Biopharmaceutical Report, 28 (3), 11-19.
Lewis, C. J., Sarkar, S., Zhu, J., & Carlin, B. P. (2019). Borrowing from historical control data in cancer drug development: a cautionary tale and practical guidelines. Statistics in biopharmaceutical research, 11(1), 67-78.