In dominicmagirr/expectedevents: Calculate the expected number of events

What problem does this solve?

When designing a trial with a time-to-event endpoint, we often want to know how many events are expected at a given calendar time.

We will typically have some information on recruitment.

#### recruitment information
recruitment_1 = list(n = 100, 
                     r_period = 12, 
                     k = 1)

In this case, a single arm trial with 100 patients recruited over 12 months. The parameter $k$ dictates the shape of the recruitment curve:

$$pr(\text{recruited by calendar time t}) = (t / \text{r_period}) ^ k.$$

We also usually have a working assumption for the survival model:

#### model for survival distribution
model_1 = list(change_points = 4, 
               lambdas = c(log(2) / 15, log(2) / 15))

In this package, we only allow piece-wise exponential models with at least 2 pieces. This is not really a restriction because we can easily approximate any parametric survival distribution. In the above example, I'm assuming an exponential model with median 15 months. To do this, I specify an arbitrary change-point, but the event rate either side of the change-point remains the same.

How many events at a data cut-off (dco) at 30 months?

#library(expectedevents)
devtools::load_all("..")
expected_events_single_arm(dco = 30,
                           recruitment = recruitment_1,
                           model = model_1)

Two-arm trial

We can do the same thing for a two-arm trial

recruitment_2 = list(n_0 = 100, 
                     n_1 = 100, 
                     r_period = 12, 
                     k = 1)


model_2 = list(change_points = 4, 
               lambdas_0 = c(log(2) / 15, log(2) / 15), 
               lambdas_1 = c(log(2) / 15, 0.7 * log(2) / 15))

In this case the hazard ratio is 1 for the first 4 months. Thereafter, the hazard ratio is 0.7.

expected_events_two_arm(dco = 30,
                        recruitment = recruitment_2,
                        model = model_2)

Dropouts

We can add a parameter to deal with patients who dropout (withdraw from the study and are lost to follow-up). We assume that dropout occurs independently to the event of interest, and that time-to-dropout follows an exponential distribution with rate mu.

For example, we can modify the above example by adding dropout at rate mu = 0.01:

expected_events_two_arm(dco = 30,
                        recruitment = recruitment_2,
                        model = model_2,
                        mu = 0.01)

We observe fewer events due to the competing risk of dropout. If we acutally want to see the expected number of dropouts, we need to add the argument dropouts = TRUE:

expected_events_two_arm(dco = 30,
                        recruitment = recruitment_2,
                        model = model_2,
                        mu = 0.01,
                        dropouts = TRUE)

Proportion of patients who have withdrawn

The dropout parameter mu does not have a very intuitive interpretation. Often, we would prefer to assume something along the lines of: "we expect 10% of patients (out of the final total sample size) to have withdrawn at a data cut-off of 30 months".

prop_to_mu_single_arm and prop_to_mu_two_arm are functions that can convert this statement into a value for mu.

prop_to_mu_two_arm(prop_withdrawn = 0.1,
                   dco = 30,
                   recruitment = recruitment_2,
                   model = model_2)

Multiple data-cut offs

The expected_events_single_arm and expected_events_two_arm functions only allow a single dco as first argument. To assess several data cut-offs I suggest using, e.g., purrr::map:

dcos = seq(0.001, 30, length.out = 10)

events_dco = purrr::map_dbl(dcos,
                            expected_events_single_arm,
                            recruitment = recruitment_1,
                            model = model_1,
                            mu = 0.01)

dropouts_dco = purrr::map_dbl(dcos,
                              expected_events_single_arm,
                              recruitment = recruitment_1,
                              model = model_1,
                              mu = 0.01,
                              dropouts = TRUE)

round(events_dco, 1)
round(dropouts_dco, 1)

Average hazard ratio

For 2-arm trials, the expected_events_two_arm function also calculates the 'average hazard ratio', which is defined as

$$\exp\left\lbrace \sum_i p_i \log \left( \frac{\lambda_{1,i}}{\lambda_{0,i}} \right) \right\rbrace$$

where $\lambda_{0,i}$, $\lambda_{1,i}$ are the event rates on the two arms during patient-time interval [$t_i, t_{i-1}$), and $p_i$ is the proportion of total events that occur during patient-time interval [$t_i, t_{i-1}$). Here, $t_1,\ldots,t_{k-1}$ correspond to change_points. Also, $t_0 = 0$ and $t_{k} = \infty$.

To see this additional information we need to add the argument total_only = FALSE:

expected_events_two_arm(dco = 30,
                        recruitment = recruitment_2,
                        model = model_2,
                        mu = 0.01,
                        total_only = FALSE)

Approximating a survival curve from a different model

Let's take the example of a cure rate model. Consider an adjuvant study where 70% of patients are already "cured" coming in. Suppose for the "non-cured" patients, the hazard rate is $\log(2)/12$ on the control arm and $\log(2) / 6$ on the experimental arm. Overall, the hazard rates are non-proportional...

# survival
s_c = function(t) 0.7 + 0.3 * exp(-log(2) / 12 * t)
s_e = function(t) 0.7 + 0.3 * exp(-log(2) / 6 * t)
# density
f_c = function(t) log(2) / 12 * 0.3 * exp(-log(2) / 12 * t)
f_e = function(t) log(2) / 6 * 0.3 * exp(-log(2) / 6 * t)
# hazard
h_c = function(t) f_c(t) / s_c(t)
h_e = function(t) f_e(t) / s_e(t)
# hazard ratio
hr = function(t) h_e(t) / h_c(t)

ts = seq(0, 48, by = 1)

plot(ts, h_c(ts),
     type = 'l',
     ylim = c(0,0.05),
     ylab = "hazard",
     xlab = "time")

points(ts,
       h_e(ts),
       type = 'l',
       col = 2)

legend("topright",
       c("control", "experimental"),
       lty = c(1,1),
       col = 1:2)

### approximate model

model_approx = list(change_points = seq(2, 46, by = 2), 
                    lambdas_0 = h_c(seq(1, 47, by = 2)), 
                    lambdas_1 = h_e(seq(1, 47, by = 2)))

points(c(0, model_approx$change_points), model_approx$lambdas_0, type = 's')
points(c(0, model_approx$change_points), model_approx$lambdas_1, type = 's', col = 2)

We can use this approximate model, as before.

expected_events_two_arm(dco = 30,
                        recruitment = recruitment_2,
                        model = model_approx,
                        mu = 0.01)

Example how to plot survival curves from approximation

plot(seq(0, 48, length.out = 100),
     purrr::map_dbl(seq(0, 48, length.out = 100), 
                    surv_pieces,
                    change_points = model_approx$change_points,
                    lambdas = model_approx$lambdas_0),
     type = 'l',
     ylim = c(0,1),
     ylab = "survival",
     xlab = "time")

Calculating power for a GSD under NPH

Suppose you plan to trigger an interim analysis after 60 events, and the final analysis after 120 events.

library(gsDesign)
library(mvtnorm)

event_targets = c(60, 120)

The first step is to predict at what calendar times we expect to reach these milestones.

## get timing
dcos = purrr::map_dbl(event_targets, expected_dco_two_arm, 
                      recruitment = recruitment_2,
                      model = model_2,
                      mu = 0,
                      dropouts = FALSE)
dcos

Next we calculate the average hazard ratio at these milestones.

## get average hazard ratios
full_info = purrr::map(dcos, expected_events_two_arm,
                       recruitment = recruitment_2,
                       model = model_2,
                       mu = 0,
                       dropouts = FALSE,
                       total_only = FALSE)

av_hrs = purrr::map_dbl(full_info, function(x) x$av_hr)

av_hrs

In this example suppose we are using an O'Brien and Fleming upper boundary, with a non-binding futility boundary at Z = 0.

We can use the function 'get_power' to calculate the probability of crossing each boundary point, and therefore the cumulative power.

## get bound
d = gsDesign(k = 2, test.type = 1, sfu = sfLDOF)


p = get_power(events = event_targets,
              av_hrs = av_hrs,
              upper_bound = c(d$upper$bound),
              lower_bound = c(0, c(d$upper$bound)[2]),
              recruitment = recruitment_2)

p