knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) run <- requireNamespace("dplyr", quietly = TRUE) && requireNamespace("gt", quietly = TRUE) knitr::opts_chunk$set(eval = run)
This vignette demonstrates the lower-level routines in the simtrial package specifically related to trial generation and statistical testing.
The routines are as follows:
randomize_by_fixed_block()
- fixed block randomizationrpwexp_enroll()
- random inter-arrival times with piecewise constant enrollment ratesrpwexp()
- piecewise exponential failure rate generationcut_data_by_date()
- cut data for analysis at a specified calendar timecut_data_by_event()
- cut data for analysis at a specified event count, including ties on the cutoff dateget_cut_date_by_event()
- find date at which an event count is reachedcounting_process()
- pre-process survival data into a counting process formatApplication of the above is demonstrated using higher-level routines sim_pw_surv()
and sim_fixed_n()
to generate simulations and weighted logrank analysis for a stratified design.
The intent has been to write these routines in the spirit of the tidyverse approach (alternately referred to as data wrangling, tidy data, R for Data Science, or split-apply-combine). The other objectives are to have an easily documentable and validated package that is easy to use and efficient as a broadly-useful tool for simulation of time-to-event clinical trials.
The package could be extended in many ways in the future, including:
library(simtrial) library(gt) library(dplyr)
Fixed block randomization with an arbitrary block contents is performed as demonstrated below. In this case we have a block size of 5 with one string repeated twice in each block and three other strings appearing once each.
randomize_by_fixed_block(n = 10, block = c("A", "Dog", "Cat", "Cat"))
More normally, with a default of blocks of size four:
randomize_by_fixed_block(n = 20)
Piecewise constant enrollment can be randomly generated as follows. Note that duration is specifies interval durations for constant rates; the final rate is extended as long as needed to generate the specified number of observations.
rpwexp_enroll( n = 20, enroll_rate = data.frame( duration = c(1, 2), rate = c(2, 5) ) )
Time-to-event and time-to-dropout random number generation for observations is generated with piecewise exponential failure times. For a large number of observations, a log-plot of the time-to-failure
x <- rpwexp( 10000, fail_rate = data.frame( rate = c(1, 3, 10), duration = c(.5, .5, 1) ) ) plot( sort(x), (10000:1) / 10001, log = "y", main = "PW Exponential simulated survival curve", xlab = "Time", ylab = "P{Survival}" )
Ideally, this might be done with a routine where generation of randomization, and time-to-event data could be done in a modular fashion plugged into a general trial generation routine.
For now, stratified randomization, piecewise constant enrollment, fixed block randomization and piecewise exponential failure rates support a flexible set of trial generation options for time-to-event endpoint trials.
At present, follow this format very carefully as little checking of input has been developed to-date.
The methods used here have all be demonstrated above, but here they are combined in a single routine to generate a trial. Note that in the generated output dataset, cte
is the calendar time of an event or dropout, whichever comes first, and fail
is an indicator that cte
represents an event time.
First we set up input variables to make the later call to sim_pw_surv()
more straightforward to read.
stratum <- data.frame(stratum = c("Negative", "Positive"), p = c(.5, .5)) block <- c(rep("control", 2), rep("experimental", 2)) enroll_rate <- data.frame(rate = c(3, 6, 9), duration = c(3, 2, 1)) fail_rate <- data.frame( stratum = c(rep("Negative", 4), rep("Positive", 4)), period = rep(1:2, 4), treatment = rep(c(rep("control", 2), rep("experimental", 2)), 2), duration = rep(c(3, 1), 4), rate = log(2) / c(4, 9, 4.5, 10, 4, 9, 8, 18) ) dropout_rate <- data.frame( stratum = c(rep("Negative", 4), rep("Positive", 4)), period = rep(1:2, 4), treatment = rep(c(rep("control", 2), rep("experimental", 2)), 2), duration = rep(c(3, 1), 4), rate = rep(c(.001, .001), 4) )
x <- sim_pw_surv( n = 400, stratum = stratum, block = block, enroll_rate = enroll_rate, fail_rate = fail_rate, dropout_rate = dropout_rate ) head(x) |> gt() |> fmt_number(columns = c("enroll_time", "fail_time", "dropout_time", "cte"), decimals = 2)
There two ways to cut off data in the generated dataset x
from above.
The first uses a calendar cutoff date. The output only includes the time from randomization to event or dropout (tte
), and indicator that this represents and event (event
), the stratum in which the observation was generated (stratum
) and the treatment group assigned (treatment
).
Observations enrolled after the input cut_date
are deleted and events and censoring from x
that are after the cut_date
are censored at the specified cut_date
.
y <- cut_data_by_date(x, cut_date = 5) head(y) |> gt() |> fmt_number(columns = "tte", decimals = 2)
For instance, if we wish to cut the entire dataset when 50 events are observed in the Positive stratum we can use the get_cut_date_by_event
function as follows:
cut50Positive <- get_cut_date_by_event(filter(x, stratum == "Positive"), 50) y50Positive <- cut_data_by_date(x, cut50Positive) with(y50Positive, table(stratum, event))
Perhaps the most common way to cut data is with an event count for the overall population, which is done using the cut_data_by_event
function. Note that if there are tied events at the date the cte
the count is reached, all are included. Also, if the count is never reached, all event times are included in the cut - with no indication of an error.
y150 <- cut_data_by_event(x, 150) table(y150$event, y150$treatment)
Once we have cut data for analysis, we can create a dataset that is very simple to use for weighted logrank tests. A slightly more complex version could be developed in the future to enable Kaplan-Meier-based tests. We take the dataset y150
from above and process it into this format.
The counting process format is further discussed in the next section where we compute a weighted logrank test.
ten150 <- counting_process(y150, arm = "experimental") head(ten150) |> gt() |> fmt_number(columns = c("tte", "o_minus_e", "var_o_minus_e"), decimals = 2)
Now stratified logrank and stratified weighted logrank tests are easily generated based on the counting process format. Each record in the counting process dataset represents a tte
at which one or more events occurs; the results are stratum-specific. Included in the observation is the number of such events overall (events
) and in the experimental treatment group (txevents
), the number at risk overall (atrisk
) and in the experimental treatment group (txatrisk
) just before tte
, the combined treatment group Kaplan-Meier survival estimate (left-continuous) at tte
, the observed events in experimental group minus the expected at tte
based on an assumption that all at risk observations are equally likely to have an event at any time, and the variance for this quantity (Var
).
To generate a stratified logrank test and a corresponding one-sided p-value, we simply do the following:
z <- with(ten150, sum(o_minus_e) / sqrt(sum(var_o_minus_e))) c(z, pnorm(z))
A Fleming-Harrington $\rho=1$, $\gamma=2$ is nearly as simple. We again compute a z-statistic and its corresponding one-sided p-value.
xx <- mutate(ten150, w = s * (1 - s)^2) z <- with(xx, sum(o_minus_e * w) / sum(sqrt(var_o_minus_e * w^2))) c(z, pnorm(z))
For Fleming-Harrington tests, a routine has been built to do these tests for you:
fh00 <- y150 |> wlr(weight = fh(rho = 0, gamma = 0)) fh01 <- y150 |> wlr(weight = fh(rho = 0, gamma = 1)) fh10 <- y150 |> wlr(weight = fh(rho = 1, gamma = 0)) fh11 <- y150 |> wlr(weight = fh(rho = 1, gamma = 1)) temp_tbl <- fh00 |> unlist() |> as.data.frame() |> cbind(fh01 |> unlist() |> as.data.frame()) |> cbind(fh10 |> unlist() |> as.data.frame()) |> cbind(fh11 |> unlist() |> as.data.frame()) colnames(temp_tbl) <- c("Test 1", "Test 2", "Test 3", "Test 4") temp_tbl
If we wanted to take the minimum of these for a MaxCombo test, we would first use fh_weight()
to compute a correlation matrix for the above z-statistics as follows.
Note that the ordering of rho_gamma
and g
in the argument list is opposite of the above.
The correlation matrix for the z
-values is now in V1
-V4
.
We can compute a p-value for the MaxCombo as follows using mvtnorm::pmvnorm()
. Note the arguments for GenzBretz()
which are more stringent than the defaults; we have also used these more stringent parameters in the example in the help file.
y150 |> maxcombo(rho = c(0, 0, 1, 1), gamma = c(0, 1, 0, 1))
The sim_fixed_n()
routine combines much of the above to go straight to generating tests for individual trials so that cutting data and analyzing do not need to be done separately.
Here the argument structure is meant to be simpler than for sim_pw_surv()
.
stratum <- data.frame(stratum = "All", p = 1) enroll_rate <- data.frame( duration = c(2, 2, 10), rate = c(3, 6, 9) ) fail_rate <- data.frame( stratum = "All", duration = c(3, 100), fail_rate = log(2) / c(9, 18), hr = c(0.9, 0.6), dropout_rate = rep(0.001, 2) ) block <- rep(c("experimental", "control"), 2) rho_gamma <- data.frame(rho = 0, gamma = 0)
Now we simulate a trial 2 times and cut data for analysis based on timing_type = 1:5
which translates to:
sim_fixed_n( n_sim = 2, # Number of simulations sample_size = 500, # Trial sample size target_event = 350, # Targeted events at analysis stratum = stratum, # Study stratum enroll_rate = enroll_rate, # Enrollment rates fail_rate = fail_rate, # Failure rates total_duration = 30, # Planned trial duration block = block, # Block for treatment timing_type = 1:5, # Use all possible data cutoff methods rho_gamma = rho_gamma # FH test(s) to use; in this case, logrank ) |> gt() |> fmt_number(columns = c("ln_hr", "z", "duration"))
If you look carefully, you should be asking why the cutoff with the planned number of events is so different than the other data cutoff methods.
To explain, we note that generally you will want sample_size
above to match the enrollment specified in enroll_rate
:
enroll_rate |> summarize( "Targeted enrollment based on input enrollment rates" = sum(duration * rate) )
The targeted enrollment takes, on average, 30 months longer than the sum of the enrollment durations in enroll_rate
(14 months) at the input enrollment rates. To achieve the input sample_size
of 500, the final enrollment rate is assumed to be steady state and extends in each simulation until the targeted enrollment is achieved. The planned duration of the trial is taken as 30 months as specified in total_duration
. The targeted minimum follow-up is
total_duration <- 30 # From above total_duration - sum(enroll_rate$duration)
It is thus, implicit that the last subject was enrolled 16 months prior to the duration given for the cutoff with "Minimum follow-up" cutoff in the simulations above.
The planned duration cutoff is given in the total_duration
argument which results in a much earlier cutoff.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.