In dcnorris/precautionary: Safety Diagnostics for Dose-Escalation Trial Designs

old <- options(rmarkdown.html_vignette.check_title = FALSE) # suppress un-needed warning
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.height = 4, fig.width = 6)
knitr::opts_knit$set(eval.after = "fig.cap")

library(precautionary)
library(knitr)
library(kableExtra)

options(digits = 4)

# Echo source focused on package functionality by redacting calls to kable()
# see https://bookdown.org/yihui/rmarkdown-cookbook/hook-hide.html:
local({
  hook_source <- knitr::knit_hooks$get('source')
  knitr::knit_hooks$set(source = function(x, options) {
    x <- gsub(" -> etc.", "", x)
    x <- x[!grepl("^etc. %>%", x)] # strip lines starting "etc. %>% ..."
    x <- x[!grepl("\\(etc.,", x)]  # strip lines with fun(etc., ...)
    hook_source(x, options)
  })
})

"In most trials, and in particular in cancer chemotherapy trials, it is rare that we have no idea at all about the dose-response relationship. Such information is implicitly used in the choice of dose levels available for use in the trial, and part of our approach here is to make an attempt to quantify such information."^[@oquigley_continual_1990]

"It is important to see to what extent the largely arbitrary specification of these features influences the operational nature of the method, as well as the conclusions reached. Experimenters prefer methods which are relatively insensitive to prior specification as well as the form of the working model."^[@chevret_continual_1993]

"Most clinicians do not feel sufficiently confident in their initial toxicity probability estimates to start above the lowest dose (often chosen to be 10% of rodents' LD$_{10}$)."^[@goodman_practical_1995]

"The main challenge when using the CRM is model calibration."^[@lee_calibration_2011]

Background

Since its inception 3 decades ago [@oquigley_continual_1990], the CRM has retreated from the sincere Bayesianism of its original aim to manifest clinically relevant prior information. The early retreat was quite swift, in fact. Section 2.4 of @oquigley_continual_1990, titled 'Establishing Dose Levels', suggested a substantial collaboration between modelers and clinical investigators. Yet within a few years, @chevret_continual_1993 already accepted "that $k$ distinct doses are chosen for experimentation, defined by the investigator through an implicit idea of the dose-toxicity relationship, however imprecise," and demoted the CRM's prior dose-toxicity curve to the status of a (multidimensional) tuning parameter. The retreat continued apace with @goodman_practical_1995 giving voice to the deep skepticism with which clinical investigators were already viewing the CRM. Indeed, a key "practical improvement" for which that paper is still cited, was its rejection of the prior curve's pretense to inform the trial starting dose.

By twenty years later, biostatisticians had completed their retreat, with the prior toxicity curve being variously rebranded to avoid using the word 'prior',^[For example, @lee_model_2009 exploit a technicality of the CRM to identify this prior curve with "the selection of 'dose levels'", while @wages_dose-finding_2011 apparently introduce the now-popular 'skeleton'.] and computer-based model tuning activities having become "the main challenge when using the CRM" [@lee_calibration_2011]. Biostatisticians remain to this day intent on problems arising out of CRM calibration practice, as demonstrated in the recent contribution by Braun [-@braun_simulationfree_2020], who develops a mean-field approximation that greatly accelerates CRM performance characterization and calibration. Applied to a previously-completed trial, this approximation technique reduces certain calibration steps from 6.3 hours to 18 seconds (1260$\times$) and from 2 days to 2 minutes ($\approx 1400 \times$).

Here I revisit Braun's application from the perspective of complete path enumeration (CPE), which package precautionary now implements for the CRM.

Performance characterization for a given CRM model

Prerequisite to any CRM prior calibration procedure aiming to optimize selected performance characteristics, is the ability to characterize performance for any given CRM model with fixed prior parameters.^[This is just the observation that we must be able to calculate an objective function in order to optimize it.] So we begin by repeating Braun's characterization of his calibrated model.^[This starts at the bottom of page 8 in @braun_simulationfree_2020.] The calibrated model is as follows:

skel_0 <- c(0.03, 0.11, 0.25, 0.42, 0.58, 0.71)
calmod <- Crm$new(skeleton = skel_0,
                  scale = 0.85, # aka 'sigma'
                  target = 0.25)

Braun's initial characterization is of trials of fixed enrollment $N=30$, with a cohort size of 2.^[The cohort size is not mentioned in the text, but see the example at the bottom of Appendix 1, where Braun sets cohort_size <- 2.] Trials of this size readily yield to CPE on desktop hardware:

d1_maxn <- 5
cum_maxn <- 10
system.time({
  calmod$
    no_skip_esc(TRUE)$    # compare Braun's 'restrict = T'
    no_skip_deesc(FALSE)$
      stop_func(function(x) {
        enrolled <- tabulate(x$level, nbins = length(x$prior))
        x$stop <- enrolled[1] >= d1_maxn || max(enrolled) >= cum_maxn
        x
      })$trace_paths(root_dose = 1,
                     cohort_sizes = rep(2, 15) # ==> N=30
                     )

  T <- calmod$path_array()
})

The 3-dimensional array T[j,c,d] has the same structure and function as in @norris_what_2020:

dim(T)

Along each of the r dim(T)[1] paths enumerated, toxicities have been tabulated separately for as many as r dim(T)[2] distinct cohorts enrolled at each of r dim(T)[3] doses.

Again as in @norris_what_2020, we may ...

Compute $\mathbf{b} = \sum_c {n \choose T_{c,d}}$

b <- apply(log(choose(2, T)), MARGIN = 1, FUN = sum, na.rm = TRUE)
length(b)

Compute $\mathrm{U}$ and $\mathrm{Y}$

Y <- apply(T, MARGIN = c(1,3), FUN = sum, na.rm = TRUE)
Z <- apply(2-T, MARGIN = c(1,3), FUN = sum, na.rm = TRUE)
U <- cbind(Y, Z)
dim(U)

Express $\mathbf{\pi}$ as a function of dose-wise DLT probabilities

log_pi <- function(prob.DLT) {
  log_p <- log(prob.DLT)
  log_q <- log(1 - prob.DLT)
  b + U %*% c(log_p, log_q)
}

Note in particular that $\mathbf{\pi}(\mathbf{p})$ is a function of the DLT probabilities at the design's prespecified doses, and so is defined on the simplex $0 < p_1 < p_2 < ... < p_D < 1$. To demonstrate that the constraint $\sum_j \pi^j \equiv 1$ holds over this domain, let us check the identity at a randomly chosen point:^[But note this constraint holds trivially, simply by the construction of $\mathrm{U}$ from complementary left and right halves, $\mathrm{Y}$ and $\mathrm{Z}$.]

p <- sort(runif(6)) # randomly select a DLT probability vector
sum(exp(log_pi(p))) # check path probabilities sum to 1

Together with this ability to compute the path probabilities $\mathbf{\pi}(\mathbf{p})$ under any scenario $\mathbf{p}$, the complete frequentist characterization by $\mathrm{T}$ of all path outcomes enables us immediately to obtain any of the common performance characteristics of interest:^[I suspend judgment as to whether we should do this, and indeed whether these performance characteristics make any sense at all.]

Probability of Correct Selection (PCS)

Whereas the array $T_{c,d}^j$ contains clinical events (enrollments, toxicities) along each path, the final dose recommendations are retained as the rightmost non-NA value in each row of calmod's path matrix.^[This matrix retains the columnar layout---although not the row degeneracy---of the dose transition pathways (DTP) tables of package dtpcrm] The path_rx() method of the Crm class returns this:

xtabs(~calmod$path_rx())

These dose recommendations of course must be weighted by the path probabilities of some chosen dose-toxicity scenario. For sake of an easy demonstration, let's take our scenario from the skeleton itself:

pi_skel <- calmod$path_probs(calmod$skeleton())
xtabs(pi_skel ~ calmod$path_rx())

According to our skeleton, dose 3 is 'the' MTD; so PCS is r round(xtabs(calmod$path_probs(calmod$skeleton()) ~ calmod$path_rx())[3],2) in this scenario.

Fraction Assigned to MTD

The array $T_{c,d}^j$ lets us count patients assigned to 'the' MTD:

path_cohorts <- apply(!is.na(T), MARGIN=1, FUN=sum, na.rm=TRUE)
path_MTDcohs <- apply(!is.na(T[,,3]), MARGIN=1, FUN=sum, na.rm=TRUE)
sum(pi_skel * path_MTDcohs / path_cohorts)

Fraction 'Overdosed'

A similar calculation enables us to calculate the fraction assigned to doses exceeding 'the' MTD:

path_ODcohs <- apply(!is.na(T[,,4:6]), MARGIN=1, FUN=sum, na.rm=TRUE)
sum(pi_skel * path_ODcohs / path_cohorts)

Furthermore, we can ask what fraction of thus-'overdosed' trial participants may be expected to exceed their own MTD$_i$'s:^[That is, to experience a DLT. Note that this latter fraction constitutes a patient-centered notion of 'overdose', in contrast to the traditionally dose-centered notion.]

path_DLTs4_6 <- apply(T[,,4:6], MARGIN=1, FUN=sum, na.rm=TRUE)
sum(pi_skel * path_DLTs4_6) / sum(pi_skel * 2*path_ODcohs)

Extensions to scenario ensembles

As the examples above illustrate, once a fixed trial design has been completely path-enumerated, its performance characteristics are available through fast array operations. The generalization of single-scenario performance evaluations to ensemble-average evaluations should prove straightforward.

Extending to larger trials

saved <- options(mc.cores = 6)
kraken <- data.frame(C = 11:25, J = NA_integer_, elapsed = NA_real_)
for (i in 1:nrow(kraken)) {
  C <- kraken$C[i]
  calmod$skeleton(calmod$skeleton()) # reset skeleton to clear cache for honest timing
  time <- system.time(
    calmod$trace_paths(1, rep(2,C), impl='rusti', unroll=8))
  kraken$elapsed[i] <- time['elapsed']
  kraken$J[i] <- dim(calmod$path_matrix())[1]
  print(kraken)
}
options(saved)

kraken <- eval(parse(text = "structure(list(C = c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25), J = c(5001, 7839, 11571, 16125, 22041, 28485, 35589, 43371, 48255, 50379, 52011, 53085, 53205, 53205, 53205), elapsed = c(2.061, 2.938, 3.961, 5.05, 7.172, 8.937, 11.022, 13.407, 14.696, 14.912, 15.298, 15.421, 15.477, 15.515, 15.807)), row.names = c(NA, -15L), class = \"data.frame\")"))

kraken %>%
  kable(
    digits=2,
    caption="Size of complete path enumeration (CPE) for the model, enrolling from 11 to 25 cohorts of 2. Note that all paths of the trial terminate by the 23rd cohort. Timings (in seconds) obtained running CPE on 6 threads of a 2.6 GHz Core i7 processor."
  ) %>% kable_styling(full_width=FALSE, position="left")

kraken$N <- 2L * kraken$C
oldpar <- par(mar = c(5,6,1,1))
plot(J ~ N, log="y", data=subset(kraken, C<=23), las=1, xlab="", ylab="")
mtext("J", side=2, line=4.0, las=1)
mtext("Enrollment", side=1, line=2.5, las=1)
par(oldpar)

Reassuringly, the (perhaps typical) stopping criteria in this trial impose a finite upper bound on CPE size stringent enough to keep the space and time complexity of CPE feasible. Even without such planned stopping rules, as a practical matter, once a dose-finding trial has enrolled more than 30 participants, it ought to have yielded enough information to warrant revisiting the original design. This limits the reasonable 'time between overhauls'^[See https://en.wikipedia.org/wiki/Time_between_overhauls.] for a fixed CRM trial design. Thus, CPE-based performance characterization may prove entirely feasible for most practical applications of the standard CRM model.^[This perhaps does not apply to applications such as TITE CRM, where participants are modeled non-exchangeably. CPE for TITE CRM remains an open problem.]

Calibration of CRM priors

The foregoing analysis shows that CPE-based performance characterization for a given CRM model of typical size may require on the order of 15 seconds on quite modest hardware. To calibrate a CRM model via CPE, however, may incur this cost repeatedly---perhaps hundreds of times---during what will generally amount to the derivative-free optimization of a discontinuous objective function over the prior-parameter space. For the example at hand, the Nelder-Mead method finds a PCS-optimal skeleton in just 25 minutes:

scenario <- skel_0 # for a demo scenario, we use our original skeleton
pcs <- function(s, unroll=6) { # unroll=6 yields quickest PCS for C=15 case
  if (any(diff(s) <= 0) || s[1] <= 0 || s[length(s)] >= 1)
    return(NA) # out-of-bounds
  calmod$skeleton(s)
  calmod$trace_paths(1, rep(2,15), impl='rusti', unroll=unroll)
  pi_0 <- calmod$path_probs(scenario)
  correct <- calmod$path_rx() == 3
  sum(pi_0[correct])
}

optim(calmod$skeleton(), pcs, method="Nelder-Mead",
      control = list(fnscale = -1, # NEGative ==> MAXimize PCS
                     reltol = 0.001, trace = 2))

Exiting from Nelder Mead minimizer
    135 function evaluations used
$par
[1] 0.03040 0.08371 0.23867 0.43310 0.59292 0.72230

$value
[1] 0.5931

$counts
function gradient 
     135       NA 

$convergence
[1] 0

$message
NULL

Miscellaneous helps to optimization

Cheung and Chappell [-@cheung_simple_2002] develop an asymptotic assessment capable of rejecting certain skeletons as intrinsically unreasonable because they confer inadequate sensitivity on the CRM. This criterion applies irrespective of dose-toxicity scenario, and so might usefully restrict our skeleton search during calibration by Nelder-Mead or similar direct-search method. The dimensionality of this search might also be addressed directly, by parametrizing the skeleton in 2 or 3 suitable basis dimensions. But such exquisite efforts risk taking this type of calibration exercise too seriously.

On calibrating 'priors'

In everyday usage, calibration suggests an exacting procedure, in which typically a measuring instrument is adjusted to accord with some objective standard or criterion. In statistics, however, the term may not carry quite this connotation. Certainly, @grieve_idle_2016 discusses the calibration of Bayesian trial designs generally in much the same sense as our 'CRM calibration' here, with the criterion being provided by whatever frequentist properties are deemed desirable in each given application.

Whatever we call this chiropractic of the CRM skeleton, the very act of treating the CRM's prior dose-toxicity probabilities as free parameters undermines the presumption that model-basedness per se necessarily confers special qualities upon dose-finding methods. But now that even the CRM---the very type specimen of model-based dose finding---yields to complete path enumeration, 'model-based' has become a distinction without a difference.

options(old) # restore user's original options before finishing, per CRAN

# Create a bib file for packages cited in this paper
knitr::write_bib(c('dtpcrm'), file = 'packages.bib')

References

dcnorris/precautionary documentation built on Feb. 5, 2022, 12:13 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

dcnorris/precautionary
Safety Diagnostics for Dose-Escalation Trial Designs

In dcnorris/precautionary: Safety Diagnostics for Dose-Escalation Trial Designs

Background

Performance characterization for a given CRM model

Compute $\mathbf{b} = \sum_c {n \choose T_{c,d}}$

Compute $\mathrm{U}$ and $\mathrm{Y}$

Express $\mathbf{\pi}$ as a function of dose-wise DLT probabilities

Probability of Correct Selection (PCS)

Fraction Assigned to MTD

Fraction 'Overdosed'

Extensions to scenario ensembles

Extending to larger trials

Calibration of CRM priors

Miscellaneous helps to optimization

On calibrating 'priors'

References

R Package Documentation

Browse R Packages

We want your feedback!

dcnorris/precautionary Safety Diagnostics for Dose-Escalation Trial Designs

In dcnorris/precautionary: Safety Diagnostics for Dose-Escalation Trial Designs

Background

Performance characterization for a given CRM model

Compute $\mathbf{b} = \sum_c {n \choose T_{c,d}}$

Compute $\mathrm{U}$ and $\mathrm{Y}$

Express $\mathbf{\pi}$ as a function of dose-wise DLT probabilities

Probability of Correct Selection (PCS)

Fraction Assigned to MTD

Fraction 'Overdosed'

Extensions to scenario ensembles

Extending to larger trials

Calibration of CRM priors

Miscellaneous helps to optimization

On calibrating 'priors'

References

R Package Documentation

Browse R Packages

We want your feedback!

dcnorris/precautionary
Safety Diagnostics for Dose-Escalation Trial Designs