old <- options(rmarkdown.html_vignette.check_title = FALSE) # suppress un-needed warning knitr::opts_chunk$set(echo = TRUE) knitr::opts_chunk$set(fig.height = 4, fig.width = 6) knitr::opts_knit$set(eval.after = "fig.cap") library(precautionary) library(knitr) library(kableExtra) options(digits = 4) # Echo source focused on package functionality by redacting calls to kable() # see https://bookdown.org/yihui/rmarkdown-cookbook/hook-hide.html: local({ hook_source <- knitr::knit_hooks$get('source') knitr::knit_hooks$set(source = function(x, options) { x <- gsub(" -> etc.", "", x) x <- x[!grepl("^etc. %>%", x)] # strip lines starting "etc. %>% ..." x <- x[!grepl("\\(etc.,", x)] # strip lines with fun(etc., ...) hook_source(x, options) }) })
"In most trials, and in particular in cancer chemotherapy trials, it is rare that we have no idea at all about the dose-response relationship. Such information is implicitly used in the choice of dose levels available for use in the trial, and part of our approach here is to make an attempt to quantify such information."^[@oquigley_continual_1990]
"It is important to see to what extent the largely arbitrary specification of these features influences the operational nature of the method, as well as the conclusions reached. Experimenters prefer methods which are relatively insensitive to prior specification as well as the form of the working model."^[@chevret_continual_1993]
"Most clinicians do not feel sufficiently confident in their initial toxicity probability estimates to start above the lowest dose (often chosen to be 10% of rodents' LD$_{10}$)."^[@goodman_practical_1995]
"The main challenge when using the CRM is model calibration."^[@lee_calibration_2011]
Since its inception 3 decades ago [@oquigley_continual_1990], the CRM has retreated from the sincere Bayesianism of its original aim to manifest clinically relevant prior information. The early retreat was quite swift, in fact. Section 2.4 of @oquigley_continual_1990, titled 'Establishing Dose Levels', suggested a substantial collaboration between modelers and clinical investigators. Yet within a few years, @chevret_continual_1993 already accepted "that $k$ distinct doses are chosen for experimentation, defined by the investigator through an implicit idea of the dose-toxicity relationship, however imprecise," and demoted the CRM's prior dose-toxicity curve to the status of a (multidimensional) tuning parameter. The retreat continued apace with @goodman_practical_1995 giving voice to the deep skepticism with which clinical investigators were already viewing the CRM. Indeed, a key "practical improvement" for which that paper is still cited, was its rejection of the prior curve's pretense to inform the trial starting dose.
By twenty years later, biostatisticians had completed their retreat, with the prior toxicity curve being variously rebranded to avoid using the word 'prior',^[For example, @lee_model_2009 exploit a technicality of the CRM to identify this prior curve with "the selection of 'dose levels'", while @wages_dose-finding_2011 apparently introduce the now-popular 'skeleton'.] and computer-based model tuning activities having become "the main challenge when using the CRM" [@lee_calibration_2011]. Biostatisticians remain to this day intent on problems arising out of CRM calibration practice, as demonstrated in the recent contribution by Braun [-@braun_simulationfree_2020], who develops a mean-field approximation that greatly accelerates CRM performance characterization and calibration. Applied to a previously-completed trial, this approximation technique reduces certain calibration steps from 6.3 hours to 18 seconds (1260$\times$) and from 2 days to 2 minutes ($\approx 1400 \times$).
Here I revisit Braun's application from the perspective of complete path enumeration (CPE), which package precautionary
now implements for the CRM.
Prerequisite to any CRM prior calibration procedure aiming to optimize selected performance characteristics, is the ability to characterize performance for any given CRM model with fixed prior parameters.^[This is just the observation that we must be able to calculate an objective function in order to optimize it.] So we begin by repeating Braun's characterization of his calibrated model.^[This starts at the bottom of page 8 in @braun_simulationfree_2020.] The calibrated model is as follows:
skel_0 <- c(0.03, 0.11, 0.25, 0.42, 0.58, 0.71) calmod <- Crm$new(skeleton = skel_0, scale = 0.85, # aka 'sigma' target = 0.25)
Braun's initial characterization is of trials of fixed enrollment $N=30$, with a cohort size of 2.^[The cohort size is not mentioned in the text, but see the example at the bottom of Appendix 1, where Braun sets cohort_size <- 2
.] Trials of this size readily yield to CPE on desktop hardware:
d1_maxn <- 5 cum_maxn <- 10 system.time({ calmod$ no_skip_esc(TRUE)$ # compare Braun's 'restrict = T' no_skip_deesc(FALSE)$ stop_func(function(x) { enrolled <- tabulate(x$level, nbins = length(x$prior)) x$stop <- enrolled[1] >= d1_maxn || max(enrolled) >= cum_maxn x })$trace_paths(root_dose = 1, cohort_sizes = rep(2, 15) # ==> N=30 ) T <- calmod$path_array() })
The 3-dimensional array T[j,c,d]
has the same structure and function as in @norris_what_2020:
dim(T)
Along each of the r dim(T)[1]
paths enumerated, toxicities have been tabulated separately for as many as r dim(T)[2]
distinct cohorts enrolled at each of r dim(T)[3]
doses.
Again as in @norris_what_2020, we may ...
b <- apply(log(choose(2, T)), MARGIN = 1, FUN = sum, na.rm = TRUE) length(b)
Y <- apply(T, MARGIN = c(1,3), FUN = sum, na.rm = TRUE) Z <- apply(2-T, MARGIN = c(1,3), FUN = sum, na.rm = TRUE) U <- cbind(Y, Z) dim(U)
log_pi <- function(prob.DLT) { log_p <- log(prob.DLT) log_q <- log(1 - prob.DLT) b + U %*% c(log_p, log_q) }
Note in particular that $\mathbf{\pi}(\mathbf{p})$ is a function of the DLT probabilities at the design's prespecified doses, and so is defined on the simplex $0 < p_1 < p_2 < ... < p_D < 1$. To demonstrate that the constraint $\sum_j \pi^j \equiv 1$ holds over this domain, let us check the identity at a randomly chosen point:^[But note this constraint holds trivially, simply by the construction of $\mathrm{U}$ from complementary left and right halves, $\mathrm{Y}$ and $\mathrm{Z}$.]
p <- sort(runif(6)) # randomly select a DLT probability vector sum(exp(log_pi(p))) # check path probabilities sum to 1
Together with this ability to compute the path probabilities $\mathbf{\pi}(\mathbf{p})$ under any scenario $\mathbf{p}$, the complete frequentist characterization by $\mathrm{T}$ of all path outcomes enables us immediately to obtain any of the common performance characteristics of interest:^[I suspend judgment as to whether we should do this, and indeed whether these performance characteristics make any sense at all.]
Whereas the array $T_{c,d}^j$ contains clinical events (enrollments, toxicities) along each path, the final dose recommendations are retained as the rightmost non-NA
value in each row of calmod
's path matrix.^[This matrix retains the columnar layout---although not the row degeneracy---of the dose transition pathways (DTP) tables of package dtpcrm
] The path_rx()
method of the Crm
class returns this:
xtabs(~calmod$path_rx())
These dose recommendations of course must be weighted by the path probabilities of some chosen dose-toxicity scenario. For sake of an easy demonstration, let's take our scenario from the skeleton itself:
pi_skel <- calmod$path_probs(calmod$skeleton()) xtabs(pi_skel ~ calmod$path_rx())
According to our skeleton, dose 3 is 'the' MTD; so PCS is r round(xtabs(calmod$path_probs(calmod$skeleton()) ~ calmod$path_rx())[3],2)
in this scenario.
The array $T_{c,d}^j$ lets us count patients assigned to 'the' MTD:
path_cohorts <- apply(!is.na(T), MARGIN=1, FUN=sum, na.rm=TRUE) path_MTDcohs <- apply(!is.na(T[,,3]), MARGIN=1, FUN=sum, na.rm=TRUE) sum(pi_skel * path_MTDcohs / path_cohorts)
A similar calculation enables us to calculate the fraction assigned to doses exceeding 'the' MTD:
path_ODcohs <- apply(!is.na(T[,,4:6]), MARGIN=1, FUN=sum, na.rm=TRUE) sum(pi_skel * path_ODcohs / path_cohorts)
Furthermore, we can ask what fraction of thus-'overdosed' trial participants may be expected to exceed their own MTD$_i$'s:^[That is, to experience a DLT. Note that this latter fraction constitutes a patient-centered notion of 'overdose', in contrast to the traditionally dose-centered notion.]
path_DLTs4_6 <- apply(T[,,4:6], MARGIN=1, FUN=sum, na.rm=TRUE) sum(pi_skel * path_DLTs4_6) / sum(pi_skel * 2*path_ODcohs)
As the examples above illustrate, once a fixed trial design has been completely path-enumerated, its performance characteristics are available through fast array operations. The generalization of single-scenario performance evaluations to ensemble-average evaluations should prove straightforward.
saved <- options(mc.cores = 6) kraken <- data.frame(C = 11:25, J = NA_integer_, elapsed = NA_real_) for (i in 1:nrow(kraken)) { C <- kraken$C[i] calmod$skeleton(calmod$skeleton()) # reset skeleton to clear cache for honest timing time <- system.time( calmod$trace_paths(1, rep(2,C), impl='rusti', unroll=8)) kraken$elapsed[i] <- time['elapsed'] kraken$J[i] <- dim(calmod$path_matrix())[1] print(kraken) } options(saved)
kraken <- eval(parse(text = "structure(list(C = c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25), J = c(5001, 7839, 11571, 16125, 22041, 28485, 35589, 43371, 48255, 50379, 52011, 53085, 53205, 53205, 53205), elapsed = c(2.061, 2.938, 3.961, 5.05, 7.172, 8.937, 11.022, 13.407, 14.696, 14.912, 15.298, 15.421, 15.477, 15.515, 15.807)), row.names = c(NA, -15L), class = \"data.frame\")")) kraken %>% kable( digits=2, caption="Size of complete path enumeration (CPE) for the model, enrolling from 11 to 25 cohorts of 2. Note that all paths of the trial terminate by the 23rd cohort. Timings (in seconds) obtained running CPE on 6 threads of a 2.6 GHz Core i7 processor." ) %>% kable_styling(full_width=FALSE, position="left")
kraken$N <- 2L * kraken$C oldpar <- par(mar = c(5,6,1,1)) plot(J ~ N, log="y", data=subset(kraken, C<=23), las=1, xlab="", ylab="") mtext("J", side=2, line=4.0, las=1) mtext("Enrollment", side=1, line=2.5, las=1) par(oldpar)
Reassuringly, the (perhaps typical) stopping criteria in this trial impose a finite upper bound on CPE size stringent enough to keep the space and time complexity of CPE feasible. Even without such planned stopping rules, as a practical matter, once a dose-finding trial has enrolled more than 30 participants, it ought to have yielded enough information to warrant revisiting the original design. This limits the reasonable 'time between overhauls'^[See https://en.wikipedia.org/wiki/Time_between_overhauls.] for a fixed CRM trial design. Thus, CPE-based performance characterization may prove entirely feasible for most practical applications of the standard CRM model.^[This perhaps does not apply to applications such as TITE CRM, where participants are modeled non-exchangeably. CPE for TITE CRM remains an open problem.]
The foregoing analysis shows that CPE-based performance characterization for a given CRM model of typical size may require on the order of 15 seconds on quite modest hardware. To calibrate a CRM model via CPE, however, may incur this cost repeatedly---perhaps hundreds of times---during what will generally amount to the derivative-free optimization of a discontinuous objective function over the prior-parameter space. For the example at hand, the Nelder-Mead method finds a PCS-optimal skeleton in just 25 minutes:
scenario <- skel_0 # for a demo scenario, we use our original skeleton pcs <- function(s, unroll=6) { # unroll=6 yields quickest PCS for C=15 case if (any(diff(s) <= 0) || s[1] <= 0 || s[length(s)] >= 1) return(NA) # out-of-bounds calmod$skeleton(s) calmod$trace_paths(1, rep(2,15), impl='rusti', unroll=unroll) pi_0 <- calmod$path_probs(scenario) correct <- calmod$path_rx() == 3 sum(pi_0[correct]) }
optim(calmod$skeleton(), pcs, method="Nelder-Mead", control = list(fnscale = -1, # NEGative ==> MAXimize PCS reltol = 0.001, trace = 2))
Exiting from Nelder Mead minimizer 135 function evaluations used $par [1] 0.03040 0.08371 0.23867 0.43310 0.59292 0.72230 $value [1] 0.5931 $counts function gradient 135 NA $convergence [1] 0 $message NULL
Cheung and Chappell [-@cheung_simple_2002] develop an asymptotic assessment capable of rejecting certain skeletons as intrinsically unreasonable because they confer inadequate sensitivity on the CRM. This criterion applies irrespective of dose-toxicity scenario, and so might usefully restrict our skeleton search during calibration by Nelder-Mead or similar direct-search method. The dimensionality of this search might also be addressed directly, by parametrizing the skeleton in 2 or 3 suitable basis dimensions. But such exquisite efforts risk taking this type of calibration exercise too seriously.
In everyday usage, calibration suggests an exacting procedure, in which typically a measuring instrument is adjusted to accord with some objective standard or criterion. In statistics, however, the term may not carry quite this connotation. Certainly, @grieve_idle_2016 discusses the calibration of Bayesian trial designs generally in much the same sense as our 'CRM calibration' here, with the criterion being provided by whatever frequentist properties are deemed desirable in each given application.
Whatever we call this chiropractic of the CRM skeleton, the very act of treating the CRM's prior dose-toxicity probabilities as free parameters undermines the presumption that model-basedness per se necessarily confers special qualities upon dose-finding methods. But now that even the CRM---the very type specimen of model-based dose finding---yields to complete path enumeration, 'model-based' has become a distinction without a difference.
options(old) # restore user's original options before finishing, per CRAN
# Create a bib file for packages cited in this paper knitr::write_bib(c('dtpcrm'), file = 'packages.bib')
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.