knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
In this vignette, I will show how this package can be used to study the design aspects or estimate 'optimal' intervention packages through simulations related to the LAGO design. I will briefly introduce the LAGO study, its goals and a brief section on the LAGO methodology, before moving on to the section on how to use this package.
The U.S. Food and Drug Administration defines an adaptive design as "$\dots$ a clinical study design that allows for prospectively planned modifications based on accumulating study data without undermining the study’s integrity and validity" (FDA, 2016). Adaptive designs have been developed, studied and used in clinical trials for over a decade now. The "Learn-As-You-Go" (LAGO) (Nevo et. al. Analysis of "Learn-As-You-Go" (LAGO) Studies", The Annals of Statistics, 2021, 49(2):793-819.) design is a novel design, motivated by large scale public health intervention trials, where participants receive a complex multi-component intervention package which may be for e.g. a treatment, a device, a new way to organize care or a combination thereof. The study is conducted in multiple stages. One of the goals of the study is to prospectively modify the intervention package at each stage to develop the optimal intervention package to be administered at the next stage. The data collected at each step is analyzed to reassess and thereby modify the intervention package to be rolled out at the next stage and thus, the LAGO design is an adaptive design. The first goal of a LAGO study is to identify the optimal intervention package while minimizing the cost of the intervention package subject to the probability of a desired binary outcome being above a given threshold. The second goal of the study is to assess the effectiveness of the intervention package on the desired response.
In this section, I present briefly the theoretical backdrop of the LAGO design that is used to develop the proposed R-package. The reader is referred to the original paper for a more comprehensive guide to the methodology of the LAGO design.
The multivariate intervention package consists of $p$ components. Let $X$ be the support of the intervention, that is, all possible intervention values. Let there be $\mathbf{K}$ stages in the study and at each stage $k$, a version of the intervention package is implemented in each of $J_k$ centers. Let $n_{jk}$ denote the sample size in the $j$-th center at stage $k$. For stage 1, an initial $\mathbf{x}^{(1)}$ is chosen by the investigator based on their best judgement. The actual intervention, denoted by $\mathbf{A}$ received by any participant at any stage differs from the recommended intervention due to local constraints or other preferences. It is assumed that the probability of success for a single unit i in a center j under intervention $\mathbf{A}=\mathbf{a}j$, $$p{a_j}(\beta) = Pr(Y_{ij} = 1| \mathbf{A} = \mathbf{a}j, \mathbf{X} = \mathbf{x}_j; \beta)$$ does not depend on the recommended intervention $\mathbf{x}_j$ , except through the actual intervention $\mathbf{a}_j$, and follows a logistic regression model $$logit(p{a_j}(\beta)) = \beta_0 + \beta_1 ^T \mathbf{a}_j$$ where $\beta^T = (\beta_0, \beta_1 ^T)$ is a vector of unknown parameters such that $\beta_1$ describes the effects of the p intervention package components. It is further assumed that in each stage, conditionally on all $\mathbf{a}_j$, outcomes are independent within and between centers.
A main goal of the LAGO design is to identify the optimal intervention package. Let $\tilde{p}$ be a pre-specified outcome probability goal and $C(\mathbf{x})$ be a known cost function. If $\beta$ were known, an optimal intervention for a center could be the solution to the center-specific optimization problem $$min_{\mathbf{x}j} C(\mathbf{x}_j) \ \ \ \text{subject to}\ \ p{x_j}(\beta) \geq \tilde{p} \ \ \& \ \ \mathbf{x}_j \in \mathbf{X}$$ However, as $\beta$ is almost always unknown, they are estimated from the data accumulated till any given stage and are used to solve the above stated optimization problem with the estimated $\hat{\beta}$ to obtain the estimated optimal intervention package. At any stage, since $\hat{\beta}$ depends on the data from the previous stage, and hence the estimated optimal intervention package depends on the data from previous stages and is hence random. The paper further develops the theoretical development of the optimization problem under the stated issue which I will not discuss here. For the problem of testing the effectiveness of the intervention package it has been shown that under some assumptions and regularity conditions, the estimated $\hat{\beta}$ is consistent and is asymptotically normally distributed. Thus the standard use of asymptotic Wald test for "no-intervention" effect is still valid under the LAGO design.
library(logisticLAGO)
The first function in this package is the expit function which calculates the success probabilities for the logistic regression model for a given vector of covariates $x$ and a vector of parameter values $\beta$. The success probability is calculated as $\frac{1}{(1 + e^{-\beta'x})}$. If the lower and upper limits of the components of the intervention package are given by $x_l$ and $x_u$ then the success probabilities for a true $\beta_{*}$ are calculated as
beta_true <- c(log(0.05), log(1.2), log(1.1), log(1.3)) x.l <- c(1, 10, 2) # Lower limits for X x.u <- c(4, 15, 15) # Upper limits for X prob_lower <- expit(beta = beta_true, x = x.l) prob_upper <- expit(beta = beta_true, x = x.u)
knitr::kable(matrix(c(prob_lower, prob_upper), nrow = 1), format = "html", digits = 3, caption = "Success probabilities", col.names = c("Outcome goal at lower limit", "Outcome goal at upper limit"))
If the desired outcome goal $\tilde{p}$ lies between the the probabilities i.e. the probabilities corresponding to $x_l$ and $x_u$, only then it is reasonable to expect that the LAGO optimization would yield an optimal intervention which attains the desired success goal, while minimizing the cost.
The logisticLAGO package can be used in two different settings, namely in a situation where a LAGO trial is ongoing and with the data being collected at any stage, the estimated optimal intervention package to be implemented in the next stage is desired and also in a situation before the starting of a LAGO study, where an idea about the optimal intervention package is desired. The two situations are considered separately over here.
At any stage of the study, once the data is available on the actual intervention received by the participants and their binary responses, they maybe used to estimate the parameters $\hat{\beta}$ using the accumulated data. The estimates $\hat{\beta}$, the intervention package rolled out at that stage $x$, along with information on their unit costs, upper and lower limits for the components of the intervention package may be used to estimate the optimal intervention package to be used in the next stage.
beta_est <- c( - 2.1350, 0.01899, 0.1989, 0.1273) x.l <- c(1, 10, 2) # Lower limits for X x.u <- c(4, 15, 15) # Upper limits for X x.start <- c(2.5, 12.5, 7) cost_lin = c(1, 8, 2.5) p_bar = 0.85 # Defining the desired outcome goal ## Running the LAGO optimization algorithm opt_lago = opt_int(cost = cost_lin, beta = beta_est, lower = x.l, upper = x.u, pstar = p_bar, starting.value = x.start)
The results from the optimization are summarized in the table below, which gives the estimated optimal intervention package and the success probabilities under the estimated optimal intervention package.
opt.x <- matrix(c(opt_lago$Optimum_Intervention, opt_lago$Obtained_p), nrow = 1) colnames(opt.x) <- c("$\\hat{x}_1$", "$\\hat{x}_2$", "$\\hat{x}_3$", "$\\hat{p}$") knitr::kable(opt.x, format = "html", digits = 3, caption = "Table 1")
Given the data, if, however, the desired outcome goal cannot be met, the LAGO optimization returns the estimated optimal intervention package which minimizes the implementation cost, while trying to maximize the probability of observing success at the next stage with the estimated optimal intervention package.
beta_est <- c( - 3.7350, 0.01899, 0.1989, 0.1273) x.l <- c(1, 10, 2) # Lower limits for X x.u <- c(4, 15, 15) # Upper limits for X
Note that with the given estimated $\hat{\beta}$, the desired outcome goal $\tilde{p} = 0.85$ cannot be met, as can be seen from the table below:
prob_lower <- expit(beta = beta_est, x = x.l) prob_upper <- expit(beta = beta_est, x = x.u) # To be used in the table below x.start <- c(2.5, 12.5, 7) cost_lin = c(1, 8, 2.5) p_bar = 0.85 # Defining the desired outcome goal
knitr::kable(matrix(c(prob_lower, prob_upper), nrow = 1), format = "html", digits = 3, caption = "Success probabilities", col.names = c("Outcome goal at lower limit", "Outcome goal at upper limit"))
## Running the LAGO optimization algorithm opt_lago1 = opt_int(cost = cost_lin, beta = beta_est, lower = x.l, upper = x.u, pstar = p_bar, starting.value = x.start)
The results from the optimization are summarized in the table below, which gives the estimated optimal intervention package and the success probabilities under the estimated optimal intervention package.
opt.x <- matrix(c(opt_lago1$Optimum_Intervention, opt_lago1$Obtained_p), nrow = 1) colnames(opt.x) <- c("$\\hat{x}_1$", "$\\hat{x}_2$", "$\\hat{x}_3$", "$\\hat{p}$") knitr::kable(opt.x, format = "html", digits = 3, caption = "Table 2")
The main purpose of this package is to consider the situation before starting of a LAGO study, wherein an investigator may be interested to know how the study would progress given the parameters of the trial. These parameters maybe obtained from historical or concurrent trials or maybe based on the best judgment of a subject matter expert. This package considers several LAGO designs including a single center design and multi-center design.
The simplest case is when the trial is designed to be conducted in a single center or location and is conducted over multiple stages. The number of stages (nstages) in the LAGO design, sample size per stage (sample.size), the unit costs for the intervention package components (cost.vec), the vector of minimum (lower) and maximum (upper) values of the components of the intervention package, desired outcome goal (prob), the best guess intervention effect (beta.true) and the initial value of intervention package (x0) along with the expected variation in rolling out the intervention package (icc) are used to simulate the estimated intervention package over the different stages, the estimated outcome goal and the estimated power of the test of "no-intervention" effect at the end of the study.
x.init = c(2.5, 12.5, 7) # Initial value interventions x.l = c(1, 10, 2) # Lower limits for X x.u = c(4, 15, 15) # Upper limits for X n = 50 # Sample size K = 3 # Number of stages cost_lin = c(1, 8, 2.5) # Costs p_bar = 0.9 # Desired outcome goal # True/best guess beta values beta = c(log(0.05), log(1.1), log(1.35), log(1.2)) sim_sc <- sc_lago(x0 = x.init, lower = x.l, upper = x.u, nstages = K, beta.true = beta, sample.size = n, icc = 0.1, cost.vec = cost_lin, prob = p_bar, B = 100, intercept = TRUE)
The stage wise results from the optimization are as below
results1 <- data.frame(sim_sc$xopt, sim_sc$p.opt.hat, c(sim_sc$power, "", "")) results1 <- cbind(c("Stage 1", "Stage 2", "Stage 3"), results1) rownames(results1) <- NULL colnames(results1) <- c("Stage", "$\\hat{x}_1$", "$\\hat{x}_2$", "$\\hat{x}_3$", "$\\hat{p}$", "Power") knitr::kable(results1, format = "html", digits = 3, caption = "Table 3")
The multi-center LAGO design extends the idea of the single center LAGO study to the case where the study is conducted in multiple centers or locations and pools the data together at each stage before proceeding with the optimization.
This is is a special case of multi-center LAGO with equal number of centers and equal sample size at each center at each stage of the design. The mc_lago function in this package can be used to simulate the data for this design and perform the optimization. Apart from the information as required by the single center design, the number of centers per stage (centers) and sample size per center per stage (sample.size) is used to simulate the optimal intervention package to be rolled out in each of the centers in the next stage such that the goals of the study are met. As before, estimated power for the test of "no-intervention" effect at the end of the study is also provided through simulations.
x.init = c(2.5, 12.5, 7) # Initial value interventions x.l = c(1, 10, 2) # Lower limits for X x.u = c(4, 15, 15) # Upper limits for X njk = 20 # Sample size per center per stage K = 3 # Number of stages J = 3 # Number of centers per stage cost_lin = c(1, 8, 2.5) # Costs p_bar = 0.9 # Desired outcome goal # True/best guess beta values beta = c(log(0.05), log(1.1), log(1.35), log(1.2)) # Running the Multi-center LAGO simulation study sim_mc <- mc_lago(x0 = x.init, lower = x.l, upper = x.u, beta.true = beta, nstages = K, centers = J, sample.size = njk, icc = 0.1, prob = p_bar, cost.vec = cost_lin)
The stage wise results from the multi-center LAGO optimization are as below
results2 <- data.frame(sim_mc$xopt, sim_mc$p.opt.hat, c(sim_mc$power, "", "")) results2 <- cbind(c("Stage 1", "Stage 2", "Stage 3"), results2) rownames(results2) <- NULL colnames(results2) <- c("Stage", "$\\hat{x}_1$", "$\\hat{x}_2$", "$\\hat{x}_3$", "$\\hat{p}$", "Power") knitr::kable(results2, format = "html", digits = 3, caption = "Table 4")
In any trial, it is intuitive to conduct the study in small number of centers at the beginning i.e. at the pilot stage and progressively increase the number of centers as the trial progresses from the pilot stage. The mc_lago_uc function considers different number of centers in each stage. However every center at any stage have equal samples. The function arguments are the same as the mc_lago function expect, that the argument centers is a vector of same dimension as nstages denoting the number of centers in each stage of the design. This function also differentiates between center and within center variations while considering simulations.
J_uc = c(3, 5, 10) # Number of centers per stage # Running the Multi-center LAGO simulation study sim_mc_uc <- mc_lago_uc(x0 = x.init, lower = x.l, upper = x.u, nstages = K, centers = J_uc, sample.size = njk, cost.vec = cost_lin, prob = p_bar, beta.true = beta, icc = 0.1, bcc = 0.15)
The results of which are shown below
results3 <- data.frame(sim_mc_uc$xopt, sim_mc_uc$p.opt.hat, c(round(sim_mc_uc$power, 2), "", "")) results3 <- cbind(c("Stage 1", "Stage 2", "Stage 3"), results3) rownames(results3) <- NULL colnames(results3) <- c("Stage","$\\hat{x}_1$", "$\\hat{x}_2$", "$\\hat{x}_3$", "$\\hat{p}$", "Power") knitr::kable(results3, format = "html", digits = 3, caption = "Table 5")
This is the general setup of any multi-center LAGO study, wherein apart from unequal number of centers at each stage, an unequal number of samples are considered at each center of a stage. At the beginning of a trial, apart from less number of centers, less number of participants are allocated to the trial and they are increased as the trail proceeds through the different stages and more data on the optimal intervention and response trend is available. The mc_lago_uc.us function considers different number of centers and sample size at each stage.
This function requires a list of sample sizes at each center of every stage.
J_uc = c(3, 5, 10) # Number of centers per stage sample_size = list() # Initialize sample size per center at each stage sample_size[[1]] = 20 # Sample size per center at stage 1 is 20 sample_size[[2]] = 25 # Sample size per center at stage 2 is 25 sample_size[[3]] = 30 # Sample size per center at stage 3 is 30 # Running the Multi-center LAGO simulation study sim_mc_uc.us <- mc_lago_uc.us(x0 = x.init, lower = x.l, upper = x.u, nstages = K, centers = J_uc, sample.size = sample_size, cost.vec = cost_lin, prob = p_bar, beta.true = beta, icc = 0.1, bcc = 0.15)
The results are summarized below.
results4 <- data.frame(sim_mc_uc.us$xopt, sim_mc_uc.us$p.opt.hat, c(round(sim_mc_uc.us$power, 2), "", "")) results4 <- cbind(c("Stage 1", "Stage 2", "Stage 3"), results4) rownames(results4) <- NULL colnames(results4) <- c("Stage","$\\hat{x}_1$", "$\\hat{x}_2$", "$\\hat{x}_3$", "$\\hat{p}$", "Power") knitr::kable(results4, format = "html", digits = 3, caption = "Table 6")
As a comparison of how the three methods compare, the results from the three different multi-center LAGO designs are summarized and for comparison they are based on the same total sample size. All other parameters remain the same as before except number of centers per stage and the sample size per center.
njk = 50 sim_mc <- mc_lago(x0 = x.init, lower = x.l, upper = x.u, beta.true = beta, nstages = K, centers = J, sample.size = njk, icc = 0.1, prob = p_bar, cost.vec = cost_lin) njk = 25 J_uc = c(3, 5, 10) # Number of centers per stage # Running the Multi-center LAGO simulation study sim_mc_uc <- mc_lago_uc(x0 = x.init, lower = x.l, upper = x.u, nstages = K, centers = J_uc, sample.size = njk, cost.vec = cost_lin, prob = p_bar, beta.true = beta, icc = 0.1, bcc = 0.15) J_uc = c(3, 5, 10) sample_size = list() # Initialize sample size per center at each stage sample_size[[1]] = 15 # Sample size per center at stage 1 is 20 sample_size[[2]] = 21 # Sample size per center at stage 2 is 25 sample_size[[3]] = 30 # Sample size per center at stage 3 is 30 # Running the Multi-center LAGO simulation study sim_mc_uc.us <- mc_lago_uc.us(x0 = x.init, lower = x.l, upper = x.u, nstages = K, centers = J_uc, sample.size = sample_size, cost.vec = cost_lin, prob = p_bar, beta.true = beta, icc = 0.1, bcc = 0.15)
The results are summarized as below:
results5 <- data.frame(sim_mc$xopt, sim_mc$p.opt.hat, c(sim_mc$power, "", "")) colnames(results5) <- c("$\\hat{x}_1$", "$\\hat{x}_2$", "$\\hat{x}_3$", "$\\hat{p}$", "Power") results6 <- data.frame(sim_mc_uc$xopt, sim_mc_uc$p.opt.hat, c(round(sim_mc_uc$power, 2), "", "")) colnames(results6) <- c("$\\hat{x}_1$", "$\\hat{x}_2$", "$\\hat{x}_3$", "$\\hat{p}$", "Power") results7 <- data.frame(sim_mc_uc.us$xopt, sim_mc_uc.us$p.opt.hat, c(round(sim_mc_uc.us$power, 2), "", "")) colnames(results7) <- c("$\\hat{x}_1$", "$\\hat{x}_2$", "$\\hat{x}_3$", "$\\hat{p}$", "Power") result = rbind(results5, results6, results7) rownames(result) = NULL result_full = cbind(c("Equal number of centers and samples", "", "", "Only uequal number of centers", "", "", "Unequal number of centers and samples", "", ""), rep(c("Stage 1", "Stage 2", "Stage 3"), 3), result) colnames(result_full) = c("Case"," Stage", "$\\hat{x}_1$", "$\\hat{x}_2$", "$\\hat{x}_3$", "$\\hat{p}$", "Power") knitr::kable(result_full, format = "html", digits = 3, caption = "Table of comparisons for the different multi-center LAGO designs")
For more information on logisticLAGO Package, please access the package documentations. Please feel free to contact the author.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.