Description Usage Arguments Value Output Note Author(s) Examples

View source: R/optimize_design.R

Optimizes two-stage adaptive enrichment design using sparse linear programming

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | ```
optimize_design(
subpopulation.1.proportion,
total.alpha,
data.generating.distributions,
stage.1.sample.sizes,
stage.2.sample.sizes.per.enrollment.choice,
objective.function.weights,
power.constraints,
type.of.LP.solver = get_default_solver(),
discretization.parameter = c(1, 1, 10),
number.cores = min(parallel::detectCores(), 30),
ncp.list = c(),
list.of.rectangles.dec = c(),
LP.iteration = 1,
prior.covariance.matrix = diag(2) * 0,
LP.solver.path = c(),
loss.function = c(),
cleanup.temporary.files = TRUE
)
``` |

`subpopulation.1.proportion` |
Proportion of overall population in subpopulation 1. Must be between 0 and 1. |

`total.alpha` |
Familywise Type I error rate (1-sided) |

`data.generating.distributions` |
Matrix encoding data generating distributions (defined in terms of treatment effect pairs and outcome variances) used to define power constraints and objective function; each row defines the pair (Delta_1,Delta_2) of subpopulation 1 and 2 average treatment effects, followed by outcome variances for the four combinations of subpouplation (1 and 2) by study arm (0 and 1). |

`stage.1.sample.sizes` |
Vector with 2 entries representing stage 1 sample sizes for subpopulations 1 and 2, respectively |

`stage.2.sample.sizes.per.enrollment.choice` |
Matrix with number.choices.end.of.stage.1 rows and 2 columns, where the (i,j) entry represents the stage 2 sample size under enrollment choice i for subpopulation j. |

`objective.function.weights` |
Vector with length equal to number of rows of population.parameters, representing weights used to define the objective function |

`power.constraints` |
Matrix with same number of rows as population.parameters (each representing a data generating distribution) and three columns corresponding to the required power to reject (at least) H_01, H_02, H_0C, respectively. |

`type.of.LP.solver` |
"matlab", "cplex", "GLPK", or "gurobi" The linear program solve that you want to use; assumes that you have installed this already and that path is set |

`discretization.parameter` |
vector with 3 positive-valued components representing initial discretization of decision region, rejection regions, and grid representing Type I error constraints; first component is edge length of each square in the decision region, second component is edge length of each square in the rejection regions, and third component determines the number of Type I error constraints equally spaced on the boundary of the null space for each null hypothesis. The third component is only used if ncp.list below is left unspecified. |

`number.cores` |
the number of cores available for parallelization using the parallel R package |

`ncp.list` |
list of pairs of real numbers representing the non-centrality parameters to be used in the Type I error constraints; if list is empty, then default list is used. |

`list.of.rectangles.dec` |
list of rectangles representing decision region partition, encoded as a list with each element of the list having fields $lower_boundaries (pair of real numbers representing coordinates of lower left corner of rectangle), $upper_boundaries (pair of real numbers representing upper right corner of rectangle), $allowed_decisions (subset of stage.2.sample.sizes.per.enrollment.choice representing which decisions allowed if first stage z-statistics are in corresponding rectangle; default is entire list stage.2.sample.sizes.per.enrollment.choice), $preset_decision (indicator of whether the decision probabilities are hard-coded by the user; default is 0), $d_probs (empty unless $preset_decision==1, in which case it is a vector representing the probabilities of each decision); if list.or.rectangles.dec is empty, then a default partition is used based on discretization.parameter. |

`LP.iteration` |
positive integer used in file name to store output; can be used to avoid overwriting previous computations |

`prior.covariance.matrix` |
2x2 positive semidefinite matrix representing the covariance corresponding to each component of the mixture of multivariate normals prior distribution (used only in defining the objective function); the default is the matrix of all 0's, corresponding to the prior being a mixture of point masses. If separate covariance matrices are desired for each component of the mixture prior distribution, a list of such 2x2 matrices of same length as the number of rows in data.generating.distributions can be provided instead of a single matrix; this option allows the user to specify anyfinite mixture of point masses (by setting the corresponding covariance matrices to have all 0's) and bivariate normal distributions. |

`LP.solver.path` |
path (i.e., directory) where LP.solver is installed; e.g., if type.of.LP.solver=="cplex" then LP.solver.path is directory where cplex is installed |

`loss.function` |
a matrix of same length as the number of enrollment choices for stage 2, i.e., same length as the number of rows in stage.2.sample.sizes.per.enrollment.choice. Each component d of this vector represents the loss corresponding to enrollment choice d; if nothing is specified, the default is to use the total sample size under each enrollment choice, so that the objective function represents expected sample size. An alternative choice would be, for example, the trial duration under each enrollment choice, or some combination of sample size and trial duration. |

`cleanup.temporary.files` |
TRUE/FALSE indicates whether temporary files generated during problem solving process should be deleted or not after termination; set to FALSE for debugging purposes only. |

An optimized policy is returned, consisting of the following elements (defined in the paper): S1, A1, S2, A2 (sets of states and actions) and the optimized policy (pi_1, pi_2). Also, additional information related to the optimized design is saved as "optimized_design<k>.rdata", where <k> is the user-defined iteration number (LP.iteration).

The software computes an optimized design and saves it as "optimized_design<k>.rdata", where <k> is the user-defined iteration number (LP.iteration). E.g., if one sets LP.iteration=1, then the optimized design is saved as "optimized_design1.rdata". That file can be opened in R and contains the following 6 items: input.parameters (the inputs passed to the optimized_design function) list.of.rectangles.dec (the decision rectangle partition of R^2) list.of.rectangles.mtp (the multiple testing procedure partition of R^2) ncp.active.FWER.constraints (the active familywise Type I error constraints in the optimized design, obtained using the dual solution to the linear program) ncp.list (the complete list of familywise Type I error constraints input to the linear program solver) sln (the solution to the linear program; sln$val is the expected sample size; sln$status, if either 1 or 5, indicates that a feasible solution was found and other wise the problem was infeasible or no solution was found; sln$z is the actual solution as a vector)

See inst/examples/example3.2reduced which is a simplified example that can be run in 4 minutes using the GLPK solver using a 4 core, 2.8 GHz processor on a Macbook laptop, which involves solving a modified version of the problem from Example 3.2 as described in Section 5.2 of the manuscript; the main modifications are that we use a coarsened partition of the decision region and rejection regions in order to speed up the computation.

Michael Rosenblum, Ethan Fang, Han Liu

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 | ```
library(AdaptiveDesignOptimizerSparseLP)
#Install R package if not already done so using the following command:
#remotes::install_github("mrosenblum/AdaptiveDesignOptimizerSparseLP")
# Load R package if not already done so by the following command:
# library(AdaptiveDesignOptimizerSparseLP)
# For reproducibility, set the random number generator seed:
set.seed(32515)
# For reproducibility, set the random number generator seed:
set.seed(32515)
# Problem Inputs:
# For illustration below, we set all problem inputs based on Example 3.2.
# However, our software is general in that users can set the
# inputs based on their own problems, and then run our trial design
# optimizer.
# The proportion of the population in subpopulation 1 is 1/2:
subpopulation.1.proportion = 0.5
# Sample Sizes:
# We set n=200 in our adaptive design template n^(1b).
# This corresponds to total sample size 100 in stage 1,
# and total sample size 100 in stage 2 if both subpopulations get enrolled
# in stage 2.
# Stage 1 Sample Size for each subpopulation:
# Since p1=1/2 and total sample size in stage 1 is 100,
# we set the stage 1 sample size in each subpopulation to be 50 participants.
# This is encoded as follows (where first entry is subpopulation 1 sample size,
# second entry is subpopulation 2 sample size):
stage.1.sample.sizes = c(50, 50)
# In general, one can specify any finite number of sample size options for
# stage 2 enrollment. However, at some point the
# resulting problem size will be too large to solve, depending on the
# computational resources one has available.
# Here we consider 4 options as in adaptive design template n^(1b)
# (see Fig. 1b in the manuscript).
# Having set n=200 in adaptive design template n^(1b) as described above,
# this corresponds to the following four choices for stage 2 enrollment,
# where each row corresponds to a different stage 2 enrollment choice,
# encoded as (subpopulation 1 sample size, subpopulation 2 sample size)
stage.2.sample.sizes.per.enrollment.choice = matrix(
c(50, 50, # Stage 2: enroll 50 from each subpopulation
0, 0, # Stop trial after stage 1
150, 0, # Stage 2: enroll 150 from subpopulation 1 and 0 from subpopulation 2
0, 150), # Stage 2: enroll 0 from subpopulation 1 and 150 from subpopulation 2
nrow = 4,
ncol = 2,
byrow = TRUE,
dimnames = list(
c(),
c(
"Subpopulation1Stage2SampleSize", # Note: these labels are optional
"Subpopulation2Stage2SampleSize" # and just for illustration
)
)
)
# We next set the minimum, clinically meaningful treatment effect size.
# to beDelta_min = 4*qnorm(0.95)/sqrt(n) = 0.465. We explain this choice
# below, which was made to match the distributions in Example 3.2 and
# the non-centrality parameter choice in Section 5.1.
Delta_min = 0.465
# The data generating distributions where the trial design will be
# evaluated are input next, encoded as the matrix data.generating.distributions
# Each row of the matrix corresponds to a different data generating distribution.
# The user decides on how many rows to put; we use 4 rows below.
# Each column gives a feature of the corresponding distribution, where
# the first two columns represent (Delta_1,Delta_2)
# and the last 4 columns are the outcome variances
# sigma_10^2, sigma_11^2, sigma_20^2, sigma_21^2,
# where sigma_sa^2 is the outcome variance for subpopulation s and arm a.
# In our example, we set each variance equal to 1.
data.generating.distributions = matrix(
data = c(
# Data generating distribution with (Delta_1,Delta_2)=(0,0):
0,0,1,1,1,1,
# Data generating distribution with (Delta_1,Delta_2)=(Delta_min,0):
Delta_min,0,1,1,1,1,
# Data generating distribution with (Delta_1,Delta_2)=(0,Delta_min):
0,Delta_min,1,1,1,1,
# Data generating distribution with (Delta_1,Delta_2)=(Delta_min,Delta_min):
Delta_min,Delta_min,1,1,1,1
),
nrow = 4,
ncol = 6,
byrow = TRUE,
dimnames = list(
c(),
c(
"Delta1", # Note: these labels are optional
"Delta2",
"Variance10",
"Variance11",
"Variance20",
"Variance21"
)
)
)
# The above choice of Delta_min=0.465, sigma_sa=1 for each subpopulation s and arm a,
# and n=200 were selected in order for the non-centrality parameter (defined in #'
# Section 5.1) to equal 2.33, so that our problem formulation here matches
# Example 3.2 under the setup in Section 5 of the manuscript.
# The difference here is that we use a
# coarse discretization so that the computation runs relatively quickly on a single
# processor using the open-source GLPK solver for illustration. We used multiple
# processors, used Cplex (which is substantially faster than GLPK for our problems),
# and ran our computations for longer durations to solve the problems in the paper;
# this allowed finer discretizations.
# Required Familywise Type I error:
total.alpha = 0.05
# Power Constraints:
# The power constraints are represented by a matrix power.constraints
# with the same number ofrows as data.generating.distributions.
# The 3 columns correspond to the following:
# Column 1: Power for H01; Column 2: Power for H02; Column 3: Power for H0C.
# Each row of power.constraints represents the minimum required power to reject
# H01, H02, H0C, respectively, when the data generating distribution is
# the corresponding row of data.generating.distributions.
# There are four power constraints below (the first of which is vacuous)
# For illustration these are set lower than typical (only 60%), since
# we use a coarse discretization that cannot achieve much higher power.
power.constraints = matrix(
c(
# No power requirements under first data generating distribution:
0,0,0,
# 60% power required for rejecting H01 under 2nd data generating distributi#' on:
0.6,0,0,
# 60% power required for rejecting H02 under 3rd data generating distributi#' on:
0,0.6,0,
# 60% power required for rejecting H0C under 4th data generating distributi#' on:
0,0,0.6
),
nrow = 4,
ncol = 3,
byrow = TRUE,
dimnames = list(c(), c("PowerH01", "PowerH02", "PowerH0C"))
)
# Objective Function:
# The prior distribution Lambda used to define the objective function
# is a mixture of k distributions, where k is the number of rows of
# data.generating.distributions, and with weights
# defined by objective.function.weights (which must sum to 1).
# We next describe how each component distribution of the mixture is encoded.
# Each component distribution is a bivariate normal distribution
# (where we allow degenerate distributions, e.g., a point mass).
# Each component distribution is centered at (Delta_1,Delta_2) from
# the corresponding row of data.generating.distributions.
# The covariance matrix of the each component distribution is defined by
# prior.covariance.matrix, e.g., we set
prior.covariance.matrix = diag(2) # the 2x2 identity matrix
# When prior.covariance.matrix is input as a single 2x2 matrix, it is assumed #' that
# this is the same covariance matrix for each component distribution. In this #' way,
# if we also set
objective.function.weights = 0.25 * c(1, 1, 1, 1)
# then we obtain the Lambda corresponding to the objective function in Example 3.2.
# The user can instead decide to set each component distribution
# to have a different covariance matrix,
# which can be input by setting prior.covariance.matrix to a list of k
# 2x2 matrices, where k is the number of rows of data.generating.distributions.
# The special case of point masses can be encoded by setting the covariance matrix
# (or matrices) to contain all 0's, e.g., if we set
# prior.covariance.matrix to the 2x2 matrix of all 0's, then Lambda
# in our example here would correspond to an equally weighted mixture of
# 4 point masses as in Example 3.1. One does not have to use all
# of the rows of data.generating.distructions, e.g., one could set some
# components of objective.function.weights to 0, which effectively
# ignores those rows in defining the objective function prior Lambda.
# discretization.parameter sets how fine/coarse the rectangle partitions are
# for the decision and rejection regions (which are exactly as in paragraph 2 in Section 5.2
# except that here we allow to multiply the side-lengths of squares by constants defined next),
# and also how fine the grid G of Type I error constraints should be.
# The first component is twice the side-length of squares in the box [-3,3]x[-3,3] and equal to
# the side-lengths of squares in the remaining region of [-6,6] x [-6,6];
# in the decision region partition;
# the second component is the side-length of squares in the rejection region partitions;
# the third component (after we multiply by theconstant 54)
# gives the number of grid points in G.
# There is also the option to set more complex partitions and grids; please
# run help(optimize_design) and look at the arguments: list.of.rectangles.dec
# and ncp.list for explanation of available options for this.
# For illustration below,
# we set a coarse distribution since the corresponding problem can be generated
# and solved in about 4 minutes on 4 core, 2.8 GHz processor on a Macbook laptop.
# Below we use only 2 cores, so the run time should be about 8 minutes.
discretization.parameter = c(3, 3, 1)
number.cores = min(parallel::detectCores(), 2)
if (requireNamespace("Rglpk", quietly = TRUE)) {
type.of.LP.solver = "glpk"
# The following call to the main function optimize_design in our package
# solves the adaptive design optimization problem using the above inputs:
optimized.policy <- optimize_design(
subpopulation.1.proportion,
total.alpha,
data.generating.distributions,
stage.1.sample.sizes,
stage.2.sample.sizes.per.enrollment.choice,
objective.function.weights,
power.constraints,
type.of.LP.solver,
discretization.parameter,
number.cores,
prior.covariance.matrix = prior.covariance.matrix
)} else {print("GLPK solver not found; please either install GLPK or set type.of.LP.solver to be 'matlab', 'gurobi', or 'cplex'.")}
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.