knitr::opts_chunk$set(collapse = TRUE, comment  = "#>")

1. Introduction

multiarm provides functions to assist with the design of single- and multi-stage multi-arm clinical trials. This vignette describes the functionality available for single-stage designs. As will be discussed, the available functions allow for sample size determination, trial simulation, analytical operating characteristic calculation, and the production of several informative plots.

Note that an R Shiny graphical user interface is also available for designing single-stage trials; it can be accessed within R using multiarm::gui(), or online at https://mjgrayling.shinyapps.io/multiarm. Additional information on this app can be found in Grayling and Wason (2020).

This vignette will proceed by first precisely detailing the type of (single-stage) trial designs multiarm supports. Each of the available relevant functions will then be described, before several examples and useful further information is provided.

2. (Single-stage) multi-arm clinical trials

2.1. Design setting

multiarm supports the design of single-stage multi-arm clinical trials that use a many-to-one design. That is, a design where a single shared control treatment arm is compared in a pairwise manner to some number of experimental treatment arms, following the accrual of outcome data from a specified number of patients on each arm.

More formally, it supposes that outcomes $X_{ik}$ will be accrued from patients $i\in{1,\dots,n_k}$ on treatment arms $k\in{0,\dots,K}$. In Section 2.6, more information on the precise types of outcome that are currently supported is provided. Next, specifying that arm $k=0$ corresponds to the shared control arm, and arms $k\in{1,\dots,K}$ to the experimental arms, the hypotheses of interest are assumed to be: $$ H_k\ :\ \tau_k \le 0,\ \ k\in{1,\dots,K}.$$ Here, $\tau_k$ corresponds to a treatment effect for experimental arm $k\in{1,\dots,K}$, relative to the control arm. Thus, we test whether the treatment effects are negative, hoping that they will be positive (i.e., we assume positive $\tau_k$ corresponds to patient benefit). Later, for brevity, we will make use of the notation $\boldsymbol{\tau}=(\tau_1,\dots,\tau_K)^\top\in\mathbb{R}^K$. We will also make reference to the global null hypothesis, $H_G$, which we define to be the scenario with $\tau_1=\cdots=\tau_K=0$.

To test hypothesis $H_k$, we assume that a Wald test statistic, $z_k$, is computed: $$ z_k = \frac{\hat{\tau}k}{\sqrt{\text{Var}(\hat{\tau}_k)}} = \hat{\tau}_kI_k^{1/2},\ \ k\in{1,\dots,K}. $$ In what follows, we will use the notation $\boldsymbol{z}_k=(z_1,\dots,z_k)^\top\in\mathbb{R}^k$. Then, multiarm supports single-stage design in certain scenarios where $\boldsymbol{Z}_k$, the random pre-trial value of $\boldsymbol{z}_k$, has (at least asymptotically) a $k$-dimensional multivariate normal (MVN) distribution, with: \begin{align} \mathbb{E}(Z_l) &= \tau_lI_l^{1/2},\ l=1,\dots,k,\ \text{Cov}(Z{l},Z_{l}) &= 1,\ l\in{1,\dots,k},\ \text{Cov}(Z_{l_1},Z_{l_2}) &= I_{l_1}^{1/2}I_{l_2}^{1/2}\text{Cov}(\tau_{l_1},\tau_{l_2}),\ l_1 \neq l_2,\ l_1,l_2\in{1,\dots,k}. \end{align} Ultimately, to test the hypotheses, $\boldsymbol{z}K$ is converted to a vector of $p$-values, $\boldsymbol{p}=(p_1,\dots,p_K)^\top\in[0,1]^K$, via: $$ p_k = 1 - \Phi_1(z_k,0,1),\ \ k\in{1,\dots,K}.$$ Here, $\Phi_n{(a_1,\dots,a_n)^\top,\boldsymbol{\lambda},\Sigma}$ is the cumulative distribution function of an $n$-dimensional MVN distribution, with mean $\boldsymbol{\lambda}$ and covariance matrix $\Sigma$. Precisely: $$ \Phi_n{(a_1,\dots,a_n)^\top,\boldsymbol{\lambda},\Sigma} = \int{-\infty}^{a_1}\dots\int_{-\infty}^{a_n}\phi_n{\boldsymbol{x},\boldsymbol{\lambda},\Sigma}\mathrm{d}x_n\dots\mathrm{d}x_1, $$ where $\phi_n{\boldsymbol{x},\boldsymbol{\lambda},\Sigma}$ is the probability density function of an $n$-dimensional MVN distribution with mean $\boldsymbol{\lambda}$ and covariance matrix $\Sigma$, evaluated at vector $\boldsymbol{x}=(x_1,\dots,x_n)^\top$.

Finally, to determine whether to reject each null hypothesis, the $p_k$ are compared to a set of significance thresholds, which are specified based on a chosen MCC (see below), in combination with a nominated significance level $\alpha\in(0,1)$.

With the above, the principal goal of multiarm is to provide a set of functions that:

(a) help choose the significance thresholds against which the $p_k$ are compared; (b) help make choices on appropriate values for the $n_k$, $k\in{0,\dots,K}$, to achieve some desired level of statistical power (see Section 2.3); (c) help evaluate the statistical characteristics of a chosen design.

Before we describe the currently supported MCCs however, we will first describe the operating characteristics that can be evaluated by multiarm.

2.2. Operating characteristics

multiarm supports the evaluation of a wide selection of statistical operating characteristics that may be of interest when choosing a multi-arm trial design. To understand how these are mathematically calculated, it is useful to summarise the results of the trial, in terms of the outcome for each of the $K$ hypotheses, using a small table:

| | Reject | Do not reject | | -----------|---------:| ---------------:| | True null | $A$ | $B$ | | False null | $C$ | $D$ |

With this, multiarm can compute the following quantities for any nominated multi-arm design scenario and set of treatment effects (that is, the quantities are conditional on specification of various design factors; we omit this conditioning for brevity):

2.3. Multiple comparison corrections

2.3.1 Per-hypothesis error-rate control

The most simple possible solution for selecting the significance thresholds against which to compare the $p_k$, is to compare each to the chosen significance level $\alpha$. That is, to reject $H_k$ for $k\in{1,\dots,K}$ if $p_k \le \alpha$. This controls the PHER to $\alpha$, and corresponds to not utilising a so-called MCC. In multiarm, this approach can be chosen by setting correction = "none".

A potential problem with this, however, can be that the statistical operating characteristics of the resulting design may not be desirable (e.g., in terms of $FWER_{I1}$). For this reason, in general one may wish to make use of a MCC. Currently, multiarm supports the use of a variety of such MCCs, which aim to control either:

(a) the conventional FWER, $FWER_{I1}$ (with these techniques sub-divided into single-step, step-down, and step-up procedures) or; (b) the FDR.

2.3.2. Single-step family-wise error-rate control

These procedures test each of the $H_k$ against a common significance level, $\gamma\in(0,1)$ say, rejecting $H_k$ if $p_k \le \gamma$. The currently supported single-step procedures are:

Note that each of the above specify a $\gamma$ such that the maximum probability of incorrectly rejecting at least one of the null hypotheses $H_k$, $k\in{1,\dots,K}$, over all possible values of $\boldsymbol{\tau}\in\mathbb{R}^K$ is at most $\alpha$. This is referred to as strong control of $FWER_{I1}$.

2.3.3. Step-down family-wise error-rate control

These procedures work by first ranking the $p$-values from smallest to largest. We will refer to these ranked $p$-values by $p_{(1)},\dots,p_{(K)}$, with associated hypotheses $H_{(1)},\dots,H_{(K)}$. The $p_{(k)}$ are compared to a vector of significance levels $\boldsymbol{\gamma}=(\gamma_1,\dots,\gamma_K)\in(0,1)^K$. Precisely, the maximal index $k$ such that $p_{(k)}>\gamma_k$ is identified, and then $H_{(1)},\dots,H_{(k-1)}$ are rejected and $H_{(k)},\dots,H_{(K)}$ are not rejected. If $k=1$ then we do not reject any of the null hypotheses, and if no such $k$ exists then we reject all of the null hypotheses. The currently supported step-down procedures are:

Note that each of the above methods provide strong control of $FWER_{I1}$.

2.3.4. Step-up family-wise error-rate control

These procedures also work by first ranking the $p$-values from smallest to largest, and utilise a vector of significance levels $\boldsymbol{\gamma}$. However, here, the largest $k$ such that $p_{(k)} \le \gamma_k$ is identified. Then, we reject the hypotheses $H_{(1)},\dots,H_{(k)}$, and do not reject $H_{(k+1)},\dots,H_{(K)}$. Currently, one such method is supported:

This method also provides strong control of $FWER_{I1}$.

2.3.5. False discovery rate control

Whilst (most of) the above methods control $FWER_{I1}$ in the strong sense, it may be of interest instead to control the FDR. Currently, two methods that will control the FDR to at most $\alpha$ over all possible $\boldsymbol{\tau}\in\mathbb{R}^K$ are supported. They function in the same way as the step-up FWER control MCCs discussed above, but with:

2.4. Sample size determination

Powering a multi-arm trial is complex. Consequently, multiarm supports design determination for a variety of types of power. It requires values for interesting and uninteresting treatment effects to be specified, $\delta_1\in\mathbb{R}^+$ and $\delta_0\in(-\infty,\delta_1)$ respectively. With this, the following definitions are used:

Then, the sample size required by a design to control the following types of power to a specified level $1-\beta$ can be determined (chosen using the arguments beta and power):

2.5. Allocation ratios

One of the primary goals of multiarm is to aid the choice of $\boldsymbol{n}=(n_0,\dots,n_K)$. Currently, multiarm supports the determination of $\boldsymbol{n}$ by identifying a suitable $n_0$, and then setting $n_k=r_kn_0$ for $k\in{1,\dots,K}$, with $\boldsymbol{r}=(r_1,\dots,r_K)\in(0,\infty)^K$.

In turn, it allows $\boldsymbol{r}$ to be specified (via the argument ratio) in a variety of ways. It can be specified explicitly as a numeric vector, or alternatively can be determined in an optimal manner. Currently, multiarm supports the determination of $\boldsymbol{r}$ for the following optimality criteria:

The optimal allocation ratios are identified in the app using available closed-form solutions were possible, otherwise non-linear programming is employed.

2.6. Supported outcome variables

Currently, multiarm supports multi-arm trial design for scenarios in which the outcome variables are assumed to be either normally, Bernoulli, or Poisson distributed.

2.6.1. Normally distributed outcome variables

For the normal case, it assumes that $X_{ik}\sim N(\mu_k,\sigma_k^2)$, and that $\sigma_k^2$ is known for $k\in{0,\dots,K}$. Then, for each $k\in{1,\dots,K}$:

\begin{align} \tau_k &= \mu_k-\mu_0,\ \hat{\tau}k &= \frac{1}{n_k}\sum{l=1}^{n_k}x_{ik} - \frac{1}{n_0}\sum_{l=1}^{n_0}x_{i0},\ I_{k} &= \frac{1}{\frac{\sigma_0^2}{n_0} + \frac{\sigma_k^2}{n_k}}. \end{align}

Furthermore, $\boldsymbol{Z}_K$ has a multivariate normal distribution, and thus the operating characteristics can be computed exactly using multivariate normal integration. Additionally, the distribution of $\boldsymbol{Z}_K$ does not depend upon the values of the $\mu_k$, $k\in{0,\dots,K}$. Consequently, these parameters play no part in the inputs or outputs of functions in multiarm. In contrast, where relevant, the functions within multiarm request values for the $\sigma_k$ via the argument sigma.

Functions with names ending in _ss_norm support single-stage designs in which the outcome variables are assumed to be normally distributed.

2.6.2. Bernoulli distributed outcome variables

In this case, $X_{ik}\sim Bern(\pi_k)$ for response rates $\pi_k$, and for each $k\in{1,\dots,K}$:

\begin{align} \tau_k &= \pi_k-\pi_0,\ \hat{\tau}k &= \frac{1}{n_k}\sum{l=1}^{n_k}x_{ik} - \frac{1}{n_0}\sum_{l=1}^{n_0}x_{i0},\ I_{k} &= \frac{1}{\frac{\pi_0(1 - \pi_0)}{n_0} + \frac{\pi_k(1 - \pi_k)}{n_k}}. \end{align}

Thus, a problem for design determination becomes that the information levels are dependent on the unknown response rates. In practice, this is handled by setting:

$$ I_k = \frac{1}{\frac{\hat{\pi}0(1 - \hat{\pi}_0)}{n_0} + \frac{\hat{\pi}_k(1 - \hat{\pi}_k)}{n_k}},$$ for $\hat{\pi}_k = \sum{l=1}^{n_k}x_{ik}/n_k$, $k\in{0,\dots,K}$. This is the assumption made where relevant in multiarm. With this, $\boldsymbol{Z}_K$ is only asymptotically multivariate normal. Thus, in general it is important to validate operating characteristics evaluated using multivariate normal integration via simulation.

In addition, note that the above problem also means that the operating characterstics under $HG$, $HA$, and the $LFC_k$ are not unique without further restriction. Thus, to achieve uniqueness, multiarm requires a value be specified for $\pi_0$ (pi0) for use in the definition of these scenarios. And for this reason, the inputs and outputs of functions supporting Bernoulli outcomes make no reference to the $\tau_k$, and work instead directly in terms of the $\pi_k$. Finally, it also means that to determine $A$-, $D$-, or $E$-optimised allocation ratios, a specific set of values for for the $\pi_k$ must be assumed (see ratio_scenario in des_ss_bern()).

In this case, we should also ensure that $\delta_1\in(0,1)$ and $\delta_0\in(-\pi_0,\delta_1)$, for the assumed value of $\pi_0$, since $\pi_k\in[0,1]$ for $k\in{1,\dots,K}$.

Functions with names ending in _ss_bern support single-stage designs in which the outcome variables are assumed to be Bernoulli distributed.

2.6.3. Poisson distributed outcome variables

In this case, $X_{ik}\sim Po(\lambda_k)$ for event rates $\lambda_k$, and for each $k\in{1,\dots,K}$:

\begin{align} \tau_k &= \lambda_k-\lambda_0,\ \hat{\tau}k &= \frac{1}{n_k}\sum{l=1}^{n_k}x_{ik} - \frac{1}{n_0}\sum_{l=1}^{n_0}x_{i0},\ I_{k} &= \frac{1}{\frac{\lambda_0}{n_0} + \frac{\lambda_k}{n_k}}. \end{align}

Thus, like the Bernoulli situation above, a problem for design determination becomes that the information levels are dependent on the unknown event rates. Similarly, this is handled in multiarm by setting:

$$ I_k = \frac{1}{\frac{\hat{\lambda}0}{n_0} + \frac{\hat{\lambda}_k}{n_k}},$$ for $\hat{\lambda}_k = \sum{l=1}^{n_k}x_{ik}/n_k$, $k\in{0,\dots,K}$. Again, $\boldsymbol{Z}_K$ is then only asymptotically multivariate normal, and so once more it is important to validate operating characteristics evaluated using multivariate normal integration via simulation.

Furhermore, this problem again means that the operating characterstics under $HG$, $HA$, and the $LFC_k$ are not unique without further restriction. To achieve uniqueness, multiarm requires a value be specified for $\lambda_0$ (lambda0) for use in the definition of these scenarios. For this reason, the inputs and outputs of functions supporting Poisson outcomes make no reference to the $\tau_k$, and work instead directly in terms of the $\lambda_k$. Finally, it also means that to determine $A$-, $D$-, or $E$-optimised allocation ratios, a specific set of values for for the $\lambda_k$ must be assumed (see ratio_scenario in des_ss_pois()).

In this case, we should also ensure that $\delta_1\in(0,Inf)$ and $\delta_0\in[-\lambda_0,\delta_1)$, for the assumed value of $\lambda_0$, since $\lambda_k\in[0,Inf)$ for $k\in{1,\dots,K}$.

Functions with names ending in _ss_pois support single-stage designs in which the outcome variables are assumed to be Poisson distributed.

3. Functions

3.1. des_ss_bern(), des_ss_norm(), and des_ss_pois()

des_ss_bern(), des_ss_norm(), and des_ss_pois() are amongst the primary functions within multiarm. They are used to determine single-stage multi-arm clinical trial designs attaining a desired level of a specified type of power, for Bernoulli, normally, and Poisson distributed outcomes respectively.

3.3. build_ss_bern(), build_ss_norm(), and build_ss_pois()

des_ss_bern(), des_ss_norm(), and des_ss_pois() search for designs based upon the chosen input parameters. However, in certain circumstances there may be a specific design that is of interest. build_ss_bern(), build_ss_norm(), and build_ss_pois() can be used to construct such designs, specifically for subsequent use with the other functions available in multiarm.

3.4. plot.multiarm_des_ss_bern(), plot.multiarm_des_ss_norm(), and plot.multiarm_des_ss_pois()

multiarm can produce a variety of plots for the outputs from build_ss_bern(), build_ss_norm(), build_ss_pois(), des_ss_bern(), des_ss_norm(), and des_ss_pois() via the S3 functions plot.multiarm_des_ss_bern(), plot.multiarm_des_ss_norm(), and plot.multiarm_des_ss_pois().

3.5. opchar_ss_bern(), opchar_ss_norm(), and opchar_ss_pois()

Whilst build_ss_bern(), build_ss_norm(), build_ss_pois(), des_ss_bern(), des_ss_norm(), and des_ss_pois() return a small summary of the statistical operating characteristics of a design, it may occasionally be of interest to evaluate the operating characteristics of a design in a wider range of settings. opchar_ss_bern(), opchar_ss_norm(), and opchar_ss_pois() are designed for this purpose.

3.6. sim_ss_bern(), sim_ss_norm() and sim_ss_pois()

Each of the functions above rely upon multivariate normal integration for operating characteristic evaluation. In certain circumstances, most likely when either (a) $K$ is large or (b) analytical evaluations need to be validated, recourse to simulation is useful. sim_ss_bern(), sim_ss_norm(), and sim_ss_pois() are designed to be helpful in these settings.

4. Examples

4.1. Example 1

We demonstrate the basic functionality provided in multiarm. First, identify a design using des_ss_norm():

library(multiarm)
des <- des_ss_norm(integer = TRUE)

Note that the argument integer has been used because we will make a call to sim_ss_norm() later on.

We can then plot power curves with plot.multiarm_des_ss_norm() as follows:

plot(des)

des_ss_norm() contains a summary of the operating characteristics of the identified design:

des$opchar

We can validate these using sim_ss_norm() with:

sim_ss_norm(des)$sim

4.2. Example 2

Now, lets make use of some of the additional functionality from multiarm.

First, let's find a design for a scenario with $K=3$, using the Holm-Bonferroni correction, desiring to control the disjunctive power, choosing the allocation ratios for $D$-optimality, in a scenario where the standard deviations differ across the arms:

des <- des_ss_norm(K          = 3,
                   sigma      = c(0.5, 1, 1.5, 2),
                   ratio      = "D",
                   correction = "holm_bonferroni",
                   power      = "disjunctive",
                   integer    = TRUE)

Thus the total required sample size by the design is:

des$N

And, the allocation ratios are:

des$ratio

Next, note that the operating characeristics of the truly optimal design are:

des$opchar

We can validate these with:

sim_ss_norm(des)$sim

Or recreate them with opchar_ss_norm() via:

opchar_ss_norm(des)$opchar

And, as earlier, we can produce power-curve plots with:

plot(des)

References

Grayling MJ, Wason JMS (2020) A web application for the design of multi-arm clinical trials. BMC Cancer 20:80. DOI: 10.1186/s12885-020-6525-0. PMID: 32005187.



mjg211/multiarm documentation built on Jan. 19, 2024, 8:21 a.m.