Single Binary Endpoint

knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  fig.width  = 7,
  fig.height = 4.5,
  fig.path = "figures/single-bin-"
)
library(BayesianQDM)

Motivating Scenario

We use a hypothetical Phase IIa proof-of-concept (PoC) trial in moderate-to-severe rheumatoid arthritis (RA). The investigational drug is compared with placebo in a 1:1 randomised controlled trial with $n_t = n_c = 12$ patients per group.

Endpoint: ACR20 response rate (proportion of patients achieving $\ge 20\%$ improvement in ACR criteria).

Clinically meaningful thresholds (posterior probability): target value $\theta_{\mathrm{TV}} = 0.20$; minimum acceptable value $\theta_{\mathrm{MAV}} = 0.05$.

Null hypothesis threshold (predictive probability): $\theta_{\mathrm{NULL}} = 0.10$.

Decision thresholds: $\gamma_{\mathrm{go}} = 0.80$ (Go), $\gamma_{\mathrm{nogo}} = 0.20$ (NoGo).

Observed data (used in Sections 2–4): $y_t = 8$ responders out of $n_t = 12$ (treatment); $y_c = 3$ responders out of $n_c = 12$ (control).


1. Bayesian Model: Beta-Binomial Conjugate

1.1 Prior Distribution

Let $\pi_j$ denote the true response rate in group $j$ ($j = t$: treatment, $j = c$: control). We model the number of responders as

$$y_j \mid \pi_j \;\sim\; \mathrm{Binomial}(n_j,\, \pi_j),$$

and place independent Beta priors on each rate:

$$\pi_j \;\sim\; \mathrm{Beta}(a_j,\, b_j),$$

where $a_j > 0$ and $b_j > 0$ are the shape parameters.

The Jeffreys prior corresponds to $a_j = b_j = 0.5$, giving a weakly informative, symmetric prior on $(0, 1)$. The uniform prior corresponds to $a_j = b_j = 1$.

1.2 Posterior Distribution

By conjugacy of the Beta-Binomial model, the posterior after observing $y_j$ responders among $n_j$ patients is

$$\pi_j \mid y_j \;\sim\; \mathrm{Beta}(a_j^*,\, b_j^*),$$

where the posterior shape parameters are

$$a_j^* = a_j + y_j, \qquad b_j^* = b_j + n_j - y_j.$$

The posterior mean and variance are

$$E[\pi_j \mid y_j] = \frac{a_j^*}{a_j^* + b_j^*}, \qquad \mathrm{Var}(\pi_j \mid y_j) = \frac{a_j^*\, b_j^*}{(a_j^* + b_j^*)^2\,(a_j^* + b_j^* + 1)}.$$
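These update formulae can be checked directly in base R. The helper `post_summary()` below is illustrative (not a package function); it applies the Jeffreys prior to the observed PoC data:

```r
# Posterior mean and variance from the conjugate update, under the
# Jeffreys prior (a = b = 0.5) and the observed data (y_t = 8/12, y_c = 3/12)
post_summary <- function(y, n, a = 0.5, b = 0.5) {
  a_s <- a + y; b_s <- b + n - y
  c(mean = a_s / (a_s + b_s),
    var  = a_s * b_s / ((a_s + b_s)^2 * (a_s + b_s + 1)))
}
post_summary(8, 12)  # treatment: mean = 8.5/13 ~ 0.654
post_summary(3, 12)  # control:   mean = 3.5/13 ~ 0.269
```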

1.3 Posterior of the Treatment Effect

The treatment effect is $\theta = \pi_t - \pi_c$. Since the two groups are independent, $\theta \mid y_t, y_c$ follows the distribution of the difference of two independent Beta random variables:

$$\pi_t \mid y_t \;\sim\; \mathrm{Beta}(a_t^*,\, b_t^*), \qquad \pi_c \mid y_c \;\sim\; \mathrm{Beta}(a_c^*,\, b_c^*).$$

There is no closed-form CDF for $\theta = \pi_t - \pi_c$, so the posterior probability

$$P(\theta > \theta_0 \mid y_t, y_c) = P(\pi_t - \pi_c > \theta_0 \mid y_t, y_c)$$

is evaluated by the convolution integral implemented in pbetadiff():

$$P(\pi_t - \pi_c > \theta_0) = \int_0^1 F_{\mathrm{Beta}(a_c^*, b_c^*)}\!\left(x - \theta_0\right)\, f_{\mathrm{Beta}(a_t^*, b_t^*)}\!(x)\; dx,$$

where $F_{\mathrm{Beta}(\alpha,\beta)}$ is the Beta CDF and $f_{\mathrm{Beta}(\alpha,\beta)}$ is the Beta PDF. This one-dimensional integral is evaluated by adaptive Gauss-Kronrod quadrature via stats::integrate().
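The integral can be written out directly with stats::integrate() as a sketch of what pbetadiff() computes, using the Jeffreys posteriors from the observed data ($y_t = 8$, $y_c = 3$, $n = 12$ per arm) and $\theta_0 = 0.20$:

```r
# Convolution integral for P(pi_t - pi_c > theta0 | data), evaluated
# with adaptive quadrature; mirrors pbetadiff() under Jeffreys priors
a_t <- 0.5 + 8;  b_t <- 0.5 + 12 - 8   # treatment posterior: Beta(8.5, 4.5)
a_c <- 0.5 + 3;  b_c <- 0.5 + 12 - 3   # control posterior:   Beta(3.5, 9.5)
theta0 <- 0.20
integrand <- function(x) pbeta(x - theta0, a_c, b_c) * dbeta(x, a_t, b_t)
integrate(integrand, lower = 0, upper = 1)$value  # P(pi_t - pi_c > 0.20 | data)
```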


2. Posterior Predictive Probability

2.1 Beta-Binomial Predictive Distribution

Let $\tilde{y}_j \mid \pi_j \sim \mathrm{Binomial}(m_j, \pi_j)$ be future trial counts with $m_j$ future patients. Integrating out $\pi_j$ over its Beta posterior gives the Beta-Binomial predictive distribution:

$$P(\tilde{y}_j = k \mid y_j) = \binom{m_j}{k} \frac{B(a_j^* + k,\; b_j^* + m_j - k)}{B(a_j^*,\; b_j^*)}, \quad k = 0, 1, \ldots, m_j,$$

where $B(\cdot, \cdot)$ is the Beta function.
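This pmf can be evaluated term by term with base R's beta(); the snippet below is a sketch using the treatment posterior Beta(8.5, 4.5) from the observed data and $m = 40$ future patients:

```r
# Beta-Binomial predictive pmf, written out with the Beta function
a_post <- 8.5; b_post <- 4.5; m <- 40
k   <- 0:m
pmf <- choose(m, k) * beta(a_post + k, b_post + m - k) / beta(a_post, b_post)
sum(pmf)          # 1: a proper probability distribution
sum(k * pmf) / m  # predictive mean rate, equal to a_post / (a_post + b_post)
```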

2.2 Posterior Predictive Probability of Future Success

The posterior predictive probability that the future treatment difference exceeds $\theta_{\mathrm{NULL}}$ is

$$P\!\left(\frac{\tilde{y}_t}{m_t} - \frac{\tilde{y}_c}{m_c} > \theta_{\mathrm{NULL}} \;\Big|\; y_t, y_c\right) = \sum_{k_t=0}^{m_t} \sum_{k_c=0}^{m_c} \mathbf{1}\!\left[\frac{k_t}{m_t} - \frac{k_c}{m_c} > \theta_{\mathrm{NULL}}\right]\, P(\tilde{y}_t = k_t \mid y_t)\, P(\tilde{y}_c = k_c \mid y_c).$$

This double sum over all $(m_t + 1)(m_c + 1)$ outcome combinations is evaluated exactly by pbetabinomdiff().
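A base-R sketch of this double sum (a stand-in for pbetabinomdiff(), with the hypothetical helper `dbb()` playing the role of the Beta-Binomial pmf), using the posteriors from the observed data and $m_t = m_c = 40$:

```r
# Exact enumeration of the (m_t + 1)(m_c + 1) outcome combinations
dbb <- function(k, m, a, b) choose(m, k) * beta(a + k, b + m - k) / beta(a, b)
m_t <- 40; m_c <- 40; theta_null <- 0.10
p_t <- dbb(0:m_t, m_t, 8.5, 4.5)   # treatment predictive pmf
p_c <- dbb(0:m_c, m_c, 3.5, 9.5)   # control predictive pmf
# Indicator of k_t/m_t - k_c/m_c > theta_NULL, weighted by the joint pmf
ind    <- outer(0:m_t / m_t, 0:m_c / m_c, function(rt, rc) rt - rc > theta_null)
p_pred <- sum(ind * outer(p_t, p_c))
```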


3. Study Designs

3.1 Controlled Design

Standard parallel-group RCT with observed data from both groups. The posterior shape parameters for group $j$ are

$$a_j^* = a_j + y_j, \qquad b_j^* = b_j + n_j - y_j,$$

and the posterior probability is $P(\pi_t - \pi_c > \theta_0 \mid y_t, y_c)$ computed via pbetadiff().

Posterior probability at TV and MAV:

# P(pi_t - pi_c > TV  | data)
p_tv <- pbayespostpred1bin(
  prob = 'posterior', design = 'controlled', theta0 = 0.20,
  n_t = 12, n_c = 12, y_t = 8, y_c = 3,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  m_t = NULL, m_c = NULL, z = NULL,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL,
  lower.tail = FALSE
)

# P(pi_t - pi_c > MAV | data)
p_mav <- pbayespostpred1bin(
  prob = 'posterior', design = 'controlled', theta0 = 0.05,
  n_t = 12, n_c = 12, y_t = 8, y_c = 3,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  m_t = NULL, m_c = NULL, z = NULL,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL,
  lower.tail = FALSE
)

cat(sprintf("P(theta > TV  | data) = %.4f  -> Go  criterion (>= %.2f): %s\n",
            p_tv,  0.80, ifelse(p_tv  >= 0.80, "YES", "NO")))
cat(sprintf("P(theta <= MAV | data) = %.4f  -> NoGo criterion (>= %.2f): %s\n",
            1 - p_mav, 0.20, ifelse((1 - p_mav) >= 0.20, "YES", "NO")))
cat(sprintf("Decision: %s\n",
            ifelse(p_tv >= 0.80 & (1 - p_mav) < 0.20, "Go",
                   ifelse(p_tv < 0.80 & (1 - p_mav) >= 0.20, "NoGo",
                          ifelse(p_tv >= 0.80 & (1 - p_mav) >= 0.20, "Miss", "Gray")))))

Posterior predictive probability (future Phase III: $m_t = m_c = 40$):

p_pred <- pbayespostpred1bin(
  prob = 'predictive', design = 'controlled', theta0 = 0.10,
  n_t = 12, n_c = 12, y_t = 8, y_c = 3,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  m_t = 40, m_c = 40, z = NULL,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL,
  lower.tail = FALSE
)
cat(sprintf("Predictive probability (m_t = m_c = 40) = %.4f\n", p_pred))

Vectorised computation over all possible outcomes:

The function accepts vectors for y_t and y_c, enabling efficient computation of the posterior probability for every possible outcome pair $(y_t, y_c) \in \{0,\ldots,n_t\} \times \{0,\ldots,n_c\}$.

grid <- expand.grid(y_t = 0:12, y_c = 0:12)
p_all <- pbayespostpred1bin(
  prob = 'posterior', design = 'controlled', theta0 = 0.20,
  n_t = 12, n_c = 12, y_t = grid$y_t, y_c = grid$y_c,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  m_t = NULL, m_c = NULL, z = NULL,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL,
  lower.tail = FALSE
)
# Show results for a selection of outcome pairs
sel <- data.frame(y_t = grid$y_t, y_c = grid$y_c, P_gt_TV = round(p_all, 4))
head(sel[order(-sel$P_gt_TV), ], 10)

3.2 Uncontrolled Design (Single-Arm)

When no concurrent control group is enrolled, a hypothetical control responder count $z$ is specified from historical knowledge. The control posterior is then

$$\pi_c \mid z \;\sim\; \mathrm{Beta}(a_c + z,\; b_c + n_c - z),$$

where $n_c$ is the hypothetical control sample size and $z$ is the assumed number of hypothetical responders ($0 \le z \le n_c$). Only the treatment group data $y_t$ are observed; $y_c$ is set to NULL.

# Hypothetical control: z = 2 responders out of n_c = 12
p_unctrl <- pbayespostpred1bin(
  prob = 'posterior', design = 'uncontrolled', theta0 = 0.20,
  n_t = 12, n_c = 12, y_t = 8, y_c = NULL,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  m_t = NULL, m_c = NULL, z = 2L,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL,
  lower.tail = FALSE
)
cat(sprintf("P(theta > TV | data, uncontrolled, z=2) = %.4f\n", p_unctrl))

Effect of the hypothetical control count: varying $z$ shows the sensitivity to the assumed control rate.

z_seq <- 0:12
p_z   <- sapply(z_seq, function(z) {
  pbayespostpred1bin(
    prob = 'posterior', design = 'uncontrolled', theta0 = 0.20,
    n_t = 12, n_c = 12, y_t = 8, y_c = NULL,
    a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
    m_t = NULL, m_c = NULL, z = z,
    ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
    alpha0e_t = NULL, alpha0e_c = NULL,
    lower.tail = FALSE
  )
})
data.frame(z = z_seq, P_gt_TV = round(p_z, 4))

3.3 External Design (Power Prior)

Historical or external data are incorporated via a power prior with borrowing weight $\alpha_{0ej} \in (0, 1]$. The augmented prior shape parameters are

$$a_j^* = a_j + \alpha_{0ej}\, y_{ej}, \qquad b_j^* = b_j + \alpha_{0ej}\,(n_{ej} - y_{ej}),$$

where $n_{ej}$ and $y_{ej}$ are the external sample size and responder count for group $j$ (corresponding to ne_t/ne_c, ye_t/ye_c, and alpha0e_t/alpha0e_c). These serve as the prior for the current PoC data, yielding a closed-form posterior:

$$\pi_j \mid y_j \;\sim\; \mathrm{Beta}\!\left(a_j^* + y_j,\; b_j^* + n_j - y_j\right).$$

Setting $\alpha_{0ej} = 1$ corresponds to full borrowing (the external data are treated as if they came from the current trial); $\alpha_{0ej} \to 0$ recovers the result from the current data alone with the original prior.
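The update arithmetic is easy to trace by hand. A sketch for the treatment arm, using the external data from the example below ($n_{et} = 15$, $y_{et} = 5$, $\alpha_{0et} = 0.5$) and the Jeffreys base prior:

```r
# Power-prior augmentation, then the conjugate update with current data
a0 <- 0.5; b0 <- 0.5                 # Jeffreys base prior
ne <- 15; ye <- 5; alpha0e <- 0.5    # external data and borrowing weight
a_star <- a0 + alpha0e * ye          # 0.5 + 2.5 = 3.0
b_star <- b0 + alpha0e * (ne - ye)   # 0.5 + 5.0 = 5.5
# Posterior after the current PoC data (y_t = 8 of n_t = 12):
a_post <- a_star + 8
b_post <- b_star + 12 - 8
c(a_post, b_post)                    # 11.0, 9.5
```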

Here, external data for both groups (external design): $n_{et} = 15$, $y_{et} = 5$, $n_{ec} = 15$, $y_{ec} = 4$, borrowing weights $\alpha_{0et} = \alpha_{0ec} = 0.5$:

p_ext <- pbayespostpred1bin(
  prob = 'posterior', design = 'external', theta0 = 0.20,
  n_t = 12, n_c = 12, y_t = 8, y_c = 3,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  m_t = NULL, m_c = NULL, z = NULL,
  ne_t = 15L, ne_c = 15L, ye_t = 5L, ye_c = 4L,
  alpha0e_t = 0.5, alpha0e_c = 0.5,
  lower.tail = FALSE
)
cat(sprintf("P(theta > TV | data, external, alpha0e=0.5) = %.4f\n", p_ext))

Effect of the borrowing weight: varying $\alpha_{0ec}$ (control group only) with fixed $\alpha_{0et} = 0.5$:

# alpha0e must be in (0, 1]; avoid alpha0e = 0
ae_seq <- c(0.01, seq(0.1, 1.0, by = 0.1))
p_ae   <- sapply(ae_seq, function(ae) {
  pbayespostpred1bin(
    prob = 'posterior', design = 'external', theta0 = 0.20,
    n_t = 12, n_c = 12, y_t = 8, y_c = 3,
    a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
    m_t = NULL, m_c = NULL, z = NULL,
    ne_t = 15L, ne_c = 15L, ye_t = 5L, ye_c = 4L,
    alpha0e_t = 0.5, alpha0e_c = ae,
    lower.tail = FALSE
  )
})
data.frame(alpha0e_c = ae_seq, P_gt_TV = round(p_ae, 4))

4. Operating Characteristics

4.1 Definition

For given true response rates $(\pi_t, \pi_c)$, the operating characteristics are computed by exact enumeration over all $(n_t + 1)(n_c + 1)$ possible outcome pairs $(y_t, y_c) \in \{0,\ldots,n_t\} \times \{0,\ldots,n_c\}$:

$$\Pr(\mathrm{Go}) = \sum_{y_t=0}^{n_t} \sum_{y_c=0}^{n_c} \mathbf{1}\!\left[g_{\mathrm{Go}}(y_t, y_c) \ge \gamma_{\mathrm{go}} \;\text{ and }\; g_{\mathrm{NoGo}}(y_t, y_c) < \gamma_{\mathrm{nogo}}\right]\, \binom{n_t}{y_t} \pi_t^{y_t}(1-\pi_t)^{n_t-y_t}\, \binom{n_c}{y_c} \pi_c^{y_c}(1-\pi_c)^{n_c-y_c},$$

where

$$g_{\mathrm{Go}}(y_t, y_c) = P(\pi_t - \pi_c > \theta_{\mathrm{TV}} \mid y_t, y_c),$$ $$g_{\mathrm{NoGo}}(y_t, y_c) = P(\pi_t - \pi_c \le \theta_{\mathrm{MAV}} \mid y_t, y_c),$$

and the decision outcomes are classified as:

Go: only the Go criterion ($g_{\mathrm{Go}} \ge \gamma_{\mathrm{go}}$) is met.

NoGo: only the NoGo criterion ($g_{\mathrm{NoGo}} \ge \gamma_{\mathrm{nogo}}$) is met.

Miss: both criteria are met simultaneously.

Gray: neither criterion is met.
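The enumeration can be sketched in base R, with the convolution integral standing in for pbetadiff() and illustrative true rates $\pi_t = 0.40$, $\pi_c = 0.10$ (this is a hand-rolled illustration, not a call into the package):

```r
# Exact OC enumeration at one true-rate scenario, Jeffreys priors,
# TV = 0.20 and gamma_go = 0.80 as in the running example
n_t <- 12; n_c <- 12
pi_t <- 0.40; pi_c <- 0.10
# Binomial weights w(y_t, y_c) over the 13 x 13 outcome grid
w <- outer(dbinom(0:n_t, n_t, pi_t), dbinom(0:n_c, n_c, pi_c))
# g_Go(y_t, y_c) = P(pi_t - pi_c > 0.20 | y_t, y_c) for each outcome pair
g_go <- outer(0:n_t, 0:n_c, Vectorize(function(yt, yc) {
  integrate(function(x) pbeta(x - 0.20, 0.5 + yc, 0.5 + n_c - yc) *
                        dbeta(x, 0.5 + yt, 0.5 + n_t - yt), 0, 1)$value
}))
pr_go <- sum(w[g_go >= 0.80])  # Pr(Go), ignoring the Miss overlap
```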

4.2 Example: Controlled Design, Posterior Probability

oc_ctrl <- pbayesdecisionprob1bin(
  prob      = 'posterior',
  design    = 'controlled',
  theta_TV  = 0.30,  theta_MAV = 0.15,  theta_NULL = NULL,
  gamma_go  = 0.80,  gamma_nogo = 0.20,
  pi_t      = seq(0.10, 0.80, by = 0.05),
  pi_c      = rep(0.10, length(seq(0.10, 0.80, by = 0.05))),
  n_t = 12,  n_c = 12,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  z = NULL, m_t = NULL, m_c = NULL,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL,
  error_if_Miss = TRUE, Gray_inc_Miss = FALSE
)
print(oc_ctrl)
plot(oc_ctrl, base_size = 20)

The same function supports design = 'uncontrolled' (with argument z), design = 'external' (with power prior arguments ne_t, ne_c, ye_t, ye_c, alpha0e_t, alpha0e_c), and prob = 'predictive' (with future sample size arguments m_t, m_c and theta_NULL). The function signature and output format are identical across all combinations.


5. Optimal Threshold Search

5.1 Objective and Algorithm

The getgamma1bin() function finds the optimal pair $(\gamma_{\mathrm{go}}^*, \gamma_{\mathrm{nogo}}^*)$ by grid search over $\Gamma = \{\gamma^{(1)}, \ldots, \gamma^{(K)}\} \subset (0, 1)$ at a null scenario $(\pi_t^{\mathrm{null}}, \pi_c^{\mathrm{null}})$.

A two-stage precompute-then-sweep strategy is used:

Stage 1 (Precompute): all posterior/predictive probabilities $g_{\mathrm{Go}}(y_t, y_c)$ and $g_{\mathrm{NoGo}}(y_t, y_c)$ are computed once over all $(n_t + 1)(n_c + 1)$ outcome pairs — independently of $\gamma$.

Stage 2 (Sweep): for each candidate $\gamma \in \Gamma$, $\Pr(\mathrm{Go} \mid \gamma)$ and $\Pr(\mathrm{NoGo} \mid \gamma)$ are obtained as weighted sums of indicator functions:

$$\Pr(\mathrm{Go} \mid \gamma) = \sum_{y_t, y_c} \mathbf{1}\!\left[g_{\mathrm{Go}}(y_t, y_c) \ge \gamma\right]\, w(y_t, y_c),$$

$$\Pr(\mathrm{NoGo} \mid \gamma) = \sum_{y_t, y_c} \mathbf{1}\!\left[g_{\mathrm{NoGo}}(y_t, y_c) \ge \gamma\right]\, w(y_t, y_c),$$

where $w(y_t, y_c) = \mathrm{Binom}(y_t; n_t, \pi_t)\,\mathrm{Binom}(y_c; n_c, \pi_c)$ are the binomial weights under the null scenario. No further calls to pbayespostpred1bin() are required at this stage.

The sweep is vectorised using outer(), making the full grid search computationally negligible once Stage 1 is complete.
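A minimal sketch of the vectorised sweep, with placeholder Stage-1 inputs (random `g_go` values and uniform weights stand in for the precomputed probabilities and binomial weights):

```r
# Stage-2 sweep: evaluate Pr(Go | gamma) for every candidate gamma at once
set.seed(42)
g_go <- runif(169)           # placeholder for the 13 x 13 Stage-1 results
w    <- rep(1 / 169, 169)    # placeholder outcome weights (sum to 1)
gamma_grid <- seq(0.01, 0.99, by = 0.01)
# outer() builds a 169 x 99 indicator matrix; weighting each row by w
# and summing columns yields one Pr(Go | gamma) per candidate gamma
pr_go <- colSums(outer(g_go, gamma_grid, `>=`) * w)
```

Because the indicator is non-increasing in $\gamma$, `pr_go` decreases monotonically along the grid, which is what makes the smallest-$\gamma$ threshold search in getgamma1bin() well defined.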

The optimal $\gamma_{\mathrm{go}}^*$ is the smallest value in $\Gamma$ such that $\Pr(\mathrm{Go} \mid \gamma_{\mathrm{go}}^*)$ satisfies the criterion against a target (e.g., $\Pr(\mathrm{Go}) < 0.05$). Analogously for $\gamma_{\mathrm{nogo}}^*$.

5.2 Example: Controlled Design, Posterior Probability

res_gamma <- getgamma1bin(
  prob      = 'posterior',
  design    = 'controlled',
  theta_TV  = 0.30, theta_MAV = 0.15, theta_NULL = NULL,
  pi_t_go   = 0.10, pi_c_go   = 0.10,
  pi_t_nogo = 0.30, pi_c_nogo = 0.10,
  target_go = 0.05, target_nogo = 0.20,
  n_t = 12L, n_c = 12L,
  a_t = 0.5, b_t = 0.5, a_c = 0.5, b_c = 0.5,
  z = NULL, m_t = NULL, m_c = NULL,
  ne_t = NULL, ne_c = NULL, ye_t = NULL, ye_c = NULL,
  alpha0e_t = NULL, alpha0e_c = NULL,
  gamma_grid = seq(0.01, 0.99, by = 0.01)
)
plot(res_gamma, base_size = 20)

The same function supports design = 'uncontrolled' (with pi_t_go, pi_t_nogo, and z), design = 'external' (with power prior arguments), and prob = 'predictive' (with theta_NULL, m_t, and m_c). The calibration plot and the returned gamma_go/gamma_nogo values are available for all combinations.


6. Summary

This vignette covered single binary endpoint analysis in BayesianQDM:

  1. Bayesian model: Beta-Binomial conjugate posterior with explicit update formulae for $a_j^* = a_j + y_j$ and $b_j^* = b_j + n_j - y_j$; Jeffreys prior ($a_j = b_j = 0.5$) as the recommended weakly informative default.
  2. Posterior probability: evaluated via the convolution integral in pbetadiff() — no closed-form CDF exists for the Beta difference.
  3. Posterior predictive probability: evaluated by exact enumeration of the Beta-Binomial double sum in pbetabinomdiff().
  4. Three study designs: controlled (both groups observed), uncontrolled (hypothetical control count $z$ fixed from prior knowledge), external design (power prior with borrowing weight $\alpha_{0ej}$, augmenting the prior by $\alpha_{0ej}$ times the external sufficient statistics).
  5. Operating characteristics: computed by exact enumeration over all $(n_t+1)(n_c+1)$ outcome pairs with binomial probability weights — no simulation required for the binary endpoint.
  6. Optimal threshold search: two-stage precompute-then-sweep strategy in getgamma1bin() using outer() for a fully vectorised gamma sweep.

See vignette("single-continuous") for the analogous continuous endpoint analysis.




BayesianQDM documentation built on April 22, 2026, 1:09 a.m.