# Set knitr chunk options for this vignette
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
# Packages used in this vignette
library(tibble)
library(dplyr)
library(knitr)
# Load the development version of the package
devtools::load_all()

Overview

The acronym NPES is short for non-proportional effect size. While the concept is motivated primarily by the design of time-to-event trials under non-proportional hazards (NPH), we have simplified and generalized it here. The model is likely to be useful for rank-based survival tests beyond the logrank test, as considered by @Tsiatis. It could also be useful in other situations where the treatment effect may vary over the course of a trial for some reason. We generalize the framework of Chapter 2 of @PLWBook to incorporate the possibility of the treatment effect changing during the course of a trial in some systematic way. This vignette addresses the distribution theory and initial technical issues around computation.

This is then applied to generalize computational algorithms provided in Chapter 19 of @JTBook that are used to compute boundary crossing probabilities as well as boundaries for group sequential designs. Additional specifics around boundary computation, power and sample size are provided in a separate vignette.

The probability model

The continuous model and E-process

We consider a simple example here to motivate distribution theory that is quite general and applies across many situations. For instance @PLWBook immediately suggest paired observations, time-to-event and binary outcomes as endpoints where the theory is applicable.

We assume for a given integer $N>0$ that $X_{i}$, $i=1,2,\ldots$, are independent. For some integer $K\le N$ we assume we will perform $K$ analyses after $0<n_1<n_2<\cdots<n_K = N$ observations are available for analysis. Note that the theory does not strictly require confining the analysis sample sizes by $N$; $N$ can simply be considered the final planned sample size. @PLWBook refer to the estimation process, or E-process, which we extend here to

$$\hat{\theta}_k = \frac{\sum_{i=1}^{n_k} X_{i}}{n_k}\equiv \bar X_{k}.$$ While @PLWBook have used $\delta$ instead of $\theta$ in our notation, we stick more closely to the notation of @JTBook where $\theta$ is used. For our example, we see $\hat{\theta}_k\equiv\bar X_k$ represents the sample average at analysis $k$, $1\le k\le K$. With a survival endpoint, $\hat\theta_k$ would typically represent a Cox model coefficient representing the logarithm of the hazard ratio for experimental vs control treatment and $n_k$ would represent the planned number of events at analysis $k$, $1\le k\le K.$ Denoting $t_k=n_k/N$, we assume that for some real-valued function $\theta(t)$ for $t\ge 0$ we have for $1\le k\le K$

$$E(\hat{\theta}_k) =\theta(t_k) =E(\bar X_k).$$ In the models of @PLWBook and @JTBook we would have $\theta(t)$ equal to some constant $\theta$. We assume further that for $i=1,2,\ldots$ $$\hbox{Var}(X_{i})=1.$$ Under this assumption, the variance of the sample average for $1\le k\le K$ is

$$\hbox{Var}(\hat\theta(t_k))=\hbox{Var}(\bar X_k) = 1/ n_k.$$ The statistical information for the estimate $\hat\theta(t_k)$ for $1\le k\le K$ for this case is $$ \mathcal{I}_k \equiv \frac{1}{\hbox{Var}(\hat\theta(t_k))} = n_k.$$ We now see that $t_k$, $1\le k\le K$ is the so-called information fraction at analysis $k$ in that $t_k=\mathcal{I}_k/\mathcal{I}_K.$
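To make these quantities concrete, here is a minimal simulation sketch of the E-process and information fraction for the running example; the sample sizes, effect size, and random seed are illustrative assumptions only.

# Simulate a single trial for the running example: N standardized
# observations analyzed at K = 3 cumulative sample sizes
set.seed(123)
N <- 300
n_k <- c(100, 200, 300) # cumulative sample sizes at analyses k = 1, 2, 3
x <- rnorm(N, mean = 0.2, sd = 1) # Var(X_i) = 1 as assumed above
theta_hat <- cumsum(x)[n_k] / n_k # E-process: sample averages
info <- n_k # statistical information I_k = n_k
t_k <- info / max(info) # information fraction t_k = I_k / I_K
tibble(analysis = 1:3, n = n_k, theta_hat = theta_hat, info = info, t = t_k)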

Z-process

The Z-process is commonly used (e.g., @JTBook) and will be used below to extend the computational algorithm in Chapter 19 of @JTBook. It is defined by the following equivalent expressions for $k=1,\ldots,K$:

$$Z_{k} = \frac{\hat\theta_k}{\sqrt{\hbox{Var}(\hat\theta_k)}}= \sqrt{\mathcal{I}_k}\hat\theta_k= \sqrt{n_k}\bar X_k.$$

The variance for $1\le k\le K$ is $$\hbox{Var}(Z_k) = 1$$ and the expected value is

$$E(Z_{k})= \sqrt{\mathcal{I}_k}\theta(t_{k})= \sqrt{n_k}E(\bar X_k).$$
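Continuing the simulation sketch above, the Z-process is computed directly from the sample averages:

# Z-process for the simulated example: Z_k = sqrt(n_k) * xbar_k,
# with E(Z_k) = sqrt(I_k) * theta(t_k) and Var(Z_k) = 1
z_k <- sqrt(n_k) * theta_hat
z_k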

B-process

B-values are so named as a mnemonic for Brownian motion. For $1\le k\le K$ we define $$B_{k}=\sqrt{t_k}Z_k,$$ which implies $$ E(B_{k}) = \sqrt{t_{k}\mathcal{I}_k}\theta(t_k) = t_k \sqrt{\mathcal{I}_K} \theta(t_k) = \mathcal{I}_k\theta(t_k)/\sqrt{\mathcal{I}_K}$$ and $$\hbox{Var}(B_k) = t_k.$$

For our example, we have

$$B_k=\frac{1}{\sqrt N}\sum_{i=1}^{n_k}X_i.$$ It can be useful to think of $B_k$ as a sum of independent random variables.
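In the simulation sketch, this scaled partial sum representation can be verified numerically:

# B-process: B_k = sqrt(t_k) * Z_k for the simulated example
b_k <- sqrt(t_k) * z_k
all.equal(b_k, cumsum(x)[n_k] / sqrt(N)) # B_k = sum of first n_k X_i / sqrt(N)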

Summary of E-, Z- and B-processes

# Tabulate the expectation and variance of the E-, Z- and B-processes
mt <- tibble::tribble(
  ~Statistic,    ~Example,   ~"Expected value", ~Variance,
  "$\\hat\\theta_k$", "$\\bar X_k$", "$\\theta(t_k)$",  "$\\mathcal{I}_k^{-1}$",
  "$Z_k=\\sqrt{\\mathcal{I}_k}\\hat\\theta_k$","$\\sqrt{n_k}\\bar X_k$", "$\\sqrt{\\mathcal{I}_k}\\theta(t_k)$",  "$1$",
  "$B_k=\\sqrt{t_k}Z_k$","$\\sum_{i=1}^{n_k}X_i/\\sqrt N$", "$t_k\\sqrt{\\mathcal{I}_K}\\theta(t_k)=\\mathcal{I}_k\\theta(t_k)/\\sqrt{\\mathcal{I}_K}$",  "$t_k$"
)
mt %>% kable(escape = FALSE)

Conditional independence, covariance and canonical form

We assume independent increments in the B-process, which will be demonstrated for the simple example above. That is, for $1\le j < k\le K$ $$\tag{1} B_k - B_j \sim \hbox{Normal} (\sqrt{\mathcal{I}_K}(t_k\theta(t_k)- t_j\theta(t_j)),\ t_k-t_j)$$ independent of $B_1,\ldots,B_j$. As noted above, for $1\le j\le K$ we have for our example $$B_j=\frac{1}{\sqrt N}\sum_{i=1}^{n_j}X_i.$$ Because the $X_i$, $i=1,2,\ldots$, are independent, we immediately have for $1\le j\le k\le K$ $$\hbox{Cov}(B_j,B_k) = \hbox{Var}(B_j) = t_j.$$ This leads further to $$\hbox{Corr}(B_j,B_k)=\frac{t_j}{\sqrt{t_jt_k}}=\sqrt{t_j/t_k}=\sqrt{\mathcal{I}_j/\mathcal{I}_k}=\hbox{Corr}(Z_j,Z_k)=\hbox{Cov}(Z_j,Z_k),$$ which is the covariance structure in the so-called canonical form of @JTBook. Since $$B_k-B_j=\frac{1}{\sqrt N}\sum_{i=n_j + 1}^{n_k}X_i,$$ the independence of $B_j$ and $B_k-B_j$ is obvious for this example.
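The canonical correlation structure is easy to check by simulation. A sketch reusing the illustrative $n_k$ and $t_k$ from above, simulating many trials under $\theta=0$:

# Simulate many trials under theta = 0 and compare the empirical
# correlation of (Z_1, Z_3) with the theoretical value sqrt(t_1 / t_3)
set.seed(456)
n_sim <- 10000
xmat <- matrix(rnorm(n_sim * N), nrow = n_sim) # one trial per row
z <- sapply(n_k, function(n) rowSums(xmat[, 1:n]) / sqrt(n)) # Z_k by trial
cor(z[, 1], z[, 3]) # empirical correlation
sqrt(t_k[1] / t_k[3]) # theoretical correlation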

Test bounds and crossing probabilities

In this section we define notation for bounds and boundary crossing probabilities for a group sequential design. We also define an algorithm for computing bounds based on a targeted boundary crossing probability at each analysis. The notation will be used elsewhere for defining one- and two-sided group sequential hypothesis testing. A value of $\theta(t)>0$ will reflect a positive benefit.

For $k=1,2,\ldots,K-1$, interim cutoffs $-\infty \le a_k< b_k\le \infty$ are set; final cutoffs $-\infty \le a_K\leq b_K <\infty$ are also set. An infinite efficacy bound at an analysis means that bound cannot be crossed at that analysis. Thus, $3K$ parameters define a group sequential design: $a_k$, $b_k$, and $\mathcal{I}_k$, $k=1,2,\ldots,K$.
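As an illustration, the $3K$ parameters for a hypothetical $K=3$ design can be laid out as a table; the bounds and information values below are arbitrary choices for illustration, not a recommended design.

# Hypothetical bounds and information for a 3-analysis design
tibble(
  analysis = 1:3,
  a = c(-0.5, 1, 2), # lower (futility) bounds a_k
  b = c(3.7, 2.5, 2), # upper (efficacy) bounds b_k
  info = c(100, 200, 300) # statistical information I_k
)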

Notation for boundary crossing probabilities

We now apply the above distributional assumptions to compute boundary crossing probabilities. As a shorthand in this section, $\theta$ represents the function $\theta(\cdot)$, and $\theta=0$ represents $\theta(t)\equiv 0$ for all $t$. We denote the probability of crossing the upper boundary at analysis $k$ without previously crossing a bound by

$$\alpha_{k}(\theta)=P_{\theta}(\{Z_{k}\geq b_{k}\}\cap_{j=1}^{k-1}\{a_{j}\le Z_{j}< b_{j}\}),$$ $k=1,2,\ldots,K.$

Next, we consider analogous notation for the lower bound. For $k=1,2,\ldots,K$ denote the probability of crossing a lower bound at analysis $k$ without previously crossing any bound by

$$\beta_{k}(\theta)=P_{\theta}(\{Z_{k}< a_{k}\}\cap_{j=1}^{k-1}\{ a_{j}\le Z_{j}< b_{j}\}).$$ For symmetric testing for analysis $k$ we would have $a_k= - b_k$ and $\beta_k(0)=\alpha_k(0),$ $k=1,2,\ldots,K$. The total lower boundary crossing probability for a trial is denoted by $$\beta(\theta)\equiv\sum_{k=1}^{K}\beta_{k}(\theta).$$ Note that we can also set $a_k= -\infty$ for any or all analyses if a lower bound is not desired, $k=1,2,\ldots,K$. Similarly, for $k<K$ we can set $b_k=\infty$ if an upper bound is not desired at analysis $k$.
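These boundary crossing probabilities can be approximated by simulation. A sketch under $\theta=0$, reusing the simulated Z-process and the hypothetical bounds from above:

# Approximate alpha_k(0) and beta_k(0) by Monte Carlo, accounting for
# the requirement that no bound was crossed at an earlier analysis
a <- c(-0.5, 1, 2)
b <- c(3.7, 2.5, 2)
cross_upper <- cross_lower <- matrix(FALSE, n_sim, 3)
open <- rep(TRUE, n_sim) # trials that have not yet crossed a bound
for (k in 1:3) {
  cross_upper[, k] <- open & z[, k] >= b[k]
  cross_lower[, k] <- open & z[, k] < a[k]
  open <- open & !cross_upper[, k] & !cross_lower[, k]
}
colMeans(cross_upper) # alpha_k(0), k = 1, 2, 3
colMeans(cross_lower) # beta_k(0), k = 1, 2, 3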

References


