Analysis Support, Critical Values, Power, Time to Signal and Sample Size for Sequential Analysis with Poisson and Binomial Data.

Description

Sequential is designed for continuous and group sequential analysis, where statistical hypothesis testing is conducted repeatedly on accumulating data that gradually increases the sample size. This is different from standard statistical analysis, where a single analysis is performed using a fixed sample size. It is possible to analyze either Poisson type data or binomial 0/1 type data. For binomial data, it is possible to incorporate an offset term to account for variable matching ratios. For Poisson data, the critical value is based on a Wald-type upper boundary, which is flat on the scale of the log-likelihood ratio, and on a predetermined maximum sample size. For both data distributions, it is also possible to apply a user-defined alpha spending function. For group sequential analyses, there are functions for pre-specified group sizes and for the situation when the group sizes are not known a priori. It is also possible to perform mixed continuous/group sequential analysis, where, for example, there is at first a big batch of data that arrives in one group, followed by continuous sequential analysis. All results are exact, based on iterative numerical calculations, rather than asymptotic theory or computer simulations.

In the package, there are functions to calculate critical values, statistical power, expected time to signal when the null hypothesis is rejected, and expected sample size at the end of the sequential analyses whether the null hypothesis was rejected or not. For example, for any desired power, relative risk and alpha level, the package can calculate the required upper limit on the sample size, the critical value needed, and the corresponding expected time to signal when the null hypothesis is rejected.

Details

Package: Sequential
Type: Package
Version: 2.2.1
Date: 2016-09-09
License: GPL 2
LazyLoad: yes
Index:
Analyze.Binomial Function to Conduct Group Sequential Analyses for Binomial
Data When the Group Sizes are not Known a Priori.
AnalyzeSetUp.Binomial Function to Set Up the Input Parameters Before Using the
Analyze.Binomial Function for the First Time.
Analyze.Poisson Function to Conduct Group Sequential Analyses for Poisson
Data When the Group Sizes are not Known a Priori.
AnalyzeSetUp.Poisson Function to Set Up the Input Parameters Before Using the
Analyze.Poisson Function for the First Time.
CV.Binomial Critical Values for Continuous Sequential Analysis with
Binomial Data.
CV.G.Binomial Critical Values for Group Sequential Analysis with Binomial Data.
CV.G.Poisson Critical Values for Group Sequential Analysis with Poisson Data.
CV.Poisson Critical Values for Continuous Sequential Analysis with
Poisson Data.
CV.CondPoisson Critical Values for continuous sequential CMaxSPRT for
Poisson data with limited information from historical cohort.
Performance.Binomial Power, Expected Signal Time and Sample Size for Continuous
Sequential Analysis with Binomial Data.
Performance.G.Binomial Power, Expected Signal Time and Sample Size for Group Sequential
Analysis with Binomial Data.
Performance.G.Poisson Power, Expected Signal Time and Sample Size for Group Sequential
Analysis with Poisson Data.
Performance.Poisson Power, Expected Signal Time and Sample Size for Continuous
Sequential Analysis from Limited Historical Cohort Poisson Data.
Performance.CondPoisson Power, Expected Signal Time and Sample Size for Continuous
Sequential CMaxSPRT with Poisson Data.
SampleSize.Binomial Sample Size Calculation for Continuous Sequential Analysis with
Binomial Data.
SampleSize.Poisson Sample Size Calculation for Continuous Sequential Testing with
Poisson Data.
SampleSize.CondPoisson Sample Size Calculation for Continuous Sequential CMaxSPRT with
Poisson Data.

Overview

Most of the sequential analysis methods found in the literature are based on asymptotic results. In contrast, this package contains functions for the exact calculation of critical values, statistical power, expected time to signal when the null is rejected and the maximum sample size needed when the null is not rejected. This is done for Poisson and binomial type data with a Wald-type upper boundary, which is flat with respect to the likelihood ratio function, and a predetermined upper limit on the sample size. For a desired statistical power, it is also possible to calculate the latter. The motivation for this package is post-market near real-time drug and vaccine safety surveillance, where the goal is to detect rare but serious safety problems as early as possible, in many cases after only a handful of adverse events. The package can also be used in other application areas, such as clinical trials.

The basis for this package is the Maximized Sequential Probability Ratio Test (MaxSPRT) statistic (Kulldorff et al., 2011), which is a variant of Wald's Sequential Probability Ratio Test (SPRT) (Wald, 1945, 1947). MaxSPRT uses a composite alternative hypothesis, an upper boundary to reject the null hypothesis when there are more events than expected, no lower boundary, and an upper limit on the sample size, at which time the sequential analyses end without rejecting the null. MaxSPRT was developed for post-market vaccine safety surveillance as part of the Vaccine Safety Datalink project run by the Centers for Disease Control and Prevention.

In this package, all critical values, alpha spending strategies, statistical power, expected time to signal and required sample size to achieve a certain power, are obtained exactly to whatever decimal precision desired, using iterative numerical calculations. None of the results are based on asymptotic theory or computer simulations.

Poisson Data

To start, consider continuous sequential analysis for Poisson data. Let C_t be the random variable that counts the number of events up to time t. Suppose that, under the null hypothesis, C_t has a Poisson distribution with mean μ_t, where μ_t is a known function reflecting the population at risk. Under the alternative hypothesis, suppose that C_t has a Poisson distribution with mean RRμ_t, where "RR" is the unknown increased relative risk due to the vaccine. The MaxSPRT statistic defined in terms of the log likelihood ratio is given by:

LLR_t=(μ_t-c_t)+c_t \log{c_t/μ_t},

when c_t is at least μ_t, and LLR_t = 0 otherwise. For continuous sequential analysis, the test statistic, LLR_t, is monitored at all times t \in (0,T], where T = SampleSize. SampleSize is defined a priori by the user in order to achieve the desired statistical power; the required value can be calculated using the SampleSize.Poisson function. The sequential analysis ends, and H_0 is rejected, if and when LLR_t ≥ CV, where CV is calculated using the CV.Poisson function. If μ_t reaches SampleSize without this happening, the sequential analysis ends without rejecting the null hypothesis. To calculate other important performance metrics, such as the expected time to signal when the null hypothesis is rejected, use the Performance.Poisson function.
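As a rough illustration of the formula above, the statistic can be evaluated by hand in base R. This is only a sketch: `llr_poisson` is a hypothetical helper name, not a function of the Sequential package, and the observed and expected counts are made-up values.

```r
# Hypothetical helper (not part of the Sequential package):
# Poisson MaxSPRT log-likelihood ratio for observed counts c_t and
# expected counts mu_t (> 0) under the null hypothesis.
llr_poisson <- function(c_t, mu_t) {
  ifelse(c_t > mu_t, (mu_t - c_t) + c_t * log(c_t / mu_t), 0)
}

# Made-up surveillance trajectory: expected and observed cumulative events.
mu_t <- c(0.5, 1.0, 2.0, 4.0)
c_t  <- c(0, 2, 5, 6)
llr_poisson(c_t, mu_t)
```

In an actual analysis each of these values would be compared with the critical value returned by CV.Poisson.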

If the first event occurs sufficiently early, the sequential analysis may end with the null hypothesis rejected after a single event. There is an option to require a minimum number of observed events, c_t = M, before the null can be rejected. Setting M in the range [3,6] is often a good choice (Kulldorff and Silva, 2012). If there is a delay until the sequential analysis starts, but it continues continuously thereafter, there is an option for that as well, requiring a minimum number μ_t = D of expected events before the null can be rejected.
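The effect of M and D on the rejection rule can be sketched as follows. The helper `signals` and the critical value 3 are assumptions made for illustration only; in practice the critical value depends on M and D and should come from CV.Poisson with the same settings.

```r
# Hypothetical helper (not part of the Sequential package):
# TRUE when the null would be rejected at the current moment, given
# c_t observed events, mu_t (> 0) expected events, critical value cv,
# a minimum of M observed events and a minimum of D expected events.
signals <- function(c_t, mu_t, cv, M = 1, D = 0) {
  llr <- if (c_t > mu_t) (mu_t - c_t) + c_t * log(c_t / mu_t) else 0
  llr >= cv && c_t >= M && mu_t >= D
}

# A single very early event is enough to signal when M = 1 ...
signals(c_t = 1, mu_t = 0.01, cv = 3)
# ... but not when at least three events are required.
signals(c_t = 1, mu_t = 0.01, cv = 3, M = 3)
```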

With continuous sequential analysis, investigators can repeatedly analyze the data as often as they want, ensuring that the overall probability of falsely rejecting the null hypothesis at any time during the analysis is controlled at the desired nominal significance level (Wald, 1945, 1947). Continuous sequential methods are suitable for real-time or near real-time monitoring. When data is only analyzed intermittently, group sequential methods are used instead (Chin, 2012; Cook and DeMets, 2007; Xia, 2007; Friedman et al., 2010; Ghosh and Sen, 1991; Jennison and Turnbull, 2000; Mukhopadhyay and Silva, 2002; Whitehead, 1997). The data is then analyzed at regular or irregular discrete time intervals after a certain amount of data is accessible. Group sequential statistical methods are commonly used in clinical trials, where a trial may be stopped early due to either efficacy or unexpected adverse events (Jennison and Turnbull, 2000).

The same test statistic, LLR_t, is used for group sequential analyses (Silva and Kulldorff, 2012). The times when LLR_t is evaluated can be defined in several ways, using regular or irregular time intervals that are referenced by calendar period, sample size or some scale involving the distribution of the data. For Poisson data, the group sequential analysis must be conducted with equal size groups, with a constant expected number of adverse events between looks at the accumulating data. In other words, LLR_t is compared against CV whenever μ_t is a multiple of SampleSize/Looks, where 'Looks' is the total number of looks at the data. To do group sequential analysis for Poisson data, use the CV.G.Poisson and Performance.G.Poisson functions.
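To make the idea of an exact, simulation-free calculation concrete, the following base R sketch computes the probability of signaling for a given flat critical value and equally sized groups by propagating the distribution of the cumulative count from look to look. It is an illustration written for this text, not the algorithm implemented in CV.G.Poisson or Performance.G.Poisson; the function name, the truncation at `max_count` and the critical value used in the call are all assumptions.

```r
# Hypothetical sketch (not the package's implementation): probability of
# rejecting the null in a group sequential Poisson design with equally
# sized groups and a flat boundary cv on the LLR scale.
# RR = 1 gives the Type I error probability; RR > 1 gives the power.
group_signal_prob <- function(SampleSize, Looks, cv, RR = 1, max_count = 200) {
  llr <- function(cnt, mu) ifelse(cnt > mu, (mu - cnt) + cnt * log(cnt / mu), 0)
  mu_looks <- SampleSize * seq_len(Looks) / Looks    # expected counts at each look
  counts <- 0:max_count                              # truncated support
  inc <- dpois(counts, RR * SampleSize / Looks)      # events arriving between looks
  alive <- c(1, rep(0, max_count))                   # P(count, no signal so far)
  signal <- 0
  for (g in seq_len(Looks)) {
    new <- numeric(max_count + 1)
    for (c0 in counts[alive > 0]) {                  # add the new group's events
      idx <- (c0:max_count) + 1
      new[idx] <- new[idx] + alive[c0 + 1] * inc[seq_along(idx)]
    }
    reject <- llr(counts, mu_looks[g]) >= cv         # boundary crossed at look g
    signal <- signal + sum(new[reject])
    new[reject] <- 0                                 # those paths have stopped
    alive <- new
  }
  signal
}

# Made-up design: upper limit of 10 expected events, 5 looks, cv = 3.
group_signal_prob(SampleSize = 10, Looks = 5, cv = 3)
group_signal_prob(SampleSize = 10, Looks = 5, cv = 3, RR = 2)
```

With a single look the calculation reduces to an ordinary Poisson tail probability, which is a convenient sanity check.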

Binomial Data

The MaxSPRT method can also be applied to binomial/Bernoulli data. Let n denote the total number of events that have been observed during sequential monitoring up to a certain moment in time. Suppose that these n events are categorized as cases and controls. For example, cases may be adverse events happening to a person taking drug A, while controls may be the same adverse event happening to someone in a matched set of individuals taking drug B. As another example, in a self-control sequential analysis, cases may be adverse events happening during the 1-28 days following vaccination, while controls are the same adverse events happening 29-56 days after vaccination.

Let C_n denote the number of cases among the n events, and assume that C_n follows a binomial distribution with success probability equal to p, where p = 1/(1 + z), and z is the matching ratio between the occurrence of a case and of a control under the null hypothesis. For example, if the probability of having a case (instead of a control) is p = 1/(1 + z) = 0.5, then z = 1 (1:1 matching ratio), while p = 0.25 corresponds to z = 3 (1:3 matching ratio), etc.

The MaxSPRT statistic (Kulldorff et al., 2011) for a continuous binomial surveillance is:

LR_n=\frac{(c_n/n)^{c_n}\left[(n-c_n)/n\right]^{n-c_n}}{\left[1/(1+z)\right]^{c_n}\left[z/(1+z)\right]^{n-c_n}},

if z c_n/(n-c_n)>1, and LR_n= 1 otherwise.

The monitoring is continued until either there is a signal rejecting the null hypothesis (LR_n > CV) or until n=N, which indicates that the null is not to be rejected. To perform the calculations, use the CV.Binomial, SampleSize.Binomial and Performance.Binomial functions.
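The binomial statistic can be checked by hand with a few lines of base R. The helper name `lr_binomial` is hypothetical and not part of the package; the sketch relies on the fact that the binomial coefficients cancel in a ratio of two `dbinom` values, which also handles the case c_n = n cleanly.

```r
# Hypothetical helper (not part of the Sequential package):
# binomial MaxSPRT likelihood ratio for c_n cases among n events,
# with matching ratio z, i.e. null probability p0 = 1 / (1 + z).
lr_binomial <- function(c_n, n, z) {
  p0 <- 1 / (1 + z)
  if (c_n / n <= p0) return(1)        # equivalent to z * c_n / (n - c_n) <= 1
  dbinom(c_n, n, c_n / n) / dbinom(c_n, n, p0)
}

lr_binomial(c_n = 3, n = 4, z = 1)    # 3 cases out of 4 events, 1:1 matching
lr_binomial(c_n = 1, n = 4, z = 1)    # fewer cases than expected
```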

To calculate the critical value for a Wald type rejection boundary, and when the group sizes are fixed a priori, use the CV.G.Binomial function. For statistical power, expected time to signal and maximum sample size requirements, use the Performance.G.Binomial function.

The main assumptions behind the method above are: (i) the monitoring is truly performed in a continuous fashion; (ii) the matching ratio (z) is constant for all of the n events, and (iii) it uses a Wald type rejection boundary that is flat in terms of the likelihood function. Relaxing these assumptions, Fireman et al. (2013) developed exact sequential analysis for group sequential data with varying matching ratios, and for any user specified alpha rejection plan.

Alpha spending function for unpredictable group sizes

The alpha spending function specifies the cumulative amount, F_{α}(t), of Type I error probability related to each of the possible values of n. Thus, at the end of the monitoring the alpha spending corresponds to a value smaller than or equal to the overall amount of Type I error probability defined for the overall nominal significance level, α.

Denote the single probability of rejecting the null hypothesis at the j-th test by α_j. Then, the alpha spending at test i is given by F_{α}(t_i)=∑_{j=1}^{i}α_j ≤ α.

There is a vast number of proposals for choosing the shape of the alpha spending function. Jennison and Turnbull (2000) present a rich discussion of this topic. They give special attention to alpha spending functions of the form F_{α}(t)=α t^{ρ}, where ρ>0, and t represents a fraction of the maximum length of surveillance.
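As a small worked example of the power-type spending function, the sketch below computes the cumulative alpha spent at four looks and the amount α_j available at each individual test. The function name, the look fractions and the choice ρ = 2 are illustrative assumptions, not package defaults.

```r
# Hypothetical helper: power-type alpha spending, F(t) = alpha * t^rho,
# where t is the fraction of the maximum length of surveillance.
alpha_spending <- function(t, alpha = 0.05, rho = 2) alpha * t^rho

t_looks   <- c(0.25, 0.50, 0.75, 1.00)     # made-up look fractions
cum_alpha <- alpha_spending(t_looks)       # cumulative F(t_i) at each look
alpha_j   <- diff(c(0, cum_alpha))         # alpha spent at test j alone

data.frame(t = t_looks, cumulative = cum_alpha, spent_at_look = alpha_j)
```

The per-look amounts add up to the overall nominal level α at the final look, as required by the definition of F_{α}.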

To run continuous or group sequential analysis with a user-defined alpha spending function, and/or when the group sizes are not known a priori, Analyze.Binomial and Analyze.Poisson should be used for binomial and Poisson data, respectively. These functions work differently from the other functions mentioned above. Those other functions are designed to be used before the start of the sequential analysis, in order to determine what the maximum sample size and critical value should be. Once the sequential analysis is under way, the test statistic is then calculated using a hand calculator or an Excel spreadsheet, and compared with the critical value. The functions Analyze.Binomial and Analyze.Poisson work very differently, in that they are run at each look at the accumulating data, whenever a new group of data arrives, and they perform the test itself, i.e., there is no need to use hand calculators, Excel spreadsheets or any other auxiliary code. The results and conclusions, including a descriptive table and illustrative graphics, are automatically provided after running Analyze.Binomial (or Analyze.Poisson).

Important: before using these functions, it is necessary to first run the function AnalyzeSetUp.Binomial (or AnalyzeSetUp.Poisson) once in order to set everything up for the sequential analysis.

CMaxSPRT for Poisson data with limited information from historical cohort

In Poisson MaxSPRT, the expected mean μ_t is assumed to be a known function reflecting the baseline adverse event risk in the absence of the exposure of interest. In practice, it is estimated with historical data, and the uncertainty associated with the estimated counts may or may not have a non-negligible impact on the performance of the sequential analysis method. Li and Kulldorff (2010) showed in their simulation study that uncertainty in the estimated baseline means can be ignored when the total number of events in the historical data is at least 5 times the specified upper limit T. Otherwise, it is recommended to implement the Conditional Maximized Sequential Probability Ratio Test (CMaxSPRT) to account for variation in both the historical and surveillance cohorts.

Let c and V denote the total number of events and the cumulative person-time in the historical data, and let P_k denote the cumulative person-time observed in the surveillance population when the kth event occurs. The CMaxSPRT statistic defined in terms of the log likelihood ratio is given by

U_k=c\log\left(\frac{c(1+P_k/V)}{c+k}\right)+k\log\left(\frac{k(1+P_k/V)}{(P_k/V)(c+k)}\right),

when k/c>P_k/V, and U_k=0 otherwise. In the original publication (Li and Kulldorff, 2010), the method was introduced as a continuous sequential analytic approach with the upper limit defined in terms of the maximum number of observed events, i.e., k ≤ K, and the critical value calculated via a Monte Carlo approach. A large number of Monte Carlo simulations (e.g., 10 million) might be needed to calculate the critical values with reasonable precision. In Silva et al. (2016), the method was extended i) with another option of defining the surveillance length in terms of the maximum cumulative person-time divided by the total cumulative person-time in the historical cohort, i.e., P_k/V ≤ T, ii) with an exact calculation of the critical values for both surveillance length definitions, and iii) for group sequential analysis with data updated and analyzed intermittently instead of continuously. The exact critical values are calculated using the interval halving method to solve for the root of a complex, non-linear equation such that the overall Type I error rate is preserved at the nominal level. As K increases, the computing time for the exact critical values increases exponentially. Silva et al. (2016) also proposed two approximation methods to calculate the critical values that require substantially less computing time. One approach may overestimate the critical values and thus is referred to as the conservative approach, as it may yield lower-than-nominal Type I error rates; the other approach may underestimate the critical values and thus is referred to as the liberal approach, as it may yield higher-than-nominal Type I error rates. The recommendation is to use the exact approach when K is small (e.g., 10), the conservative approach when K is medium or large but c is small, and the liberal approach when c is medium (e.g., 50) or large. Simulation results show that the three approaches yield very similar results when K and c are reasonably large.
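The CMaxSPRT statistic defined above can be evaluated directly in base R. The helper name `cmaxsprt_llr`, its argument names and the numbers in the calls are assumptions chosen for illustration; the package's CondPoisson functions compute critical values and performance measures rather than exposing this statistic.

```r
# Hypothetical helper (not part of the Sequential package):
# CMaxSPRT log-likelihood ratio U_k after the k-th surveillance event,
# with P_k the surveillance person-time at that event, and c events over
# V person-time in the historical cohort.
cmaxsprt_llr <- function(k, P_k, c, V) {
  r <- P_k / V                          # surveillance / historical person-time
  if (k / c <= r) return(0)             # no excess of events so far
  c * log(c * (1 + r) / (c + k)) + k * log(k * (1 + r) / (r * (c + k)))
}

# Made-up data: 10 historical events in 100 person-time units.
cmaxsprt_llr(k = 5, P_k = 20, c = 10, V = 100)   # 5 events by person-time 20
cmaxsprt_llr(k = 1, P_k = 20, c = 10, V = 100)   # rate below the historical rate
```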

Comparison with Other R Packages for Sequential Analysis

The R Sequential package is designed for sequential analysis where statistical hypothesis testing is performed using gradually accumulating data. It is not designed for quality control problems, where a process is monitored over time to detect an emerging problem due to a sudden increase in the excess risk. Although the methods for sequential analysis and quality control may seem similar, as they both analyze gradually accumulating data, they are actually very different in both their purpose and design. Under the sequential hypothesis testing approach, the objective is to quickly determine if there is some intrinsic excess risk, with the assumption that this risk does not change over time. For example, we may want to know if drug A is better than drug B, and there is no reason to believe that the behavior of the drugs changes over time. In the quality control setting, the objective is instead to detect a possible change in a stochastic process that may occur in the future, and to detect that change as soon as possible after it occurs. For example, the heart of a hospital patient is beating as it should, but if there is a sudden deterioration, the alarm should sound as soon as possible without generating a lot of false alarms. This package is only meant for sequential analysis of the former type, and it should not be used for quality control type problems. For quality control type analyses, there are other R packages available, such as graphicsQC, IQCC, MetaQC, MSQC, qcc, and qcr.

In a number of ways, the R Sequential package differs from other R packages for sequential analyses. Historically, most sequential analysis has been conducted using asymptotic statistical theory, and that is also what is used in the gsDesign, ldbounds, PwrGSD, seqDesign, seqmon, and sglr R packages. In contrast, the R Sequential package is based on exact results, using iterative numerical calculations, rather than using asymptotic theory or computer simulations.

With this package, it is only possible to analyze Poisson or binomial/Bernoulli data. For other probability distributions, such as normal or exponential data, other R packages should be consulted, such as GroupSeq or SPRT. Moreover, all functions in this package use a one-sided upper bound to reject the null hypothesis, while the analyses end without rejecting the null when an upper limit on the sample size is reached. For two-sided sequential analysis, or other types of rejection boundaries, other R packages must be used, such as ldbounds and Binseqtest. Finally, in this package, there are functions for both continuous and group sequential analysis, and it is also possible to analyze situations where some of the data arrive continuously while other parts of the data arrive in groups. Most other R packages are exclusively designed for group sequential analysis. Some also do continuous sequential analysis, such as Binseqtest and SPRT, but Binseqtest is only for binomial data and SPRT is for simple alternative hypotheses, while Sequential can be used for binomial and Poisson data and is meant for composite alternative hypotheses. The present package offers the possibility to calculate the expected time to signal through the Performance.Poisson, Performance.G.Poisson, Performance.G.Binomial, and Performance.Binomial functions, which is not offered by the other packages cited above.

Acknowledgements

Development of the R Sequential package has been funded and supported by:
- Food and Drug Administration, USA, through the Mini-Sentinel Project (v1.0,1.1,2.0).
- National Institute of General Medical Sciences, NIH, USA, through grant number R01GM108999 (v2.0).
- Federal University of Ouro Preto (UFOP), through contract under internal UFOP's resolution CEPE 4600 (v2.0).
- National Council of Scientific and Technological Development (CNPq), Brazil (v1.0).
- Bank for Development of the Minas Gerais State (BDMG), Brazil (v1.0).

Feedback from users is greatly appreciated. Very valuable suggestions concerning the R Sequential package have been received from various individuals, including:
- Ron Berman, University of California Berkeley.
- Claudia Coronel-Moreno, Harvard Pilgrim Health Care Institute.
- Bruce Fireman, Kaiser Permanente Northern California.
- Josh Gagne, Harvard Medical School and Brigham and Women's Hospital.
- Ned Lewis, Kaiser Permanente Northern California.
- Judith Maro, Harvard Medical School and Harvard Pilgrim Health Care Institute.
- Azadeh Shoaibi, Food and Drug Administration.
- Katherine Yih, Harvard Medical School and Harvard Pilgrim Health Care Institute.
- Jie Tang, Clinical biostatistics, Janssen R and D US, Johnson and Johnson LLC.
- Tuomo A. Nieminen, The National Institute for Health and Welfare (THL), Finland.
- Andreia Leite, Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine.

Version History of the R Sequential Package

Version 1.1, February 2013
Exact sequential analysis for Poisson data:
- Exact continuous sequential analysis.
- Exact group sequential analysis with pre-defined and constant group sizes.
- Wald type rejection boundary.
- Statistical power, expected time to signal and sample size calculations.
- User guide.

Version 1.2, January 2014
- Improved code structure and efficiency.
- More extensive user guide.

Version 2.0, June 2015
Exact sequential analysis for binomial data:
- Continuous sequential analysis.
- Group sequential analysis with pre-defined group sizes.
- Group sequential analysis with unpredictable group sizes, not specified a priori.
- Fixed or variable binomial probabilities (matching ratios).
- User specified alpha spending function.
- Statistical power, expected time to signal and sample size calculations.
- Updated user guide.

Version 2.0.1, June 2015
- Correction of bugs in CV.Poisson function.
- Updated user guide.

Version 2.0.2, October 2015
- Improved user guide.

Version 2.1, May 2016
Exact sequential analysis for Poisson data:
- Group sequential analysis with unpredictable group sizes, not specified a priori.
- User specified alpha spending function.
- Mixed group-continuous sequential analysis.
- Statistical power, expected time to signal and sample size calculations for non-constant group sizes.
Other:
- Directory address parameter in AnalyzeSetUp functions.
- Probability parameter in binomial functions.
- Updated user guide.

Version 2.1.1, June 2016
- Correction of bugs in Poisson functions.
- Updated user guide.

Version 2.2, July 2016
- Critical Value, Performance, and SampleSize calculations for CMaxSPRT with Poisson data.
- Updated user guide.

Version 2.2.1, September 2016
- Correction of bugs in CV.Poisson and CV.G.Poisson functions.
- Updated user guide.

Author(s)

Ivair Ramos Silva, Martin Kulldorff.
Maintainer: Ivair Ramos Silva <jamesivair@yahoo.com.br>

References

Chin R. (2012), Adaptive and Flexible Clinical Trials, Boca Raton, FL: Chapman and Hall/CRC.

Cook TD, DeMets DL. (2007), Introduction to Statistical Methods for Clinical Trials: Chapman and Hall/CRC Texts in Statistical Science.

Fireman B, et al. (2013) Exact sequential analysis for binomial data with time-varying probabilities. Manuscript in Preparation.

Friedman LM, Furberg CD, DeMets D. (2010), Fundamentals of Clinical Trials, 4th ed.: Springer.

Ghosh BK, Sen PK. (1991), Handbook of Sequential Analysis, New York: Marcel Dekker, Inc.

Ghosh M, Mukhopadhyay N, Sen PK. (2011), Sequential Estimation: Wiley.

Jennison C, Turnbull B. (2000), Group Sequential Methods with Applications to Clinical Trials, London: Chapman and Hall/CRC.

Kulldorff M, Davis RL, Kolczak M, Lewis E, Lieu T, Platt R. (2011). A Maximized Sequential Probability Ratio Test for Drug and Vaccine Safety Surveillance. Sequential Analysis, 30: 58–78.

Kulldorff M, Silva IR. (2015). Continuous Post-market Sequential Safety Surveillance with Minimum Events to Signal. arXiv:1503.01978 [stat.AP].

Mukhopadhyay N, Silva BM. (2002), Sequential Methods and Their Applications, 1st ed.: Chapman and Hall/CRC.

Silva IR, Kulldorff M. (2015), Continuous versus Group Sequential Analysis for Vaccine and Drug Safety Surveillance. Biometrics, 71 (3), 851–858.

Xia Qi. (2007), A Procedure for Group Sequential Comparative Poisson Trials. Journal of Biopharmaceutical Statistics, 17, 869–881.

Wald A. (1945), Sequential Tests of Statistical Hypotheses, Annals of Mathematical Statistics, 16, 117–186.

Wald A. (1947), Sequential Analysis. New York: John Wiley and Sons.

Whitehead J. (1997), The Design and Analysis of Sequential Clinical Trials, 2nd ed.: Wiley.

Examples

## Load the package that provides CV.Poisson and Performance.Poisson.
library(Sequential)

## Critical value for continuous sequential analyses for Poisson Data.
## Maximum sample size = 10, alpha = 0.05 and minimum number of events = 3:

cvt <- CV.Poisson(SampleSize = 10, D = 0, M = 3, alpha = 0.05)

## Statistical power and the expected time to signal for relative risk RR=2:

result <- Performance.Poisson(SampleSize = 10, D = 0, M = 3, cv = cvt, RR = 2)

# And if you type:
result

# Then you will see the following:
#          Power ESignalTime ESampleSize
#     [1,] 0.7329625    4.071636    5.654732
