Analysis Support, Critical Values, Power, Time to Signal and Sample Size for Sequential Analysis with Poisson and Binomial Data.
Description
Sequential is designed for continuous and group sequential analysis, where statistical hypothesis testing is conducted repeatedly
on accumulating data that gradually increases the sample size. This is different from standard statistical analysis, where a single analysis is performed
using a fixed sample size. It is possible to analyze either Poisson type data or binomial 0/1 type data. For binomial data, it is possible to incorporate an
offset term to account for variable matching ratios. For Poisson data, the critical value is based on a Wald-type upper boundary, which is flat on the scale
of the log-likelihood ratio, and on a predetermined maximum sample size. For both data distributions, it is also possible to apply a user-defined alpha spending function.
For group sequential analyses, there are functions for prespecified group sizes and for the situation when the group sizes are not known a priori.
It is also possible to perform mixed continuous/group sequential analysis, where, for example, there is at first a big batch of data that arrives in one group,
followed by continuous sequential analysis. All results are exact, based on iterative numerical calculations, rather than asymptotic theory or computer simulations.
In the package, there are functions to calculate critical values, statistical power, expected time to signal when the null hypothesis is rejected, and expected sample size at the end of the sequential analyses whether the null hypothesis was rejected or not. For example, for any desired power, relative risk and alpha level, the package can calculate the required upper limit on the sample size, the critical value needed, and the corresponding expected time to signal when the null hypothesis is rejected.
Details
Package:  Sequential 
Type:  Package 
Version:  2.2.1 
Date:  2016-09-09 
License:  GPL-2 
LazyLoad:  yes 
Index:  
Analyze.Binomial  Function to Conduct Group Sequential Analyses for Binomial 
Data When the Group Sizes are not Known a Priori.  
AnalyzeSetUp.Binomial  Function to Set Up the Input Parameters Before Using the 
Analyze.Binomial Function for the First Time. 

Analyze.Poisson  Function to Conduct Group Sequential Analyses for Poisson 
Data When the Group Sizes are not Known a Priori.  
AnalyzeSetUp.Poisson  Function to Set Up the Input Parameters Before Using the 
Analyze.Poisson Function for the First Time. 

CV.Binomial  Critical Values for Continuous Sequential Analysis with 
Binomial Data.  
CV.G.Binomial  Critical Values for Group Sequential Analysis with Binomial Data. 
CV.G.Poisson  Critical Values for Group Sequential Analysis with Poisson Data. 
CV.Poisson  Critical Values for Continuous Sequential Analysis with 
Poisson Data.  
CV.CondPoisson  Critical Values for continuous sequential CMaxSPRT for 
Poisson data with limited information from historical cohort.  
Performance.Binomial  Power, Expected Signal Time and Sample Size for Continuous 
Sequential Analysis with Binomial Data.  
Performance.G.Binomial  Power, Expected Signal Time and Sample Size for Group Sequential 
Analysis with Binomial Data.  
Performance.G.Poisson  Power, Expected Signal Time and Sample Size for Group Sequential 
Analysis with Poisson Data.  
Performance.Poisson  Power, Expected Signal Time and Sample Size for Continuous 
Sequential Analysis from Limited Historical Cohort Poisson Data.  
Performance.CondPoisson  Power, Expected Signal Time and Sample Size for Continuous 
Sequential CMaxSPRT with Poisson Data.  
SampleSize.Binomial  Sample Size Calculation for Continuous Sequential Analysis with 
Binomial Data.  
SampleSize.Poisson  Sample Size Calculation for Continuous Sequential Testing with 
Poisson Data.  
SampleSize.CondPoisson  Sample Size Calculation for Continuous Sequential CMaxSPRT with 
Poisson Data.  
Overview
Most of the sequential analysis methods found in the literature are based on asymptotic results. In contrast, this package contains functions for the exact calculation of critical values, statistical power, expected time to signal when the null is rejected and the maximum sample size needed when the null is not rejected. This is done for Poisson and binomial type data with a Wald-type upper boundary, which is flat with respect to the likelihood ratio function, and a predetermined upper limit on the sample size. For a desired statistical power, it is also possible to calculate the required upper limit on the sample size. The motivation for this package is post-market near real-time drug and vaccine safety surveillance, where the goal is to detect rare but serious safety problems as early as possible, in many cases after only a handful of adverse events. The package can also be used in other application areas, such as clinical trials.
The basis for this package is the Maximized Sequential Probability Ratio Test (MaxSPRT) statistic (Kulldorff et al., 2011), which is a variant of Wald's Sequential Probability Ratio Test (SPRT) (Wald, 1945, 1947). MaxSPRT uses a composite alternative hypothesis, an upper boundary to reject the null hypothesis when there are more events than expected, no lower boundary, and an upper limit on the sample size, at which time the sequential analyses end without rejecting the null. MaxSPRT was developed for post-market vaccine safety surveillance as part of the Vaccine Safety Datalink project run by the Centers for Disease Control and Prevention.
In this package, all critical values, alpha spending strategies, statistical power, expected time to signal and required sample size to achieve a certain power, are obtained exactly to whatever decimal precision desired, using iterative numerical calculations. None of the results are based on asymptotic theory or computer simulations.
Poisson Data
To start, consider continuous sequential analysis for Poisson data. Let C_t be the random variable that counts the number of events up to time t. Suppose that, under the null hypothesis, C_t has a Poisson distribution with mean μ_t, where μ_t is a known function reflecting the population at risk. Under the alternative hypothesis, suppose that C_t has a Poisson distribution with mean RRμ_t, where "RR" is the unknown increased relative risk due to the vaccine. The MaxSPRT statistic defined in terms of the log likelihood ratio is given by:
LLR_t = (μ_t - c_t) + c_t \log(c_t/μ_t),
when c_t is at least μ_t, and LLR_t =0, otherwise.
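The statistic above can be sketched in a few lines of base R. This is only an illustration of the formula; `poisson_llr` is a hypothetical helper name and not a function exported by the package.

```r
# Poisson MaxSPRT log-likelihood ratio, following the formula above.
# NOTE: poisson_llr is an illustrative helper, not part of the Sequential package.
poisson_llr <- function(c_t, mu_t) {
  stopifnot(all(c_t >= 0), all(mu_t > 0))
  # The statistic is positive only when more events were observed than expected;
  # otherwise it is defined to be 0.
  ifelse(c_t >= mu_t, (mu_t - c_t) + c_t * log(c_t / mu_t), 0)
}

poisson_llr(c_t = 5, mu_t = 2)  # more events than expected: positive value
poisson_llr(c_t = 1, mu_t = 2)  # fewer events than expected: 0
```

Because the function is vectorized, it can also be applied to a whole series of cumulative counts and expected counts at once.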
For continuous sequential analysis, the test statistic, LLR_t, is monitored at all times t \in (0,T], where T= SampleSize. SampleSize is defined
a priori by the user in order to achieve the desired statistical power, and can be calculated using the SampleSize.Poisson function.
The sequential analysis ends, and H_0 is rejected if, and when, LLR_t ≥ CV, where CV is calculated using the CV.Poisson function.
If μ_t reaches SampleSize without a signal, the sequential analysis ends without rejecting the null hypothesis. To calculate other important performance metrics, such as the expected time to signal when
the null hypothesis is rejected, use the Performance.Poisson function.
If the first event occurs sufficiently early, the sequential analysis may end with the null hypothesis rejected after a single event. There is an option to require a minimum number of observed events, c_t = M, before the null can be rejected. Setting M in the range [3,6] is often a good choice (Kulldorff and Silva, 2012). If there is a delay until the sequential analysis starts, but it continues continuously thereafter, there is an option for that as well, requiring a minimum number μ_t = D of expected events before the null can be rejected.
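As a rough sketch of how the M and D options restrict the rejection rule, the check at a single moment in time could look as follows. The helper name `signal_now` and the critical value 3 are placeholders chosen for illustration; a real critical value would come from CV.Poisson.

```r
# Hypothetical helper: should the null be rejected at this moment?
# c_t  = observed events so far, mu_t = expected events under the null,
# cv   = critical value (placeholder here, normally obtained from CV.Poisson),
# M    = minimum number of observed events, D = minimum number of expected events.
signal_now <- function(c_t, mu_t, cv, M = 1, D = 0) {
  stopifnot(c_t >= 0, mu_t > 0)
  llr <- if (c_t >= mu_t) (mu_t - c_t) + c_t * log(c_t / mu_t) else 0
  (c_t >= M) && (mu_t >= D) && (llr >= cv)
}

cv_placeholder <- 3  # assumed value, for illustration only
signal_now(1, 0.01, cv_placeholder, M = 1)         # a single very early event can signal
signal_now(1, 0.01, cv_placeholder, M = 3)         # blocked until 3 events are observed
signal_now(3, 0.2, cv_placeholder, M = 3, D = 0.5) # blocked until 0.5 events are expected
```

Note that the critical value itself depends on M and D, so in practice cv has to be recomputed whenever these options change.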
With continuous sequential analysis, investigators can repeatedly analyze the data as often as they want, ensuring that the overall probability of falsely rejecting the null hypothesis at any time during the analysis is controlled at the desired nominal significance level (Wald, 1945, 1947). Continuous sequential methods are suitable for real-time or near real-time monitoring. When data is only analyzed intermittently, group sequential methods are used instead (Chin, 2012; Cook and DeMets, 2007; Xia, 2007; Friedman et al., 2010; Ghosh and Sen, 1991; Jennison and Turnbull, 2000; Mukhopadhyay and Silva, 2002; Whitehead, 1997). The data is then analyzed at regular or irregular discrete time intervals after a certain amount of data is accessible. Group sequential statistical methods are commonly used in clinical trials, where a trial may be stopped early due to either efficacy or unexpected adverse events (Jennison and Turnbull, 2000).
The same test statistic, LLR_t, is used for group sequential analyses (Silva and Kulldorff, 2012). The times when LLR_t is evaluated can be defined in several ways,
using regular or irregular time intervals that are referenced by calendar period, sample size or some scale involving the distribution of the data. For Poisson data,
the group sequential analysis must be conducted with equal size groups, with a constant expected number of adverse events between looks at the accumulating data.
In other words, LLR_t is compared against CV whenever μ_t is a multiple of SampleSize/Looks, where 'Looks' is the total number of looks at the data. To do group sequential
analysis for Poisson data, use the CV.G.Poisson and Performance.G.Poisson functions.
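The look schedule described above can be illustrated with a small base R sketch. All numbers are hypothetical: the cumulative counts are made up, and the critical value is a placeholder rather than an output of CV.G.Poisson.

```r
# Equally sized groups: one look each time mu_t reaches a multiple of SampleSize/Looks.
sample_size <- 10   # assumed upper limit on the expected number of events
looks       <- 5    # assumed total number of looks
look_times  <- seq_len(looks) * sample_size / looks   # mu_t at each look

# Hypothetical cumulative observed counts at the five looks
observed <- c(3, 7, 12, 15, 19)

# Poisson MaxSPRT statistic at each look
llr <- ifelse(observed >= look_times,
              (look_times - observed) + observed * log(observed / look_times),
              0)

cv_placeholder <- 3                               # assumed value, for illustration only
first_signal   <- which(llr >= cv_placeholder)[1] # NA if no look crosses the boundary
data.frame(look = seq_len(looks), mu_t = look_times, c_t = observed, LLR = llr)
```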
Binomial Data
The MaxSPRT method can also be applied to binomial/Bernoulli data. Let n denote the total number of events that have been observed in a sequential monitoring up to a certain moment in time. Suppose that these n events are categorized as cases and controls. For example, cases may be adverse events happening to a person taking drug A, while controls may be the same adverse event happening to someone in a matched set of individuals taking drug B. As another example, in a self-control sequential analysis, cases may be adverse events happening during the 1-28 days following vaccination, while controls are the same adverse events happening 29-56 days after vaccination.
Let C_n denote the number of cases among the n events, and assume that C_n follows a binomial distribution with success probability equal to p, where p = 1/(1 + z), and z is the matching ratio between the occurrence of a case and of a control under the null hypothesis. For example, if the probability of having a case (instead of a control) is p = 1/(1 + z) = 0.5, then z=1 (1:1 matching ratio), or, p = 0.25 for z=3 (1:3 matching ratio), etc.
The MaxSPRT statistic (Kulldorff et al., 2011) for a continuous binomial surveillance is:
LR_n=\frac{(c_n/n)^{c_n}\left[(n-c_n)/n\right]^{n-c_n}}{\left[1/(1+z)\right]^{c_n}\left[z/(1+z)\right]^{n-c_n}},
if z c_n/(n-c_n)>1, and LR_n=1 otherwise.
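A minimal base R sketch of this likelihood ratio is given below, with the null probability p = 1/(1+z) taken from the matching ratio z. The name `binomial_lr` is a hypothetical helper, not a package function, and the log-scale computation is an implementation choice made here for numerical stability.

```r
# Binomial MaxSPRT likelihood ratio for c_n cases among n events, matching ratio z.
# NOTE: binomial_lr is an illustrative helper, not part of the Sequential package.
binomial_lr <- function(c_n, n, z) {
  stopifnot(n >= 1, c_n >= 0, c_n <= n, z > 0)
  p0 <- 1 / (1 + z)                       # null probability that an event is a case
  # z * c_n / (n - c_n) > 1 is written as z * c_n > n - c_n to avoid dividing by 0
  if (z * c_n > n - c_n) {
    xlogy <- function(x, y) if (x == 0) 0 else x * log(y)   # treats 0 * log(0) as 0
    log_num <- xlogy(c_n, c_n / n) + xlogy(n - c_n, (n - c_n) / n)
    log_den <- c_n * log(p0) + (n - c_n) * log(1 - p0)
    exp(log_num - log_den)
  } else {
    1
  }
}

binomial_lr(c_n = 8, n = 10, z = 1)  # more cases than expected under 1:1 matching
binomial_lr(c_n = 3, n = 10, z = 1)  # fewer cases than expected: LR is 1
```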
The monitoring is continued until either there is a signal rejecting the null hypothesis (LR_n > CV)
or until n=N, which indicates that the null is not to be rejected. To perform the calculations, use the CV.Binomial, SampleSize.Binomial and Performance.Binomial functions.
To calculate the critical value for a Wald type rejection boundary, and when the group sizes are fixed a priori, use the CV.G.Binomial function. For statistical power,
expected time to signal and maximum sample size requirements, use the Performance.G.Binomial function.
The main assumptions behind the method above are: (i) the monitoring is truly performed in a continuous fashion; (ii) the matching ratio (z) is constant for all of the n events, and (iii) it uses a Wald type rejection boundary that is flat in terms of the likelihood function. Relaxing these assumptions, Fireman et al. (2013) developed exact sequential analysis for group sequential data with varying matching ratios, and for any user specified alpha rejection plan.
Alpha spending function for unpredictable group sizes
The alpha spending function specifies the cumulative amount, F_{α}(t), of Type I error probability related to each of the possible values of n. Thus, at the end of the monitoring the alpha spending corresponds to a value smaller than or equal to the overall amount of Type I error probability defined for the overall nominal significance level, α.
Denote the single probability of rejecting the null hypothesis at the j-th test by α_j. Then, the alpha spending at test i is given by F_{α}(t_i)=∑_{j=1}^{i}α_j ≤ α.
There is a vast number of proposals for choosing the shape of the alpha spending function. Jennison and Turnbull (2000) present a rich discussion of this topic. They devote special attention to alpha spending of the form: F_{α}(t)=α t^{ρ}, where ρ>1, and t represents a fraction of the maximum length of surveillance.
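To make the relation between F_{α}(t) and the per-test amounts α_j concrete, the power-type spending function can be sketched in base R. The look fractions, α = 0.05 and ρ = 2 are assumed values chosen only for illustration.

```r
# Power-type alpha spending: cumulative Type I error spent by fraction t of surveillance.
alpha_spending <- function(t, alpha = 0.05, rho = 2) {
  stopifnot(all(t >= 0), all(t <= 1), alpha > 0, alpha < 1, rho > 0)
  alpha * t^rho
}

t_looks    <- c(0.25, 0.50, 0.75, 1.00)     # assumed fractions at four looks
cumulative <- alpha_spending(t_looks)       # F_alpha(t_i)
alpha_j    <- diff(c(0, cumulative))        # amount spent at each individual test

cbind(t = t_looks, F_alpha = cumulative, alpha_j = alpha_j)
sum(alpha_j)   # total spent at the end of surveillance equals alpha
```

With ρ > 1 the increments α_j grow over time, so less Type I error is spent at early looks and more is kept for later ones.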
To run continuous or group sequential analysis with a user-defined alpha spending function, and/or when the group sizes are not known a priori,
Analyze.Binomial and Analyze.Poisson should be used for binomial and Poisson data, respectively.
These functions work differently than the other functions mentioned above.
Those other functions are designed to be used before the start of the sequential analysis, in order to determine what the maximum sample size
and critical value should be. Once the sequential analysis is under way, the test statistic is then calculated using a hand calculator or an
Excel spreadsheet, and compared with the critical value. The functions Analyze.Binomial and Analyze.Poisson work very differently, in that they are run at each look at
the accumulating data, whenever a new group of data arrives, and they are meant to perform the test itself, i.e., there is no need to use hand calculators,
Excel spreadsheets or any other auxiliary code. The results and conclusions, including a descriptive table and illustrative graphics, are automatically
provided after running Analyze.Binomial (or Analyze.Poisson).
Important: before using these functions, though, it is necessary to first run the
function AnalyzeSetUp.Binomial (or AnalyzeSetUp.Poisson) once in order to set everything up for the sequential analysis.
CMaxSPRT for Poisson data with limited information from historical cohort
In Poisson MaxSPRT, the expected mean μ_t is assumed to be a known function reflecting the baseline adverse event risk in the absence of the exposure of interest. In practice, it is estimated with historical data, and the uncertainty associated with the estimated counts may or may not have a non-negligible impact on the performance of the sequential analysis method. Li and Kulldorff (2010) showed in their simulation study that uncertainty in the estimated baseline means can be ignored when the total number of events in the historical data is at least 5 times the specified upper limit T. Otherwise, it is recommended to implement the Conditional Maximized Sequential Probability Ratio Test (CMaxSPRT) to account for variation in both the historical and surveillance cohorts.
Let c and V denote the total number of events and the cumulative person-time in the historical data, and let P_k denote the cumulative person-time observed in the surveillance population when the k-th event occurred. The CMaxSPRT statistic defined in terms of the log likelihood ratio is given by
U_k = c \log\left(\frac{c(1+P_k/V)}{c+k}\right) + k \log\left(\frac{k(1+P_k/V)}{(P_k/V)(c+k)}\right),
when k/c>P_k/V, and U_k=0, otherwise. In the original publication (Li and Kulldorff, 2010), the method was introduced as a continuous sequential analytic approach with the upper limit defined in terms of the maximum number of observed events, i.e., k ≤ K, and the critical value calculated via a Monte Carlo approach. A large number of Monte Carlo simulations (e.g., 10 million) might be needed to calculate the critical values with a reasonable precision. In Silva et al. (2016), the method was extended (i) with another option of defining the surveillance length in terms of the maximum cumulative person-time divided by the total cumulative person-time in the historical cohort, i.e., P_k/V ≤ T, (ii) with an exact calculation of the critical values for both surveillance length definitions, and (iii) for group sequential analysis with data updated and analyzed intermittently instead of continuously. The exact critical values are calculated using the interval halving method to solve for the root of a complex, nonlinear equation such that the overall Type I error rate is preserved at the nominal level. As K increases, the computing time for the exact critical values increases exponentially. Silva et al. (2016) also proposed two approximation methods to calculate the critical values that require substantially less computing time. One approach may overestimate the critical values and thus is referred to as the conservative approach, as it may yield lower-than-nominal Type I error rates; the other approach may underestimate the critical values and thus is referred to as the liberal approach, as it may yield higher-than-nominal Type I error rates. The recommendation is to use the exact approach when K is small (e.g., 10), use the conservative approach when K is medium or large but c is small, and use the liberal approach when c is medium (e.g., 50) or large. Simulation results show that the three approaches yield very similar results when K and c are reasonably large.
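The CMaxSPRT statistic U_k can be evaluated directly from the four quantities defined above. The following base R sketch is only an illustration: `cmaxsprt_llr` and its argument names are hypothetical, and the input values are made up.

```r
# CMaxSPRT log-likelihood ratio at the k-th surveillance event.
# c_hist = events in the historical data, V = historical person-time,
# k      = events observed in surveillance, P_k = surveillance person-time at event k.
# NOTE: cmaxsprt_llr is an illustrative helper, not part of the Sequential package.
cmaxsprt_llr <- function(k, P_k, c_hist, V) {
  stopifnot(k >= 1, P_k > 0, c_hist >= 1, V > 0)
  r <- P_k / V                       # surveillance person-time relative to historical
  if (k / c_hist > r) {
    c_hist * log(c_hist * (1 + r) / (c_hist + k)) +
      k * log(k * (1 + r) / (r * (c_hist + k)))
  } else {
    0                                # event rate not elevated relative to history
  }
}

# Hypothetical data: 50 historical events in 1000 person-years,
# 10 surveillance events after 100 person-years.
cmaxsprt_llr(k = 10, P_k = 100, c_hist = 50, V = 1000)
```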
Comparison with Other R Packages for Sequential Analysis
The R Sequential package is designed for sequential analysis where statistical hypothesis testing is performed using gradually accumulating data.
It is not designed for quality control problems, where a process is monitored over time to detect an emerging problem due to a sudden increase in the excess risk.
Although the methods for sequential analysis and quality control may seem similar, as they both analyze gradually accumulating data, they are actually very different
in both their purpose and design. Under the sequential hypothesis testing approach, the objective is to quickly determine if there is some intrinsic excess risk,
with the assumption that this risk does not change over time. For example, we may want to know if drug A is better than drug B, and there is no reason to believe
that the behavior of the drugs changes over time. In the quality control setting, the objective is instead to detect a possible change in a stochastic process that
may occur in the future, and to detect that change as soon as possible after it occurs. For example, the heart of a hospital patient is beating as it should, but if
there is a sudden deterioration, the alarm should sound as soon as possible without generating a lot of false alarms. This package is only meant for sequential analysis
of the former type, and it should not be used for quality control type problems. For quality control type analyses, there are other R packages available,
such as graphicsQC, IQCC, MetaQC, MSQC, qcc, and qcr.
In a number of ways, the R Sequential package differs from other R packages for sequential analyses. Historically, most sequential analysis has been conducted
using asymptotic statistical theory, and that is also what is used in the gsDesign, ldbounds, PwrGSD, seqDesign, seqmon, and sglr R packages.
In contrast, the R Sequential package is based on exact results, using iterative numerical calculations, rather than using asymptotic theory or computer simulations.
With this package, it is only possible to analyze Poisson or binomial/Bernoulli data. For other probability distributions, such as normal or exponential data,
other R packages should be consulted, such as GroupSeq or SPRT. Moreover, all functions in this package use a one-sided upper bound to reject the null hypothesis,
while the analyses end without rejecting the null when an upper limit on the sample size is reached. For two-sided sequential analysis, or other types of rejection
boundaries, other R packages must be used, such as ldbounds and Binseqtest. Finally, in this package, there are functions for both continuous
and group sequential analysis, and it is also possible to analyze situations where some of the data arrive continuously while other parts arrive in groups.
Most other R packages are exclusively designed for group sequential analysis, but there are some that also do continuous sequential analysis, such as Binseqtest and SPRT.
However, Binseqtest is only for binomial data, and SPRT is for a simple alternative hypothesis, while Sequential can be used for binomial and Poisson data and is meant for
composite alternative hypotheses. The present package offers the possibility to calculate the expected time to signal through the Performance.Poisson, Performance.G.Poisson,
Performance.G.Binomial, and Performance.Binomial functions, which is not offered by the other packages cited above.
Acknowledgements
Development of the R Sequential package has been funded and supported by:
 Food and Drug Administration, USA, through the Mini-Sentinel Project (v1.0,1.1,2.0).
 National Institute of General Medical Sciences, NIH, USA, through grant number R01GM108999 (v2.0).
 Federal University of Ouro Preto (UFOP), through contract under internal UFOP's resolution CEPE 4600 (v2.0).
 National Council of Scientific and Technological Development (CNPq), Brazil (v1.0).
 Bank for Development of the Minas Gerais State (BDMG), Brazil (v1.0).
Feedback from users is greatly appreciated. Very valuable suggestions concerning the R Sequential package have been received from various individuals, including:
 Ron Berman, University of California Berkeley.
 Claudia Coronel-Moreno, Harvard Pilgrim Health Care Institute.
 Bruce Fireman, Kaiser Permanente Northern California.
 Josh Gagne, Harvard Medical School and Brigham and Women's Hospital.
 Ned Lewis, Kaiser Permanente Northern California.
 Judith Maro, Harvard Medical School and Harvard Pilgrim Health Care Institute.
 Azadeh Shoaibi, Food and Drug Administration.
 Katherine Yih, Harvard Medical School and Harvard Pilgrim Health Care Institute.
 Jie Tang, Clinical biostatistics, Janssen R and D US, Johnson and Johnson LLC.
 Tuomo A. Nieminen, The National Institute for Health and Welfare (THL), Finland.
 Andreia Leite, Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine.
Version History of the R Sequential Package
Version 1.1, February 2013
Exact sequential analysis for Poisson data:
 Exact continuous sequential analysis.
 Exact group sequential analysis with predefined and constant group sizes.
 Wald type rejection boundary.
 Statistical power, expected time to signal and sample size calculations.
 User guide.
Version 1.2, January 2014
 Improved code structure and efficiency.
 More extensive user guide.
Version 2.0, June 2015
Exact sequential analysis for binomial data:
 Continuous sequential analysis.
 Group sequential analysis with predefined group sizes.
 Group sequential analysis with unpredictable group sizes, not specified a priori.
 Fixed or variable binomial probabilities (matching ratios).
 User specified alpha spending function.
 Statistical power, expected time to signal and sample size calculations.
 Updated user guide.
Version 2.0.1, June 2015
 Correction of bugs in CV.Poisson function.
 Updated user guide.
Version 2.0.2, October 2015
 Improved user guide.
Version 2.1, May 2016
Exact sequential analysis for Poisson data:
 Group sequential analysis with unpredictable group sizes, not specified a priori.
 User specified alpha spending function.
 Mixed group-continuous sequential analysis.
 Statistical power, expected time to signal and sample size calculations for non-constant group sizes.
Other:
 Directory address parameter in AnalyzeSetUp functions.
 Probability parameter in binomial functions.
 Updated user guide.
Version 2.1.1, June 2016
 Correction of bugs in Poisson functions.
 Updated user guide.
Version 2.2, July 2016
 Critical Value, Performance, and SampleSize calculations for CMaxSPRT with Poisson data.
 Updated user guide.
Version 2.2.1, September 2016
 Correction of bugs in CV.Poisson and CV.G.Poisson functions.
 Updated user guide.
Author(s)
Ivair Ramos Silva, Martin Kulldorff.
Maintainer: Ivair Ramos Silva <jamesivair@yahoo.com.br>
References
Chin R. (2012), Adaptive and Flexible Clinical Trials, Boca Raton, FL: Chapman and Hall/CRC.
Cook TD, DeMets DL. (2007), Introduction to Statistical Methods for Clinical Trials: Chapman and Hall/CRC Texts in Statistical Science.
Fireman B, et al. (2013) Exact sequential analysis for binomial data with time-varying probabilities. Manuscript in Preparation.
Friedman LM, Furberg CD, DeMets D. (2010), Fundamentals of Clinical Trials, 4th ed.: Springer.
Ghosh BK, Sen PK. (1991), Handbook of Sequential Analysis, New York: Marcel Dekker, Inc.
Ghosh M, Mukhopadhyay N, Sen PK. (2011), Sequential Estimation: Wiley.
Jennison C, Turnbull B. (2000), Group Sequential Methods with Applications to Clinical Trials, London: Chapman and Hall/CRC.
Kulldorff M, Davis RL, Kolczak M, Lewis E, Lieu T, Platt R. (2011). A Maximized Sequential Probability Ratio Test for Drug and Vaccine Safety Surveillance. Sequential Analysis, 30: 58–78.
Kulldorff M, Silva IR. (2015). Continuous Postmarket Sequential Safety Surveillance with Minimum Events to Signal. arXiv:1503.01978 [stat.AP].
Mukhopadhyay N, Silva BM. (2002), Sequential Methods and Their Applications, 1st ed.: Chapman and Hall/CRC.
Silva IR, Kulldorff M. (2015), Continuous versus Group Sequential Analysis for Vaccine and Drug Safety Surveillance. Biometrics, 71 (3), 851–858.
Xia Qi. (2007), A Procedure for Group Sequential Comparative Poisson Trials. Journal of Biopharmaceutical Statistics, 17, 869–881.
Wald A. (1945), Sequential Tests of Statistical Hypotheses, Annals of Mathematical Statistics, 16, 117–186.
Wald A. (1947), Sequential Analysis. New York: John Wiley and Sons.
Whitehead J. (1997), The Design and Analysis of Sequential Clinical Trials, 2nd ed.: Wiley.
Examples
library(Sequential)
## Critical value for continuous sequential analyses for Poisson Data.
## Maximum sample size = 10, alpha = 0.05 and minimum number of events = 3:
cvt <- CV.Poisson(SampleSize=10,D=0,M=3,alpha=0.05)
## Statistical power and the expected time to signal for relative risk RR=2:
result <- Performance.Poisson(SampleSize=10,D=0,M=3,cv=cvt,RR=2)
# And if you type:
result
# Then you will see the following:
# Power ESignalTime ESampleSize
# [1,] 0.7329625 4.071636 5.654732
