Sequential t-test"

knitr::opts_chunk$set(
  collapse = TRUE
)

Overview

The sprtt package is the implementation of sequential probability ratio tests using the associated t-statistic (sprtt). This vignette describes the theoretical background of these tests.

Other recommended vignettes cover:

Sequential t-Test

What is a sequential test procedure?

With a sequential approach, data is continuously collected and an analysis is performed after each data point, which can lead to three different results [@wald1945a]:

Basically it is not necessary to perform an analysis after each data point --- several data points can also be added at once. However, this affects the sample size (N) and the error rates [@schnuerch2020a].

The efficiency of sequential designs has already been examined. Reductions in the sample by 50% and more were found in comparison to analyses with fixed sample sizes [@wald1945a; @schnuerch2020a]. Sequential hypothesis testing is therefore particularly suitable when resources are limited because the required sample size is reduced without compromising predefined error probabilities.

What is the sequential t-test?

The sequential t-test is based on the Sequential Probability Ratio Test (SPRT) by Abraham @wald1947, which is a highly efficient sequential hypothesis test. However, the usage of WaldΒ΄s SPRT is limited in the case of normally distributed data, because the variance has to be known or specified in the hypothesis. Rushton [-@rushton1950a; -@rushton1952] and @hajnal1961 have further developed the SPRT using the t-statistic. The basic idea is to transform the sequence of observations (which is dependent on the variance) into a sequence of the associated t-statistic (which is independent of the variance).

In the SPRT the null and alternative hypotheses are defined as follows, with πœƒ representing the model parameter :

$$ H_0:\ πœƒ\ =\ πœƒ_0 \ H_1:\ πœƒ\ =\ πœƒ_1 $$

The test statistic of the SPRT is based on a likelihood ratio, which is a measure of the relative evidence in the data for the given hypotheses. More specifically, it is the ratio of the likelihood of the alternative hypothesis to the likelihood of the null hypothesis at the m-th step of the sampling process (LR~m~).

$$ LR_{m} = \frac {f(data_m | H_1)} {f(data_m | H_0)} = \frac {𝑓(x_1,...,x_m | πœƒ_1)} {𝑓(x_1,...,x_m | πœƒ_0)} $$

Before the transformation into the t-statistic, the model parameter πœƒ contains the parameters of a normal distribution: the mean (Β΅) and the standard deviation (𝜎). Therefore, the Wald SPRT requires prior knowledge about the variance (𝜎^2^) or a specification in the hypotheses.

After the transformation of the observed values into the associated t-statistic, the model parameter πœƒ contains the parameters of the non-central t-distribution: the degrees of freedom (df) and the non-centrality parameter (π›₯).

$$ {𝑓(x_1,...,x_m | Β΅,𝜎)} => {𝑓(t_2,...,t_m | df,π›₯)} $$

For the calculation of the degrees of freedom, only the sample size of the group(s) is needed. The non-centrality parameter also requires a specification of the expected effect size in form of Cohen`s d (d).

To eventually calculate the LR of the sequential t-test, only the current t~m~-statistic is necessary. @rushton1950a demonstrated that an SPRT can be performed by simply considering the ratio of probability densities for the most recent t~m~ statistic under the alternative and null hypothesis at any m-th stage. Thus, the test statistic for a one and two-sided sequential t-test can be calculated as follows:

$$ LR_{m,\ one-sided\ sequential\ t-test} = \frac {𝑓(t_m | πœƒ_1)} {𝑓(t_m | πœƒ_0)} \ LR_{m,\ two-sided\ sequential\ t-test} = \frac {𝑓(t_m^2 | πœƒ_1)} {𝑓(t_m^2 | πœƒ_0)}. $$

To account for the fact that the algebraic sign is unknown in a two-sided test, the t-value is squared [@rushton1952].

After the calculation of the test statistic, the decision will be either to continue sampling or to terminate the sampling and accept one of the hypotheses. @wald1945a defined the following rules for the SPRT:

| Condition | Decision | |:---------------:|----------------------------:| | LR~m~ ≀ B | accept H~0~ and reject H~1~ | | B \< LR~m~ \< A | continue sampling | | LR~m~ ≀ A | accept H~1~ and reject H~0~ |

The A and B boundaries are calculated with the previously defined error rates 𝛼 (Type I error) and 𝛽 (Type II error) as follows:

$$ A = \left( \frac{1 - 𝛽}{𝛼} \right) \ B = \left( \frac{𝛽}{1 - 𝛼} \right). $$

In summary, three specifications are required to calculate a sequential t-test:

References



Try the sprtt package in your browser

Any scripts or data that you put into this service are public.

sprtt documentation built on Aug. 7, 2021, 1:06 a.m.