# Bernoulli_diff_stat: Compute the distribution of differences of replacement... In sigr: Succinct and Correct Statistical Summaries for Reports

## Description

Assuming `max(nA, nB) %% min(nA, nB) == 0`: compute the distribution of differences of weighted sums between `max(1, nB/nA)*sum(a)` and `max(1, nA/nB)*sum(b)`, where `a` is a 0/1 vector of length `nA` and `b` is a 0/1 vector of length `nB`, with each item of either vector independently equal to 1 with probability `(kA+kB)/(nA+nB)`. Then return the significance of a direct two-sided test that the absolute value of this difference is at least as large as `test_rate_difference` (if supplied) or the empirically observed rate difference `abs(nB*kA - nA*kB)/(nA*nB)`.

The idea is that, under this scaling, differences in success rates between the two processes are easily observed as differences in counts returned by the scaled processes. The method can be used to get the exact probability of a given difference under the null hypothesis that both the `A` and `B` processes have the same success rate `(kA+kB)/(nA+nB)`.

When `nA` and `nB` don't divide evenly into each other, two calculations are run, with the larger process alternately padded and truncated to look like a larger or smaller experiment that meets the above condition. This gives a range of significances.
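The calculation described above can be sketched by brute-force enumeration for the evenly divisible case. This is an illustration under stated assumptions, not sigr's internal implementation: the helper name `exact_diff_p` is hypothetical, it uses only the pooled rate `(kA+kB)/(nA+nB)` and the observed difference (no `test_rate_difference` or `common_rate`), and its result may not match the package output exactly.

```r
# Hypothetical helper (not part of sigr): exact two-sided significance by
# enumerating every pair of success counts under the pooled null rate.
exact_diff_p <- function(kA, nA, kB, nB) {
  # Only the evenly divisible case is handled in this sketch.
  stopifnot(max(nA, nB) %% min(nA, nB) == 0)
  p <- (kA + kB) / (nA + nB)   # pooled null success rate
  sA <- max(1, nB %/% nA)      # integer scale applied to A's count
  sB <- max(1, nA %/% nB)      # integer scale applied to B's count
  # Binomial probabilities of each possible success count.
  pa <- dbinom(0:nA, nA, p)
  pb <- dbinom(0:nB, nB, p)
  # Scaled count difference and joint probability for every (a, b) pair.
  d <- outer(sA * (0:nA), sB * (0:nB), "-")
  joint <- outer(pa, pb)
  # Observed scaled difference, kept in integer counts to avoid rounding.
  observed <- abs(sA * kA - sB * kB)
  # Total probability of a difference at least as large as the observed one.
  sum(joint[abs(d) >= observed])
}

exact_diff_p(2000, 5000, 100, 200)
```

Working in scaled counts rather than rates keeps the comparison `abs(d) >= observed` exact, since both sides are whole numbers.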

## Usage

```
Bernoulli_diff_stat(kA, nA, kB, nB, test_rate_difference, common_rate)
```

## Arguments

| Argument | Description |
|---|---|
| `kA` | number of A successes observed. |
| `nA` | number of A experiments. |
| `kB` | number of B successes observed. |
| `nB` | number of B experiments. |
| `test_rate_difference` | numeric, difference in rate of A-B to test. Note: it is best to specify this prior to looking at the data. |
| `common_rate` | numeric, assumed null rate. |

## Details

The intent is to measure the results of an A/B test where either `max(nA, nB) %% min(nA, nB) == 0` (no padding needed) or `max(nA,nB) >> min(nA,nB)` (padding has only a small effect).
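To see why padding matters little when one experiment is much larger than the other, it helps to look at the nearest experiment sizes that do divide evenly. The sketch below is one plausible reading of "padded and truncated", not sigr's internal code: the helper name `adjusted_sizes` is hypothetical, and it says nothing about how the success counts are adjusted to match.

```r
# Hypothetical illustration (not sigr's internal code): the nearest sizes for
# the larger experiment that are exact multiples of the smaller one.
adjusted_sizes <- function(nA, nB) {
  small <- min(nA, nB)
  large <- max(nA, nB)
  c(
    truncated = (large %/% small) * small,    # largest multiple <= large
    padded = ceiling(large / small) * small   # smallest multiple >= large
  )
}

adjusted_sizes(5000, 199)
# truncated: 4975, padded: 5174
```

Here the larger experiment changes by at most 174 of 5000 trials, which is the sense in which the adjustment is a small effect when `max(nA,nB) >> min(nA,nB)`.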

The idea of converting a rate problem into a counting problem follows from reading Wald's Sequential Analysis.

For very small p-values the calculation is sensitive to rounding in the observed rate difference, as an arbitrarily small change in the test rate can move an entire set of observed differences in or out of the significance calculation.
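The sensitivity comes from the fact that the possible rate differences form a discrete set, so a threshold sitting exactly on one of those values includes or excludes a whole group of outcomes at once. A minimal sketch, assuming a hypothetical helper `tail_prob` (not a sigr function) and small made-up experiment sizes:

```r
# Hypothetical helper (not part of sigr): probability that the absolute rate
# difference is at least `threshold` when both processes share success rate p.
tail_prob <- function(nA, nB, p, threshold) {
  # Rate difference and joint probability for every pair of success counts.
  rate_diff <- outer((0:nA) / nA, (0:nB) / nB, "-")
  joint <- outer(dbinom(0:nA, nA, p), dbinom(0:nB, nB, p))
  sum(joint[abs(rate_diff) >= threshold])
}

# A difference of exactly 0.5 is attainable (for example 10/20 versus 0/10),
# so nudging the threshold just past 0.5 drops those outcomes from the tail.
p_at <- tail_prob(20, 10, 0.4, 0.5)
p_above <- tail_prob(20, 10, 0.4, 0.5 + 1e-9)
c(at = p_at, just_above = p_above)
```

The two values differ even though the thresholds are almost identical, which is why specifying `test_rate_difference` carefully matters when the p-value is very small.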

## Value

Bernoulli difference test statistic.

## Examples

```
library(sigr)  # load the package that provides Bernoulli_diff_stat

Bernoulli_diff_stat(2000, 5000, 100, 200)
Bernoulli_diff_stat(2000, 5000, 100, 200, 0.1)
Bernoulli_diff_stat(2000, 5000, 100, 199)
Bernoulli_diff_stat(2000, 5000, 100, 199, 0.1)
Bernoulli_diff_stat(100, 200, 2000, 5000)

# sigr adjusts experiment sizes when lengths
# don't divide into each other.
Bernoulli_diff_stat(100, 199, 2000, 5000)
Bernoulli_diff_stat(100, 199, 2000, 5000)$pValue
```

sigr documentation built on June 12, 2021, 9:07 a.m.