# compute.stream: Calculates point of degeneration j0 into noise of the Idata,... In TopKLists: Inference, Aggregation and Visualization for Top-K Ranked Lists

## Description

The estimation of $\hat{j}_0$ is achieved via a moderate deviation-based approach. When an estimator, computed from a pilot sample of size $\nu$, exceeds a value $z$, the deviation above $z$ is said to be a moderate deviation if its associated probability is polynomially small as a function of $\nu$, and a large deviation if the probability is exponentially small in $\nu$. The values of $z=z_\nu$ that are associated with moderate deviations are $z_\nu\equiv\bigl(C\,\nu^{-1}\,\log\nu\bigr)^{1/2}$, where $C>\frac{1}{4}$.

The null hypothesis that $p_k=\frac{1}{2}$ for $\nu$ consecutive values of $k$, versus the alternative hypothesis that $p_k>\frac{1}{2}$ for at least one of the values of $k$, is rejected when $\hat{p}_j^\pm-\frac{1}{2}>z_\nu$. The probabilities $\hat{p}_j^+$ and $\hat{p}_j^-$ are estimates of $p_j$ computed from the $\nu$ data pairs $I_\ell$ for which $\ell$ lies immediately to the right of $j$, or immediately to the left of $j$, respectively.
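As a rough numerical illustration of the threshold, the following sketch evaluates $z_\nu$ for a given pilot sample size and applies the rejection rule to a single window of indicator values. The helper names `md_threshold` and `rejects_null` are hypothetical and not part of TopKLists; the code only mirrors the formula above and is not the package's implementation.

```r
# Moderate deviation threshold z_nu = sqrt(C * log(nu) / nu).
# Hypothetical helper, not exported by TopKLists.
md_threshold <- function(nu, C = 0.251) {
  stopifnot(nu > 1, C > 0.25)
  sqrt(C * log(nu) / nu)
}

# Hypothetical helper: apply the rejection rule to one window of 0/1 values.
# The window length plays the role of the pilot sample size nu.
rejects_null <- function(window, C = 0.251) {
  nu <- length(window)
  p_hat <- mean(window)              # estimated probability of a 1
  (p_hat - 0.5) > md_threshold(nu, C)
}

z10 <- md_threshold(10)
eight_ones <- rejects_null(c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0))
seven_ones <- rejects_null(c(1, 1, 1, 1, 1, 1, 1, 0, 0, 0))
```

With $\nu=10$ and $C=0.251$ the threshold is about 0.24, so a window of ten values needs at least eight 1s before the null hypothesis is rejected.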

The iterative algorithm consists of an ordered sequence of "test stages" $s_1, s_2, \ldots$ In stage $s_k$ an integer $J_{s_k}$ is estimated, which is a potential lower bound for $j_0$ (when $k$ is odd) or a potential upper bound for $j_0$ (when $k$ is even).
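A minimal sketch of the window estimates and of a single first test stage, under stated assumptions: the window for $\hat{p}_j^+$ is taken as $I_j,\dots,I_{j+\nu-1}$, the window for $\hat{p}_j^-$ as $I_{j-\nu+1},\dots,I_j$, and the first stage is assumed to scan rightward from $j=1$ until the null hypothesis is no longer rejected. All function names are hypothetical; the exact window boundaries and stopping rule used by `compute.stream` may differ.

```r
# Window estimates of p_j (assumed window boundaries, see lead-in).
p_hat_plus  <- function(I, j, nu) mean(I[j:(j + nu - 1)])   # right of j
p_hat_minus <- function(I, j, nu) mean(I[(j - nu + 1):j])   # left of j

# Hypothetical first stage: scan to the right and return the first index j
# at which p_hat_plus - 1/2 no longer exceeds the threshold z_nu.
first_stage <- function(I, nu, C = 0.251) {
  stopifnot(length(I) >= nu, nu > 1, C > 0.25)
  z <- sqrt(C * log(nu) / nu)
  last_start <- length(I) - nu + 1     # last j with a full right window
  for (j in seq_len(last_start)) {
    if (p_hat_plus(I, j, nu) - 0.5 <= z) return(j)
  }
  NA_integer_                          # null hypothesis rejected everywhere
}

# Deterministic toy input: 20 ones followed by an alternating 0/1 tail.
I_toy <- c(rep(1, 20), rep(c(0, 1), 10))
J1 <- first_stage(I_toy, nu = 10)
```

On this toy input the scan stops once the right-hand window contains enough tail values to pull the estimate down to 0.7, which gives a candidate bound that later stages would refine.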

## Usage

    compute.stream(Idata, const=0.251, v, r=1.2)

## Arguments

| Argument | Description |
|---|---|
| `Idata` | Input data: a vector of 0s and 1s (see `prepare.idata`) |
| `const` | The constant $C$ of the moderate deviation bound; must be larger than 0.25 (default is 0.251) |
| `v` | The pilot sample size $\nu$, related to the degree of randomness in the assignments. In each step the noise is estimated from the Idata as the probability of a 1 within an interval of size $\nu$, moving from $J_{s_{k-1}} - r\nu$ if $k$ is odd or from $J_{s_{k-1}} + r\nu$ if $k$ is even, until convergence or break (see `r`) |
| `r` | A technical constant determining the starting point from which the probability of $I=1$ is estimated in a window of size `v` (see `v`; default is 1.2) |
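To illustrate how `r` and `v` interact, the sketch below computes the index from which the next stage would start, given the previous estimate $J_{s_{k-1}}$. The function `restart_index` is hypothetical, and both the rounding of $r\nu$ to the nearest integer and the clamping to the valid index range are assumptions; the package may handle these cases differently.

```r
# Hypothetical helper: starting index of stage k, given the previous estimate.
# Odd stages start at J_prev - r*v, even stages at J_prev + r*v.
restart_index <- function(J_prev, k, v, r = 1.2, n = Inf) {
  offset <- round(r * v)                       # assumed rounding
  start <- if (k %% 2 == 1) J_prev - offset else J_prev + offset
  max(1, min(start, n))                        # assumed clamping to 1..n
}

odd_start  <- restart_index(16, k = 3, v = 10)          # 16 - 12
even_start <- restart_index(16, k = 2, v = 10, n = 40)  # 16 + 12
```

A larger `r` moves the starting point further from the previous estimate, so each stage re-examines a wider stretch of the data before settling on a new bound.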

## Value

A named list containing:

| Component | Description |
|---|---|
| `j0_est` | The estimated index at which the Idata degenerate into noise |
| `k` | `k = j0_est - 1` |
| `reason.break` | The reason why the computation ended: convergence or break condition |
| `js` | The sequence of estimated $j_0$ values from each iteration, which also shows the convergence behaviour |
| `v` | The preselected value of the parameter $\nu$ |

## Author(s)

Eva Budinska, Michael G. Schimek

## See Also

prepare.idata

## Examples

    set.seed(465)
    myhead <- rbinom(20, 1, 0.8)
    mytail <- rbinom(20, 1, 0.5)
    mydata <- c(myhead, mytail)
    compute.stream(mydata, v=10)