q_functions: Q-values of Scan Statistics Hypothesis Test
In zhicongz/AnomDetct: Anomaly Detection via Scan Statistics

Description Usage Arguments Details Value Author(s) References See Also Examples

This function returns the scan statistics q-value which is approximated by 1-dependent stationary sequences.

q1_function(k,m,p)
q2_function(k,m,p)
qL_function(L,k,m,p)
prob_fun(N,k,m,p)

`k`	scan statistics quantile.
`m`	scan statistics window length.
`p`	success probabiliy for each Bernoulli trial under null hypothesis.
`L`	number of 1-dependent stationary sequences.
`N`	number of Bernoulli trials.

It has been proved that calculate excat scan statistics probability is a NP hard problem. So to apply scan statistics, people either use an approximated distribution or Monte Carlo simulation. Most of the time, approximated distribution is more efficient but has restrictions on input parameters. q1_function, q2_function and qL_function here are estimated by using 1-dependent stationary sequences.

L and N represent two different measurement even though those are all related to the length of scanned sequence.

L = N/m

while N is the length of sequence.

For more details of the math works, please read Haiman's paper shown in references.

k, m, L, N are all integers, where m > 0, 0 ≤ k ≤ m, L > 3 and N/m > 5.
p is a real number where 0 < p ≤ 1.

q1_function, q2_function and qL_function return the approximated q-value.

prob_fun returns the linear interpolation of qL_function results.

Zhicong Zhao

Haiman,G.(2007), Estimating the distribution of one-dimensional discrete scan statistics viewed as extremes of 1-dependent stationary sequences. Journal of Statistical Planning and Inference, 137(2007), 821-828.

prob_fun_mc which approximate q-value by Monte Carlo simulations.

q1_function(5,10,0.2)

## compare Monte Carlo simulations and 1-dependent stationary sequences ##
set.seed(100)
p_MC <- mapply(prob_fun_mc,c(100,500,1000,5000,10000),
               MoreArgs = list(N = 1000,k = 5,m = 10,p=0.1))
p_Haiman <- prob_fun(1000,5,10,0.1)
plot(c(100,500,1000,5000,10000), p_MC,col = "blue",type = "l",
     xlab = "MC replicate times",ylab = "q_values",ylim = c(0.9,1))
abline(h = p_Haiman, col = "red")
legend("topleft",legend = c("MC","Haiman"),col = c("blue","red"),lty = 1)