q_functions: Q-values of Scan Statistics Hypothesis Test

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

This function returns the scan statistics q-value which is approximated by 1-dependent stationary sequences.

Usage

1
2
3
4
q1_function(k,m,p)
q2_function(k,m,p)
qL_function(L,k,m,p)
prob_fun(N,k,m,p)

Arguments

k

scan statistics quantile.

m

scan statistics window length.

p

success probabiliy for each Bernoulli trial under null hypothesis.

L

number of 1-dependent stationary sequences.

N

number of Bernoulli trials.

Details

It has been proved that calculate excat scan statistics probability is a NP hard problem. So to apply scan statistics, people either use an approximated distribution or Monte Carlo simulation. Most of the time, approximated distribution is more efficient but has restrictions on input parameters. q1_function, q2_function and qL_function here are estimated by using 1-dependent stationary sequences.

L and N represent two different measurement even though those are all related to the length of scanned sequence.

L = N/m

while N is the length of sequence.

For more details of the math works, please read Haiman's paper shown in references.

k, m, L, N are all integers, where m > 0, 0 ≤ k ≤ m, L > 3 and N/m > 5.
p is a real number where 0 < p ≤ 1.

Value

q1_function, q2_function and qL_function return the approximated q-value.

prob_fun returns the linear interpolation of qL_function results.

Author(s)

Zhicong Zhao

References

Haiman,G.(2007), Estimating the distribution of one-dimensional discrete scan statistics viewed as extremes of 1-dependent stationary sequences. Journal of Statistical Planning and Inference, 137(2007), 821-828.

See Also

prob_fun_mc which approximate q-value by Monte Carlo simulations.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
q1_function(5,10,0.2)

## compare Monte Carlo simulations and 1-dependent stationary sequences ##
set.seed(100)
p_MC <- mapply(prob_fun_mc,c(100,500,1000,5000,10000),
               MoreArgs = list(N = 1000,k = 5,m = 10,p=0.1))
p_Haiman <- prob_fun(1000,5,10,0.1)
plot(c(100,500,1000,5000,10000), p_MC,col = "blue",type = "l",
     xlab = "MC replicate times",ylab = "q_values",ylim = c(0.9,1))
abline(h = p_Haiman, col = "red")
legend("topleft",legend = c("MC","Haiman"),col = c("blue","red"),lty = 1)

zhicongz/AnomDetct documentation built on Dec. 12, 2019, 9:16 a.m.