MLE_window_lth: Scan Statistics Empirical Window Length
In zhicongz/AnomDetct: Anomaly Detection via Scan Statistics

This function returns an optimized window length for scan statistics, obtained by empirical estimation.

MLE_window_lth(x, dist_null, ..., unit = 1)

`x`	A numeric vector of data values.
`dist_null`	A character string giving the underlying distribution under the null hypothesis. Distribution options are listed in Details.
`...`	Further arguments for the distribution parameters.
`unit`	A number indicating the bin width used for counting the excess.

Before applying scan statistics, the window length needs to be set. It is an important factor in the performance of the hypothesis test. In practice, the window length should be close to the cluster size. A window that is too small leads to more false positives, while a window that is too large lowers the power of the test.

This function efficiently selects an appropriate window length. The data are split into bins of width unit, and in each bin the excess is defined as the number of observations minus the number expected under the null distribution. The maximum excess among those bins is the returned value.
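The binning and excess calculation described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name `max_excess_sketch`, its `pdist` argument (a CDF such as `pnorm`), and the choice of bin edges at integer multiples of `unit` are all assumptions made for the example.

```r
# Hypothetical helper, NOT the AnomDetct implementation.
# x:     numeric vector of observations
# pdist: CDF of the assumed null distribution (e.g. pnorm, punif)
# ...:   parameters passed on to pdist
# unit:  bin width
max_excess_sketch <- function(x, pdist = pnorm, ..., unit = 1) {
  k <- floor(x / unit)                      # bin index of each observation
  k_lo <- min(k)
  k_hi <- max(k)
  # Observed count in each bin [j * unit, (j + 1) * unit)
  observed <- tabulate(k - k_lo + 1, nbins = k_hi - k_lo + 1)
  # Expected count in each bin under the null distribution
  breaks <- unit * (k_lo:(k_hi + 1))
  expected <- length(x) * diff(pdist(breaks, ...))
  # Largest surplus of observed over expected counts
  max(observed - expected)
}

# Simulated data: standard normal background plus a tight cluster near 2
set.seed(1)
x <- c(rnorm(200), rnorm(30, mean = 2, sd = 0.05))
max_excess_sketch(x, pnorm, mean = 0, sd = 1, unit = 0.5)
```

In this sketch the bin holding the injected cluster should show the largest excess, which is the quantity the description above refers to.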

The dist_null argument indicates the underlying distribution class. The options follow the usual distribution abbreviations in R: norm is the normal distribution, unif is the uniform distribution, and gpd is the generalized Pareto distribution. See Distributions for more distribution options.
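As an illustration of the abbreviation convention, R names the CDF of a distribution by prefixing its abbreviation with `p` (`pnorm`, `punif`). The sketch below resolves a string to that function; the helper `null_cdf` is hypothetical, and whether the package resolves dist_null in this way is an assumption.

```r
# Hypothetical helper: map an abbreviation such as "norm" to its CDF.
null_cdf <- function(dist_null) {
  match.fun(paste0("p", dist_null))
}

null_cdf("norm")(1.96)                      # same call as pnorm(1.96)
null_cdf("unif")(0.25, min = 0, max = 1)    # same call as punif(0.25, 0, 1)
```

For an abbreviation such as gpd, the corresponding function is only found if a package that provides it is attached.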

The empirical scan statistics window length is returned.

To use gpd, the package POT (https://cran.r-project.org/package=POT) needs to be installed first.

Zhicong Zhao