ultimate_detct: Anomaly Detection Via Scan Statistics
In zhicongz/AnomDetct: Anomaly Detection via Scan Statistics


This is a wrapper function that combines the other functions in this package into a single, end-to-end version of the detection method.

ultimate_detct(x, theta_th=1, theta_0 = theta_th,
               alpha_lvl = 0.05, anom_est_alpha_lvl = 0.05,
               dist_null = NA, ..., HRR_kernel = "triangular",
               hazard_bandwidth = 0.1, knn = NULL, est_fun = "pt",
               n_hz_sample = NULL, n_hz_size = NULL,
               pt_int = seq(0,1,by = 0.05), window_lth = NA,
               seq_theta = seq(0.5, 1, by = 0.05)*theta_0,
               x_unit = 0.01, plot_unit = 1, MLE_unit = 1,
               plt_mgn = 0)

`x`	A numeric vector of data values on which the hypothesis test is applied.
`theta_th`	Initial theoretical theta value for the hypothesis test. Must be positive.
`theta_0`	Initial real theta value for the hypothesis test. Defaults to `theta_th`.
`alpha_lvl`	Significance level for the hypothesis test with initial theoretical theta value `theta_th`.
`anom_est_alpha_lvl`	Significance level for estimating the number of clusters.
`dist_null`	A character string giving the underlying distribution under the null hypothesis. Distribution options are shown in Details.
`...`	Further arguments giving the parameters of the distribution in `dist_null`.
`HRR_kernel`	A character string giving the smoothing kernel to be used in `HRR_pt_est` or `HRR_sbsp_est`. This must partially match one of "`gaussian`", "`rectangular`", "`triangular`" or "`knn`". Default is "`triangular`".
`hazard_bandwidth`	The smoothing bandwidth to be used.
`knn`	Number of neighboring points considered in smoothing when `HRR_kernel` is "`knn`".
`est_fun`	A character string giving the hazard rate ratio estimation function. Must be either "`pt`" or "`sbsp`". Default is "`pt`".
`n_hz_sample`	Number of replicates when `est_fun` is "`sbsp`".
`n_hz_size`	Size of each resample when `est_fun` is "`sbsp`".
`pt_int`	A vector of points at which the hazard rate ratio is estimated.
`window_lth`	Window length for the scan statistic hypothesis test. If `NA` (the default), the window length is selected by `MLE_window_lth` and `Haiman_window_lth`.
`seq_theta`	A vector of theta values passed to `hypo_test` for cluster detection. The sequence must be ordered. Default is `seq(0.5, 1, by = 0.05)*theta_0`.
`x_unit`	A number indicating the uniformization bin width.
`plot_unit`	A number indicating bin width for histogram in the plot.
`MLE_unit`	A number indicating the bin width for counting excess.
`plt_mgn`	Extra margin added around the clusters shown in the plot.
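The three fixed-bandwidth kernels accepted by `HRR_kernel`, plus a nearest-neighbor variant, can be illustrated with a generic Nadaraya-Watson smoother. The base-R sketch below is not the package's `HRR_pt_est` or `HRR_sbsp_est` code: the helper names `kernel_weight`, `kernel_smooth` and `knn_smooth`, the kernel normalizations, and the treatment of "`knn`" as a plain average over the `k` nearest points are assumptions made for illustration. It shows how a bandwidth like `hazard_bandwidth` controls which neighboring points contribute to the estimate at each point of a grid like `pt_int`.

```r
# Kernel weights on the standardized distance u = (t - x_i) / h.
# match.arg() allows partial matching, e.g. "tri" -> "triangular".
kernel_weight <- function(u, kernel = c("gaussian", "rectangular", "triangular")) {
  kernel <- match.arg(kernel)
  switch(kernel,
    gaussian    = dnorm(u),
    rectangular = ifelse(abs(u) <= 1, 0.5, 0),
    triangular  = ifelse(abs(u) <= 1, 1 - abs(u), 0)
  )
}

# Nadaraya-Watson estimate of y at each evaluation point in t_eval:
# a weighted average of y, with weights from the chosen kernel.
kernel_smooth <- function(x, y, t_eval, bandwidth = 0.1, kernel = "triangular") {
  vapply(t_eval, function(t) {
    w <- kernel_weight((t - x) / bandwidth, kernel)
    if (sum(w) == 0) return(NA_real_)  # no points inside the window
    sum(w * y) / sum(w)
  }, numeric(1))
}

# Assumed k-nearest-neighbor variant: average y over the k points closest to t.
knn_smooth <- function(x, y, t_eval, k) {
  vapply(t_eval, function(t) {
    idx <- order(abs(x - t))[seq_len(k)]
    mean(y[idx])
  }, numeric(1))
}

# Synthetic toy curve (not package output) evaluated on a pt_int-like grid
set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- 1 + 0.2 * sin(2 * pi * x) + rnorm(200, sd = 0.1)
pt_int <- seq(0, 1, by = 0.05)
est <- kernel_smooth(x, y, pt_int, bandwidth = 0.1, kernel = "triangular")
```

A larger bandwidth averages over more points and gives a smoother but more biased curve; the "`knn`" variant instead fixes the number of contributing points and lets the effective bandwidth adapt to the local data density.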

Every argument except `x` has a default value, so users only need to supply the data. The function then detects the embedded clusters automatically, without requiring the underlying distribution, the number or location of clusters, or other parameters that such a model usually needs.

Instead of setting `theta_0` equal to `theta_th`, users can set `theta_0` to `theta_th` multiplied by the value returned by `HRR_bstp_lb`. This helps remove potential false-positive clusters that arise from bias in the estimated success probability.
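For intuition about the scan statistic underlying this method, the sketch below computes a textbook fixed-window scan statistic (the largest number of observations in any interval of length `w`) for data already on [0, 1], together with a Monte Carlo p-value under a uniform null. It is a hypothetical stand-in, not the package's `hypo_test`: it ignores theta, the hazard rate ratio estimate and the window-length selection, and the names `scan_stat` and `scan_pvalue` are illustrative. Here `w` plays roughly the role that `window_lth` plays in `ultimate_detct`.

```r
# Fixed-window scan statistic: the maximum number of observations falling in
# any interval [x_(i), x_(i) + w] that starts at a sorted observation x_(i).
# Checking only windows that start at observations is enough to find the max.
scan_stat <- function(x, w) {
  x <- sort(x)
  # findInterval() counts how many sorted values are <= x_(i) + w
  counts <- findInterval(x + w, x) - seq_along(x) + 1
  i <- which.max(counts)
  list(count = counts[i], start = x[i], end = x[i] + w)
}

# Monte Carlo p-value: how often n i.i.d. Uniform(0, 1) points produce a
# scan statistic at least as large as the observed one.
scan_pvalue <- function(obs_count, n, w, n_sim = 999) {
  sims <- replicate(n_sim, scan_stat(runif(n), w)$count)
  (1 + sum(sims >= obs_count)) / (n_sim + 1)
}

set.seed(100)
# Synthetic data: uniform background plus 40 points planted in [0.40, 0.42]
u <- c(runif(500), runif(40, 0.40, 0.42))
s <- scan_stat(u, w = 0.02)
p <- scan_pvalue(s$count, n = length(u), w = 0.02, n_sim = 199)
```

The highest-count window lands on the planted interval, and its count is far above what the uniform simulations produce, so `p` is small.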

This function returns a list with components:

`Total`	Estimated number of clusters.
`Cluster`	A matrix whose first two columns are the cluster boundaries and whose third column is the corresponding p-value. Note that clusters are not necessarily mutually exclusive (they may overlap).
`plot`	The plot.

Zhicong Zhao

library(AnomDetct)
set.seed(100)
## generate data: Gamma(shape = 2, rate = 0.05) background plus 200 N(50, 1) points
x <- c(rgamma(5000, 2, 0.05), rnorm(200, 50, 1))
res <- ultimate_detct(x, HRR_kernel = "gaussian", est_fun = "sbsp",
                      n_hz_sample = 30, n_hz_size = 80, MLE_unit = 5,
                      x_unit = 0.001)