ultimate_detct: Anomaly Detection Via Scan Statistics


View source: R/ultimate_detct.R

Description

This is a wrapper function that combines the other functions in this package into a complete, end-to-end version of the detection method.

Usage

ultimate_detct(x, theta_th=1, theta_0 = theta_th,
               alpha_lvl = 0.05, anom_est_alpha_lvl = 0.05,
               dist_null = NA, ..., HRR_kernel = "triangular",
               hazard_bandwidth = 0.1, knn = NULL, est_fun = "pt",
               n_hz_sample = NULL, n_hz_size = NULL,
               pt_int = seq(0,1,by = 0.05), window_lth = NA,
               seq_theta = seq(0.5, 1, by = 0.05)*theta_0,
               x_unit = 0.01, plot_unit = 1, MLE_unit = 1,
               plt_mgn = 0)

Arguments

x

A numeric vector of data values to which the hypothesis test is applied.

theta_th

Initial theoretical theta value for the hypothesis test. Must be positive.

theta_0

Initial real theta value for the hypothesis test. Defaults to theta_th.

alpha_lvl

Significance level for the hypothesis test with the initial theoretical theta value theta_th.

anom_est_alpha_lvl

Significance level for cluster quantity estimation.

dist_null

A character string giving the underlying distribution under the null hypothesis. Distribution options are shown in details.

...

Further arguments for distribution parameters.

HRR_kernel

A character string giving the smoothing kernel to be used in HRR_pt_est or HRR_sbsp_est. This must partially match one of "gaussian", "rectangular", "triangular" or "knn". Default is "triangular".

hazard_bandwidth

The smoothing bandwidth to be used.

knn

Number of neighboring points to be considered in smoothing for the "knn" kernel.

est_fun

A character string giving the hazard rate ratio estimation function. Must be either "pt" or "sbsp". Default is "pt".

n_hz_sample

Number of replicates, used if est_fun is "sbsp".

n_hz_size

Size of each resample, used if est_fun is "sbsp".

pt_int

A vector of points at which the hazard rate ratio is estimated.

window_lth

Window length for scan statistics hypothesis test. If missing, window length is selected by MLE_window_lth and Haiman_window_lth.

seq_theta

A vector of theta values passed to hypo_test for cluster detection. The sequence must be ordered. The default in the signature is seq(0.5, 1, by = 0.05)*theta_0, which equals seq(0.5, 1, by = 0.05)*theta_0/theta_th when theta_th is 1.

x_unit

A number indicating the uniformization bin width.

plot_unit

A number indicating bin width for histogram in the plot.

MLE_unit

A number indicating the bin width for counting excess.

plt_mgn

Extra margin of clusters shown in plot.
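The partial matching described for HRR_kernel can be illustrated with base R's match.arg. This is only a sketch: resolve_kernel is a hypothetical helper, and whether ultimate_detct resolves the kernel name this way internally is an assumption.

```r
# Kernel names listed in the HRR_kernel argument description.
kernels <- c("gaussian", "rectangular", "triangular", "knn")

# Hypothetical helper: resolve a possibly abbreviated kernel name.
# match.arg() stops with an error if the input matches none of the choices.
resolve_kernel <- function(k) match.arg(k, kernels)

resolve_kernel("tri")  # resolves to "triangular"
resolve_kernel("g")    # resolves to "gaussian"
```

An unambiguous prefix is enough; a string that matches none of the four names raises an error instead of falling back to a default.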

Details

This function is the complete, end-to-end version of the detection method. Every parameter except x has a default value, so once the data are supplied the function can detect the embedded clusters automatically, without the user specifying the underlying distribution, the number of clusters, the location of clusters, or any of the other parameters a model commonly requires.

Instead of setting theta_0 equal to theta_th, users can multiply theta_th by the value returned by HRR_bstp_lb to remove potential false positive clusters that arise from bias in the success probability estimation.
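The adjustment above amounts to scaling theta_th before the default theta grid is built. In the sketch below, lb is a hypothetical stand-in for the value returned by HRR_bstp_lb (that function is not called here, and 0.9 is a made-up number); the grid expression is the seq_theta default from the Usage section.

```r
theta_th <- 1

# Hypothetical lower bound; in practice this would come from HRR_bstp_lb.
lb <- 0.9

# Shrink the working theta to guard against biased probability estimates.
theta_0 <- theta_th * lb

# Default grid from the signature: 50% to 100% of theta_0 in 5% steps.
seq_theta <- seq(0.5, 1, by = 0.05) * theta_0
seq_theta
```

The resulting vector is already in increasing order, as seq_theta requires, and could be passed alongside theta_0 in the call.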

Value

This function returns a list with components:

Total

Estimated quantity of clusters

Cluster

A matrix whose first two columns are the boundaries of the clusters and whose third column is the corresponding p-value. Note that clusters are not necessarily mutually exclusive.

plot

The plot.
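As a rough illustration of working with the returned list, the sketch below builds a mock object with the documented components and filters clusters by p-value. All numbers are invented for illustration, the plot component is left as NULL, and columns are indexed by position because the column names of the real matrix are not documented here.

```r
# Mock result with the documented structure; values are made up.
res <- list(
  Total = 1,
  Cluster = matrix(c(47.5, 52.5, 0.001,
                     51.0, 53.0, 0.200),
                   ncol = 3, byrow = TRUE),
  plot = NULL
)

# Keep clusters whose p-value (third column) is below 0.05.
# drop = FALSE keeps a matrix even when a single row remains.
sig <- res$Cluster[res$Cluster[, 3] < 0.05, , drop = FALSE]

# The two mock clusters overlap, which the documentation allows:
# the second one starts before the first one ends.
overlap <- res$Cluster[2, 1] < res$Cluster[1, 2]
```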

Author(s)

Zhicong Zhao

Examples

set.seed(100)
x <- c(rgamma(5000, 2, 0.05), rnorm(200, 50, 1))  ## generate data
res <- ultimate_detct(x, HRR_kernel = "gaussian", est_fun = "sbsp",
                      n_hz_sample = 30, n_hz_size = 80, MLE_unit = 5,
                      x_unit = 0.001)
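To see why the simulated data contain a detectable cluster, the base R sketch below compares the observed count near 50 with the count expected if every point followed the gamma distribution. It does not call the package; the window bounds lo and hi are hypothetical values chosen around the injected normal component, not output of ultimate_detct.

```r
set.seed(100)
x <- c(rgamma(5000, 2, 0.05), rnorm(200, 50, 1))  # same data as the example

# Hypothetical window around the injected N(50, 1) points.
lo <- 48
hi <- 52

# Points actually falling in the window.
observed <- sum(x >= lo & x <= hi)

# Expected count if all points came from Gamma(shape = 2, rate = 0.05).
expected_null <- length(x) * (pgamma(hi, 2, 0.05) - pgamma(lo, 2, 0.05))

c(observed = observed, expected_null = expected_null)
```

An excess of observed over expected counts in a window is the kind of signal a scan statistic is designed to pick up.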

zhicongz/AnomDetct documentation built on Dec. 12, 2019, 9:16 a.m.