View source: R/oip_tssd_ewma.R
OipTsSdEwma | R Documentation |
OipTsSdEwma is the optimized implementation of the IpTsSdEwma function using environmental variables. It calculates anomalies using TSSD-EWMA in an incremental processing mode. On long datasets it has been shown to reduce runtime by up to 50%. TSSD-EWMA is a novel covariate shift-detection test based on a two-stage structure for univariate time series. In the first phase, it detects anomalies using the SD-EWMA algorithm (see CpSdEwma). In the second phase, it checks the veracity of those anomalies using the Kolmogorov-Smirnov test to reduce false alarms.
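The two phases described above can be sketched in base R roughly as follows. This is a simplified illustration, not the otsad implementation: the function name two_stage_sketch, the fixed EWMA weight lambda, the variance smoothing constant theta, the significance level alpha, and the choice of comparing the m values before a flagged point with the m values after it are all assumptions made for this sketch.

```r
# Illustrative two-stage detector (hypothetical helper, NOT the package code).
two_stage_sketch <- function(x, n.train, lambda = 0.2, theta = 0.01,
                             l = 3, m = 5, alpha = 0.05) {
  n <- length(x)
  stopifnot(n.train >= 2, n > n.train)
  z  <- mean(x[seq_len(n.train)])   # EWMA initialised on the training set
  s2 <- var(x[seq_len(n.train)])    # initial estimate of the error variance
  stage1 <- logical(n)

  # Stage 1: flag points outside z +/- l * sigma, then update the state.
  for (i in (n.train + 1):n) {
    err   <- x[i] - z               # one-step-ahead prediction error
    sigma <- sqrt(s2)
    stage1[i] <- x[i] > z + l * sigma || x[i] < z - l * sigma
    s2 <- theta * err^2 + (1 - theta) * s2
    z  <- lambda * x[i] + (1 - lambda) * z
  }

  # Stage 2: keep a flag only if the windows before and after it differ
  # according to a two-sample Kolmogorov-Smirnov test.
  confirmed <- stage1
  for (i in which(stage1)) {
    if (i - m < 1 || i + m > n) next  # not enough data yet: leave unverified
    before <- x[(i - m):(i - 1)]
    after  <- x[(i + 1):(i + m)]
    p <- suppressWarnings(ks.test(before, after)$p.value)
    confirmed[i] <- p < alpha
  }
  data.frame(i = seq_len(n), stage1 = stage1, confirmed = confirmed)
}

# Toy series with a level shift at index 101.
x <- c(rep(c(0, 1), 50), rep(c(10, 11), 50))
out <- two_stage_sketch(x, n.train = 10)
head(out[out$stage1, ])
```

Points too close to the end of the series are left unverified here, which mirrors the remark in this page that the last m values cannot be checked until further data arrive.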
OipTsSdEwma(
  data,
  n.train,
  threshold,
  l = 3,
  m = 5,
  to.next.iteration = list(last.res = NULL, to.check = NULL, last.m = NULL)
)
data | Numerical vector with training and test dataset. |
n.train | Number of points of the dataset that correspond to the training set. |
threshold | Error smoothing constant. |
l | Control limit multiplier. |
m | Length of the subsequences for applying the Kolmogorov-Smirnov test. |
to.next.iteration | List with the parameters needed to execute the next iteration. |
data must be a numerical vector without NA values. threshold must be a numeric value between 0 and 1; low values such as 0.01 or 0.05 are recommended. The usage above declares no default for threshold, so it must be supplied; the examples use 0.01. l is the parameter that determines the control limits. By default, 3 is used. m is the length of the subsequences for applying the Kolmogorov-Smirnov test. By default, 5 is used. Note that the last m values of a run are not yet verified, because verification requires the m values that follow them. Finally, to.next.iteration is the state returned by a previous execution of this algorithm. The first time the algorithm is executed its value is NULL. To process a new batch of data without appending it to the old dataset and restarting the process, only the to.next.iteration list returned by the last run is needed.
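The idea of passing the previous state instead of reprocessing the whole dataset can be illustrated with a small base R sketch. The function ewma_batch below is a hypothetical stand-in that only tracks a running EWMA; it is not part of the package, and its state list is far simpler than the real to.next.iteration. It shows only that threading the returned state through successive batches gives the same values as a single pass over all the data.

```r
# Hypothetical stand-in for an incremental detector: it only tracks an EWMA.
ewma_batch <- function(data, lambda = 0.2, state = NULL) {
  # Start from the carried-over state if there is one, else from the data.
  z <- if (is.null(state)) data[1] else state$z
  out <- numeric(length(data))
  for (k in seq_along(data)) {
    z <- lambda * data[k] + (1 - lambda) * z
    out[k] <- z
  }
  list(result = out, to.next.iteration = list(z = z))
}

x <- c(3, 5, 4, 8, 6, 7, 9, 2)

# Single pass over the whole vector.
whole <- ewma_batch(x)$result

# Two batches of four points, carrying the state from one call to the next.
state  <- NULL
pieces <- numeric(0)
for (batch in split(x, rep(1:2, each = 4))) {
  run    <- ewma_batch(batch, state = state)
  pieces <- c(pieces, run$result)
  state  <- run$to.next.iteration
}

isTRUE(all.equal(whole, pieces))
```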
A list of the following items.
result | Data frame with the following columns: |
  is.anomaly: 1 if the value is anomalous, 0 otherwise.
  ucl: Upper control limit.
  lcl: Lower control limit.
  i: Row id or index.
last.data.checked | Data frame with checked anomalies. |
to.next.iteration | Last result returned by the algorithm. It is a list containing the following items: |
  last.res: Last result returned by the application of the SD-EWMA function, with the parameter values calculated in the last run. These are necessary for the next run.
  to.check: Subsequence of the last remaining unchecked values, to be checked in the next iterations.
  last.m: Subsequence of the last m values.
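Because last.data.checked revises flags that were reported in an earlier result, accumulated results need to be updated by index. The sketch below uses small mock data frames rather than real output, and assumes that last.data.checked carries the columns i and is.anomaly, as the examples on this page do. Matching on i with match() avoids relying on the two data frames being in the same row order.

```r
# Mock accumulated results (stage-1 flags) and a mock set of verified rows.
res <- data.frame(i = 1:6, is.anomaly = c(0, 1, 0, 1, 0, 0))
checked <- data.frame(i = c(4, 2), is.anomaly = c(0, 1))

# Locate each verified row in the accumulated results by its index i.
idx <- match(checked$i, res$i)
ok  <- !is.na(idx)               # ignore indices not present in res

# Overwrite the earlier flags with the verified ones.
res$is.anomaly[idx[ok]] <- checked$is.anomaly[ok]
res
```

In this mock data, row 4 was flagged in the first phase and is cleared after verification, while row 2 stays flagged.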
Raza, H., Prasad, G., & Li, Y. (2015). EWMA model based shift-detection methods for detecting covariate shifts in non-stationary environments. Pattern Recognition, 48(3), 659-669.
## EXAMPLE 1: ----------------------
## It can be used in the same way as with OcpTsSdEwma passing the whole dataset
## as an argument.

## Generate data
set.seed(100)
n <- 200
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[150] <- 170
df <- data.frame(timestamp = 1:n, value = x)

## Calculate anomalies
result <- OipTsSdEwma(
  data = df$value,
  n.train = 5,
  threshold = 0.01,
  l = 3,
  m = 20,
  to.next.iteration = NULL
)
res <- cbind(df, result$result)

## Plot results
PlotDetections(res, print.time.window = FALSE, title = "TSSD-EWMA ANOMALY DETECTOR")

## EXAMPLE 2: ----------------------
## You can use it in an incremental way. This is an example using the stream
## library. This library allows the simulation of streaming operation.

# install.packages("stream")
library("stream")

## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df <- data.frame(timestamp = 1:n, value = x)
dsd_df <- DSD_Memory(df)

## Initialize parameters for the loop
last.res <- NULL
res <- NULL
nread <- 50
numIter <- n %/% nread
m <- 20

## Calculate anomalies
for (i in 1:numIter) {
  # read new data
  newRow <- get_points(dsd_df, n = nread, outofpoints = "ignore")
  # calculate if it's an anomaly
  last.res <- OipTsSdEwma(
    data = newRow$value,
    n.train = 5,
    threshold = 0.01,
    l = 3,
    m = m,
    to.next.iteration = last.res$to.next.iteration
  )
  # prepare result
  res <- rbind(res, cbind(newRow, last.res$result))
  if (!is.null(last.res$last.data.checked)) {
    res[res$i %in% last.res$last.data.checked$i, "is.anomaly"] <-
      last.res$last.data.checked$is.anomaly
  }
}

## Plot results
PlotDetections(res, title = "TSSD-EWMA ANOMALY DETECTOR")