OipTsSdEwma: Optimized Incremental Processing Two-Stage Shift-Detection...

View source: R/oip_tssd_ewma.R

OipTsSdEwma    R Documentation

Optimized Incremental Processing Two-Stage Shift-Detection based on EWMA

Description

OipTsSdEwma is the optimized implementation of the IpTsSdEwma function using environmental variables. It calculates anomalies using TSSD-EWMA in an incremental processing mode. It has been shown that in long datasets it can reduce runtime by up to 50%. TSSD-EWMA is a novel covariate shift-detection method for univariate time series, based on a two-stage structure. In the first stage, it detects anomalies using the SD-EWMA algorithm (see CpSdEwma). In the second stage, it checks the veracity of those anomalies using the Kolmogorov-Smirnov test to reduce false alarms.
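The two-stage idea described above can be sketched in base R. This is a simplified illustration, not the otsad implementation: the function name two_stage_sketch and the parameters lambda (EWMA smoothing weight) and alpha (significance level for the Kolmogorov-Smirnov test) are assumptions introduced for this sketch, and the exact update rules used by the package may differ.

```r
# Simplified illustration of a two-stage shift detector; NOT the otsad code.
# `lambda` and `alpha` are hypothetical parameters introduced for this sketch.
two_stage_sketch <- function(x, n.train, threshold = 0.01, l = 3, m = 5,
                             lambda = 0.2, alpha = 0.05) {
  n <- length(x)
  train <- x[seq_len(n.train)]
  z <- mean(train)   # EWMA estimate, initialised on the training set
  s2 <- var(train)   # smoothed squared one-step-ahead error
  flagged <- logical(n)

  # Stage 1: flag points that fall outside z +/- l * sqrt(s2).
  for (i in seq(n.train + 1, n)) {
    sigma <- sqrt(s2)
    flagged[i] <- x[i] > z + l * sigma || x[i] < z - l * sigma
    err <- x[i] - z
    s2 <- threshold * err^2 + (1 - threshold) * s2   # smooth the squared error
    z <- lambda * x[i] + (1 - lambda) * z            # update the EWMA
  }

  # Stage 2: confirm a flagged point only if the m values before it and the
  # m values starting at it differ according to a two-sample KS test.
  confirmed <- logical(n)
  for (i in which(flagged)) {
    if (i > m && i + m - 1 <= n) {
      before <- x[(i - m):(i - 1)]
      after <- x[i:(i + m - 1)]
      p <- suppressWarnings(stats::ks.test(before, after)$p.value)
      confirmed[i] <- p < alpha
    }
  }
  data.frame(i = seq_len(n), flagged = flagged, confirmed = confirmed)
}

# A series whose mean jumps from 0 to 10 at index 101.
set.seed(1)
x <- c(rnorm(100), rnorm(100, mean = 10))
out <- two_stage_sketch(x, n.train = 20)
out[out$confirmed, ]
```

Flagged points within the last m - 1 positions cannot be confirmed in this sketch because the window after them is incomplete, which mirrors why the final values of a run stay unverified until more data arrives.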

Usage

OipTsSdEwma(
  data,
  n.train,
  threshold,
  l = 3,
  m = 5,
  to.next.iteration = list(last.res = NULL, to.check = NULL, last.m = NULL)
)

Arguments

data

Numerical vector with training and test dataset.

n.train

Number of points of the dataset that correspond to the training set.

threshold

Error smoothing constant.

l

Control limit multiplier.

m

Length of the subsequences for applying the Kolmogorov-Smirnov test.

to.next.iteration

List with the parameters needed to run the next iteration.

Details

data must be a numerical vector without NA values. threshold must be a numeric value between 0 and 1; low values such as 0.01 or 0.05 are recommended. l is the parameter that determines the control limits; by default, 3 is used. m is the length of the subsequences for applying the Kolmogorov-Smirnov test; by default, 5 is used. Note that the last m values of a run are not verified, because the following m values are needed to perform the verification. Finally, to.next.iteration is the last result returned by a previous execution of this algorithm. The first time the algorithm is executed its value is NULL. To process a new batch of data without appending it to the old dataset and restarting the process, only the to.next.iteration list returned by the last run is needed.
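The constraints above can be checked before calling the function. The helper below is a minimal sketch; validate_tssd_inputs is a hypothetical name that is not part of otsad, and the checks on n.train, l and m are assumptions that go beyond what this page states.

```r
# Hypothetical helper, not part of otsad: checks the documented constraints.
validate_tssd_inputs <- function(data, n.train, threshold, l = 3, m = 5) {
  stopifnot(
    # Documented: numerical vector without NA values.
    is.numeric(data), !anyNA(data),
    # Documented: numeric value between 0 and 1.
    is.numeric(threshold), length(threshold) == 1,
    threshold >= 0, threshold <= 1,
    # Assumptions for this sketch: single positive numbers.
    is.numeric(n.train), length(n.train) == 1, n.train >= 1,
    is.numeric(l), length(l) == 1, l > 0,
    is.numeric(m), length(m) == 1, m >= 1
  )
  invisible(TRUE)
}

# Passes silently for valid input.
validate_tssd_inputs(c(3, 5, 4, 6, 5, 7), n.train = 2, threshold = 0.01)

# Fails with an error when the data contain NA values.
bad <- try(validate_tssd_inputs(c(1, NA, 3), n.train = 1, threshold = 0.01),
           silent = TRUE)
inherits(bad, "try-error")
```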

Value

A list of the following items.

result

Data frame with the following columns:

  • is.anomaly 1 if the value is anomalous, 0 otherwise.

  • ucl Upper control limit.

  • lcl Lower control limit.

  • i Row id or index.

last.data.checked

Data frame with the re-checked anomalies. Column i is the row id or index and is.anomaly is the updated is.anomaly value for that row.

to.next.iteration

Last result returned by the algorithm. It is a list containing the following items.

  • last.res Last result returned by the application of the SD-EWMA function, including the parameters calculated in the last run. These are necessary for the next run.

  • to.check Subsequence of the last remaining unchecked values to be checked in the next iterations.

  • last.m Subsequence of the last m values.
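When results are accumulated across runs, the rows listed in last.data.checked need to overwrite the earlier is.anomaly values. The following is a minimal sketch of one way to do that with match(), assuming only the i and is.anomaly columns documented above; the data frames res and checked hold made-up values and stand in for the accumulated result and for last.data.checked.

```r
# Toy accumulated results and a toy batch of re-checked rows (made-up values).
res <- data.frame(i = 1:6, is.anomaly = c(0, 1, 0, 1, 1, 0))
checked <- data.frame(i = c(5, 2), is.anomaly = c(0, 1))

# match() locates each checked id in res, so the row order of `checked`
# does not need to agree with the row order of `res`.
idx <- match(checked$i, res$i)
keep <- !is.na(idx)   # ignore ids that are not present in res
res$is.anomaly[idx[keep]] <- checked$is.anomaly[keep]
res
```

Here row 5 is downgraded to 0 and row 2 keeps its value of 1. Matching on i rather than relying on position keeps the update correct even if the checked rows arrive in a different order.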

References

Raza, H., Prasad, G., & Li, Y. (2015, March). EWMA model based shift-detection methods for detecting covariate shifts in non-stationary environments. Pattern Recognition, 48(3), 659-669.

Examples

## EXAMPLE 1: ----------------------
## It can be used in the same way as with OcpTsSdEwma passing the whole dataset
## as an argument.

## Generate data
set.seed(100)
n <- 200
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[150] <- 170
df <- data.frame(timestamp = 1:n, value = x)

## Calculate anomalies
result <- OipTsSdEwma(
  data = df$value,
  n.train = 5,
  threshold = 0.01,
  l = 3,
  m = 20,
  to.next.iteration = NULL
)
res <- cbind(df, result$result)

## Plot results
PlotDetections(res, print.time.window = FALSE, title = "TSSD-EWMA ANOMALY DETECTOR")

## EXAMPLE 2: ----------------------
## You can use it in an incremental way. This is an example using the stream
## library. This library allows the simulation of streaming operation.

# install.packages("stream")
library("stream")


## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df <- data.frame(timestamp = 1:n, value = x)
dsd_df <- DSD_Memory(df)

## Initialize parameters for the loop
last.res <- NULL
res <- NULL
nread <- 50
numIter <- n %/% nread
m <- 20

## Calculate anomalies
for (i in seq_len(numIter)) {
  # read new data
  newRow <- get_points(dsd_df, n = nread, outofpoints = "ignore")
  # calculate if it's an anomaly
  last.res <- OipTsSdEwma(
    data = newRow$value,
    n.train = 5,
    threshold = 0.01,
    l = 3,
    m = m,
    to.next.iteration = last.res$to.next.iteration
  )
  # prepare result
  res <- rbind(res, cbind(newRow, last.res$result))
  if (!is.null(last.res$last.data.checked)) {
    res[res$i %in% last.res$last.data.checked$i, "is.anomaly"] <-
      last.res$last.data.checked$is.anomaly
  }
}

## Plot results
PlotDetections(res, title = "TSSD-EWMA ANOMALY DETECTOR")


alaineiturria/otsad documentation built on Jan. 12, 2023, 12:26 p.m.