AnomalyLikelihoodScorer: Anomaly Likelihood Scorer
In alaineiturria/otsad: Online Time Series Anomaly Detectors

Anomaly Likelihood Scorer

Description

R6 class that implements the anomaly likelihood introduced by Ahmad et al. (2017). The original source code is available at https://github.com/numenta/NAB/blob/master/nab/detectors/numenta/numenta_detector.py. This class analyzes and estimates the distribution of averaged anomaly scores from a given model. Given a new anomaly score s, it estimates P(score >= s). The number P(score >= s) represents the likelihood of the current state of predictability. For example, a likelihood of 0.01 (1%) means this level of predictability is seen about once in every 100 records, which is not as unusual as it seems: for records that arrive every minute, it happens once every hour and 40 minutes. A likelihood of 0.0001 (0.01%) corresponds to about once in 10,000 records, or roughly once every 7 days.
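The recurrence figures above follow from reading the tail probability p as the fraction of records expected to be at least this unpredictable, so the expected gap between such records is the sampling interval divided by p. The small helper below reproduces that arithmetic; its name, expected_interval_minutes, is hypothetical and not part of otsad.

```r
# Hypothetical helper (not part of otsad): expected gap between records whose
# tail probability is p, when one record arrives every `interval_min` minutes.
expected_interval_minutes <- function(p, interval_min = 1) {
  stopifnot(p > 0, p <= 1)
  interval_min / p
}

# 1% at one record per minute: 100 minutes, i.e. 1 hour 40 minutes
expected_interval_minutes(0.01)

# 0.01% at one record per minute: 10,000 minutes, converted to days (just under 7)
expected_interval_minutes(0.0001) / (60 * 24)
```

For data sampled every 5 minutes, pass interval_min = 5; the same tail probability then corresponds to a gap five times longer in wall-clock time.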

Methods

Method `new()`

Create a new AnomalyLikelihoodScorer object.

Usage

AnomalyLikelihoodScorer$new(
  learningPeriod = 288,
  estimationSamples = 100,
  historicWindowSize = 8640,
  reestimationPeriod = 100,
  computeLogLikelihood = F
)

Arguments

learningPeriod: Number of iterations required for the algorithm to learn the basic patterns in the dataset and for the anomaly score to 'settle down'. The default is based on empirical observations, but in practice it could be larger for more complex domains. The downside of setting it too large is that real anomalies may be ignored and not flagged.
estimationSamples: Number of reasonable anomaly scores required for the initial estimate of the Gaussian. The default of 100 records is reasonable; we just need enough samples to get a decent estimate of the Gaussian. It is unlikely you will need to tune this, since the Gaussian is re-estimated every reestimationPeriod iterations (100 by default).
historicWindowSize: Size of the sliding window of historical data points kept for periodic re-estimation of the Gaussian. Note: the default of 8640 corresponds to a month's worth of history at 5-minute intervals (8640 × 5 minutes = 30 days).
reestimationPeriod: How often the Gaussian distribution is re-estimated. Ideally it would be re-estimated at every iteration, but that is costly. In general the system is not very sensitive to this number, as long as it is small relative to the total number of records processed.
computeLogLikelihood: If TRUE, compute a log-scale representation of the likelihood value. Since the likelihood computations work with very low tail probabilities, the resulting likelihoods often run to four or five nines (e.g. 0.9999 or 0.99999), so a log value is more useful for visualization, thresholding, etc.
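To see why a log scale helps, the sketch below applies the log transform used in Numenta's reference anomaly likelihood code, log(1.0000000001 - L) / log(1 - 0.9999999999), to a few likelihood values L. That otsad applies exactly this formula when computeLogLikelihood = TRUE is an assumption, and the helper name log_likelihood is hypothetical.

```r
# Hypothetical helper reproducing the log-scale mapping from Numenta's anomaly
# likelihood code; assumed (not confirmed) to match otsad's behaviour.
log_likelihood <- function(likelihood) {
  # The small offset keeps the argument of log() positive when likelihood = 1;
  # dividing by log(1 - 0.9999999999) maps a likelihood of 1 to about 1
  log(1.0000000001 - likelihood) / log(1 - 0.9999999999)
}

# Likelihoods of 0.5, four nines and five nines (roughly 0.03, 0.40, 0.50)
round(log_likelihood(c(0.5, 0.9999, 0.99999)), 2)
```

Under this mapping each extra nine adds about 0.1, so thresholds can be set on an evenly spread scale instead of by counting nines.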

Method `computeScore()`

Compute the probability that the current input value and its raw anomaly score represent an anomaly, given the historical distribution of anomaly scores. The closer the result is to 1, the more likely the record is an anomaly.

Usage

AnomalyLikelihoodScorer$computeScore(x, value)

Arguments

x: The current raw anomaly score.
value: The current ("raw") input value.

Returns

The anomalyLikelihood for this record.
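The idea behind computeScore() can be sketched in base R under simplifying assumptions: the raw anomaly scores are smoothed with a trailing moving average, a Gaussian is fitted to the past averages, and the latest average s is scored as 1 - P(score >= s). Averages below the mean are reflected about it, which to our understanding mirrors Numenta's reference code. This is not otsad's implementation: the function names, the averaging window avg_window = 10 and the standard-deviation floor min_sd are hypothetical, and the learning period, estimation samples, sliding window and periodic re-estimation are all omitted.

```r
# Hypothetical sketch of a Gaussian anomaly likelihood (not otsad's code).

# Upper-tail probability P(S >= x) under N(mu, sigma^2); values below the
# mean are reflected so that unusually low averages are also treated as rare.
tail_probability <- function(x, mu, sigma) {
  if (x < mu) x <- 2 * mu - x
  stats::pnorm(x, mean = mu, sd = sigma, lower.tail = FALSE)
}

# Likelihood of the most recent record given all raw anomaly scores so far.
likelihood_from_history <- function(scores, avg_window = 10, min_sd = 1e-3) {
  stopifnot(length(scores) >= avg_window + 2)
  # Trailing moving averages; the first avg_window - 1 entries are NA
  avgs <- as.numeric(stats::filter(scores, rep(1 / avg_window, avg_window),
                                   sides = 1))
  avgs <- avgs[!is.na(avgs)]
  current <- avgs[length(avgs)]   # averaged score of the latest record
  history <- avgs[-length(avgs)]  # past averages used to fit the Gaussian
  mu <- mean(history)
  sigma <- max(stats::sd(history), min_sd)  # floor guards against sd = 0
  # Close to 1 means the latest averaged score is rare under the fitted Gaussian
  1 - tail_probability(current, mu, sigma)
}

# Usage: a quiet stream of low scores followed by a single spike
set.seed(1)
scores <- c(stats::runif(200, min = 0, max = 0.1), 0.9)
likelihood_from_history(scores)
```

Because of the reflection step, the tail probability never exceeds 0.5, so this sketch always returns a value between 0.5 and 1; a record sitting exactly at the historical mean scores 0.5.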

References

S. Ahmad, A. Lavin, S. Purdy, Z. Agha, Unsupervised real-time anomaly detection for streaming data, Neurocomputing 262 (2017) 134-147.

Examples

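library(otsad)

# Simulated raw anomaly scores (also used here as the input values)
a <- rnorm(500)

# Small parameter values so the example warms up quickly
scorer <- AnomalyLikelihoodScorer$new(
   learningPeriod = 10,
   estimationSamples = 10,
   historicWindowSize = 20,
   reestimationPeriod = 10
)

# Score each record in order; the scorer keeps its state between calls
sapply(a, function(x) {scorer$computeScore(x, x)})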

alaineiturria/otsad documentation built on Jan. 12, 2023, 12:26 p.m.
