AnomalyLikelihoodScorer: Anomaly Likelihood Scorer

AnomalyLikelihoodScorer    R Documentation

Anomaly Likelihood Scorer

Description

R6 class that implements the anomaly likelihood introduced by Ahmad et al. The original source code is available at https://github.com/numenta/NAB/blob/master/nab/detectors/numenta/numenta_detector.py. This class analyzes and estimates the distribution of averaged anomaly scores from a given model. Given a new anomaly score s, it estimates P(score >= s), which represents the likelihood of the current state of predictability. For example, a likelihood of 0.01 (1%) is not as unusual as it seems: for records that arrive every minute, it occurs about once every hour and 40 minutes. A likelihood of 0.0001 (0.01%) occurs about once every 7 days.
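
The frequencies quoted above are the reciprocal of the tail probability. A minimal sketch of that arithmetic, assuming one record per minute as in the description (the variable names are illustrative and not part of the package):

```r
# Expected spacing between records whose tail probability is at most p,
# assuming one record arrives per minute.
tail_prob <- c(0.01, 0.0001)
minutes_between <- 1 / tail_prob

spacing <- data.frame(
  tail_prob = tail_prob,
  minutes   = minutes_between,              # 100 and 10000 minutes
  hours     = minutes_between / 60,
  days      = minutes_between / (60 * 24)   # 10000 minutes is just under 7 days
)
print(spacing)
```

At a different arrival rate the same reciprocal applies, with the result counted in records rather than minutes.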

Methods

Public methods


Method new()

Create a new AnomalyLikelihoodScorer object.

Usage
AnomalyLikelihoodScorer$new(
  learningPeriod = 288,
  estimationSamples = 100,
  historicWindowSize = 8640,
  reestimationPeriod = 100,
  computeLogLikelihood = F
)
Arguments
learningPeriod

Number of iterations required for the algorithm to learn the basic patterns in the dataset and for the anomaly score to 'settle down'. The default is based on empirical observations but in reality this could be larger for more complex domains. The downside if this is too large is that real anomalies might get ignored and not flagged.

estimationSamples

Number of reasonable anomaly scores required for the initial estimate of the Gaussian. The default of 100 records is reasonable: we just need enough samples to get a decent estimate of the Gaussian. It's unlikely you will need to tune this, since the Gaussian is re-estimated every 100 iterations by default (see reestimationPeriod).

historicWindowSize

Size of sliding window of historical data points to maintain for periodic reestimation of the Gaussian. Note: the default of 8640 is based on a month's worth of history at 5-minute intervals.
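
To size the window for a different sampling interval, the same arithmetic can be repeated. A small sketch, assuming a month is counted as 30 days; window_size is a hypothetical helper and not part of the package:

```r
# Hypothetical helper: number of records needed to cover `days` of history
# when one record arrives every `minutes_per_record` minutes.
window_size <- function(days, minutes_per_record) {
  days * 24 * 60 / minutes_per_record
}

window_size(30, 5)  # 8640, the documented default
window_size(30, 1)  # 43200 for one record per minute
```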

reestimationPeriod

How often we re-estimate the Gaussian distribution. The ideal is to re-estimate every iteration but this is a performance hit. In general the system is not very sensitive to this number as long as it is small relative to the total number of records processed.
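
The interplay between the sliding window and the re-estimation period can be sketched as follows. This is an illustration of the idea only, not the package's implementation; refit_schedule and its arguments are hypothetical names:

```r
# Hypothetical helper: stream scores one at a time, keep only the most recent
# `window` scores, and refit the Gaussian (mean, sd) every `period` records.
refit_schedule <- function(scores, window = 20, period = 10) {
  history <- numeric(0)
  fits <- list()
  for (i in seq_along(scores)) {
    history <- tail(c(history, scores[i]), window)  # sliding window
    if (i %% period == 0) {                         # periodic re-estimation
      fits[[length(fits) + 1]] <- c(iter = i,
                                    mean = mean(history),
                                    sd = sd(history))
    }
  }
  do.call(rbind, fits)
}

set.seed(1)
fits <- refit_schedule(runif(50))
nrow(fits)  # 5 refits for 50 records with period = 10
```

A smaller period tracks changes in the score distribution more closely, at the cost of refitting more often.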

computeLogLikelihood

If TRUE, compute a log scale representation of the likelihood value. Since the likelihood computations return low probabilities that often go into four 9's or five 9's, a log value is more useful for visualization, thresholding, etc.
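
To show why a log scale helps, here is a sketch modeled on the log transform used in the Numenta reference implementation. Whether this class applies exactly the same formula and constants is an assumption; log_likelihood is a hypothetical helper name:

```r
# Hypothetical helper modeled on the Numenta reference implementation.
# The offset 1.0000000001 keeps the argument of log() above zero when the
# likelihood is exactly 1; dividing by log(1e-10) rescales the result to [0, 1].
log_likelihood <- function(likelihood) {
  log(1.0000000001 - likelihood) / log(1e-10)
}

log_likelihood(c(0.9999, 0.99999))  # roughly 0.4 and 0.5
log_likelihood(1)                   # approximately 1, the top of the scale
```

On this scale, four 9's and five 9's are clearly separated, which makes it easier to set a threshold between them.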


Method computeScore()

Compute the probability that the current value plus anomaly score represents an anomaly given the historical distribution of anomaly scores. The closer the number is to 1, the higher the chance it is an anomaly.

Usage
AnomalyLikelihoodScorer$computeScore(x, value)
Arguments
x

The current raw anomaly score.

value

The current ("raw") input value.

Returns

The anomalyLikelihood for this record.
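
As a rough illustration of the idea behind the returned value, the sketch below fits a Gaussian to past anomaly scores and converts a new score into a likelihood as 1 - P(score >= s). It is a simplification and not the package's implementation: it omits the short-term averaging, the learning period, and the sliding window, and gaussian_likelihood is a hypothetical name.

```r
# Simplified sketch: Gaussian tail probability of a new score given past scores.
gaussian_likelihood <- function(history, s) {
  mu <- mean(history)
  sigma <- sd(history)
  # P(score >= s) under the fitted Gaussian
  tail_prob <- pnorm(s, mean = mu, sd = sigma, lower.tail = FALSE)
  1 - tail_prob  # close to 1 when s is unusually high
}

set.seed(42)
history <- runif(200, min = 0, max = 0.2)  # made-up, mostly low scores

gaussian_likelihood(history, 0.1)  # near the historical mean: mid-range value
gaussian_likelihood(history, 0.9)  # far above the history: very close to 1
```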

References

S. Ahmad, A. Lavin, S. Purdy, Z. Agha, Unsupervised real-time anomaly detection for streaming data, Neurocomputing 262 (2017) 134-147.

Examples

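library(otsad)

# Simulated input; the same series is used as both anomaly score and value
a <- rnorm(500)

# Periods and window set far below the defaults to suit the short series
scorer <- AnomalyLikelihoodScorer$new(
   learningPeriod = 10,
   estimationSamples = 10,
   historicWindowSize = 20,
   reestimationPeriod = 10
)

# Feed the records one at a time; returns one anomaly likelihood per record
sapply(a, function(x) {scorer$computeScore(x, x)})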


alaineiturria/otsad documentation built on Jan. 12, 2023, 12:26 p.m.