AnomalyLikelihoodScorer | R Documentation |
R6 class that implements the anomaly likelihood introduced by Ahmad et al. The original
source code is available at
https://github.com/numenta/NAB/blob/master/nab/detectors/numenta/numenta_detector.py. This class
analyzes and estimates the distribution of averaged anomaly scores from a given model. Given a
new anomaly score s, it estimates P(score >= s). The number P(score >= s) represents the
likelihood of the current state of predictability. For example, a likelihood of 0.01 (1%) is
not as unusual as it seems: for records that arrive every minute, it occurs about once every
hour and 40 minutes. A likelihood of 0.0001 (0.01%) occurs about once every 7 days.
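The frequencies quoted above follow from simple arithmetic, which the sketch below reproduces. It assumes records are independent and arrive at a fixed interval; the helper name expected_gap_minutes is hypothetical and not part of this package.

```r
# Expected time between records at least as unusual as a given tail
# probability, assuming independent records at a fixed arrival interval.
# expected_gap_minutes is an illustrative helper, not a package function.
expected_gap_minutes <- function(tail_prob, minutes_per_record = 1) {
  minutes_per_record / tail_prob
}

gap_1pct   <- expected_gap_minutes(0.01)    # 100 minutes, i.e. 1 h 40 min
gap_001pct <- expected_gap_minutes(0.0001)  # 10000 minutes

gap_1pct / 60            # gap in hours
gap_001pct / (60 * 24)   # gap in days, a little under 7
```

For data sampled every 5 minutes, passing minutes_per_record = 5 stretches both gaps by a factor of five.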
new()
Create a new AnomalyLikelihoodScorer object.
AnomalyLikelihoodScorer$new( learningPeriod = 288, estimationSamples = 100, historicWindowSize = 8640, reestimationPeriod = 100, computeLogLikelihood = F )
learningPeriod
Number of iterations required for the algorithm to learn the basic patterns in the dataset and for the anomaly score to 'settle down'. The default is based on empirical observations but in reality this could be larger for more complex domains. The downside if this is too large is that real anomalies might get ignored and not flagged.
estimationSamples
Number of reasonable anomaly scores required for the initial estimate of the Gaussian. The default of 100 records is reasonable: we just need sufficient samples to get a decent estimate for the Gaussian. It is unlikely you will need to tune this, since the Gaussian is re-estimated every 100 iterations by default (see reestimationPeriod).
historicWindowSize
Size of sliding window of historical data points to maintain for periodic reestimation of the Gaussian. Note: the default of 8640 is based on a month's worth of history at 5-minute intervals.
reestimationPeriod
How often we re-estimate the Gaussian distribution. The ideal is to re-estimate every iteration but this is a performance hit. In general the system is not very sensitive to this number as long as it is small relative to the total number of records processed.
computeLogLikelihood
If TRUE, compute a log-scale representation of the likelihood value. Because the likelihood
values often run to four or five 9's (for example 0.9999 or 0.99999), a log value is more
useful for visualization, thresholding, etc.
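To illustrate why a log scale helps, here is a minimal sketch of one possible transform. The exact formula this class applies is not stated on this page, so the function log_likelihood and its eps parameter are assumptions made for illustration only.

```r
# Hypothetical log scaling (an assumption, not confirmed for this package):
# maps a likelihood in [0, 1] to roughly [0, 1], spreading out the values
# that crowd together near 1.
log_likelihood <- function(likelihood, eps = 1e-10) {
  # eps keeps the argument of log() positive when likelihood == 1
  log(1 + eps - likelihood) / log(eps)
}

# Each extra "9" adds about 0.1 on the log scale
log_likelihood(c(0.9, 0.99, 0.999, 0.9999, 0.99999))
```

On this scale the difference between 0.9999 and 0.99999 is as visible as the difference between 0.9 and 0.99, which makes a fixed threshold easier to choose.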
computeScore()
Compute the probability that the current value plus anomaly score represents an anomaly given the historical distribution of anomaly scores. The closer the number is to 1, the higher the chance it is an anomaly.
AnomalyLikelihoodScorer$computeScore(x, value)
x
The current raw anomaly score.
value
The current ("raw") input value.
The anomalyLikelihood for this record.
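The idea behind the score can be sketched in base R: fit a Gaussian to a window of past averaged anomaly scores, then ask how far into the upper tail a new score falls. This is a simplified illustration, not the class's actual implementation. The simulated history, the helper names, and the choice to report 1 - P(score >= s) are assumptions, the last chosen so that values near 1 indicate an anomaly, as described above.

```r
set.seed(1)

# Hypothetical window of averaged anomaly scores, clipped to [0, 1]
history <- pmin(pmax(rnorm(200, mean = 0.2, sd = 0.05), 0), 1)

# Estimate the Gaussian from the window
mu    <- mean(history)
sigma <- sd(history)

# Tail probability P(score >= s) under the fitted Gaussian
tail_prob <- function(s) pnorm(s, mean = mu, sd = sigma, lower.tail = FALSE)

# Likelihood-style value: approaches 1 as s moves far into the upper tail
# (assumed convention for this sketch)
likelihood <- function(s) 1 - tail_prob(s)

likelihood(mu)   # a typical score sits at the middle of the distribution
likelihood(0.5)  # a score far above the history is very close to 1
```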
S. Ahmad, A. Lavin, S. Purdy, Z. Agha, Unsupervised real-time anomaly detection for streaming data, Neurocomputing 262 (2017) 134-147.
a <- rnorm(500)
scorer <- AnomalyLikelihoodScorer$new(
  learningPeriod = 10,
  estimationSamples = 10,
  historicWindowSize = 20,
  reestimationPeriod = 10
)
sapply(a, function(x) {scorer$computeScore(x, x)})