View source: R/DetectAnomalies_RLM.R
Description
A technique for detecting whether the current observation in a multivariate time series is an anomaly, given all observations up to and including the current time.
The algorithm used in this function is as follows. A moving window is defined, and the num.train points that fall within it are taken as the training set. At each sample time, we first determine whether the response variable of the test point is above the sample quantile corresponding to probability prob, calculated from the training set. Only points above this sample quantile are considered anomaly candidates.
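The candidate screening step can be sketched in R as follows. is_candidate() is a hypothetical helper, not part of the package; it assumes the training window is the num.train points immediately preceding the test point, that points already flagged as anomalies are dropped from the window (as described below), and it checks only the upward ('pos') direction.

```r
# Hypothetical helper (not the package's API): screen the test point at
# index i against the prob-quantile of a moving training window.
# Assumptions: the window is the num.train points just before i, points
# already flagged as anomalies are dropped from it, and only the upward
# ('pos') direction is checked.
is_candidate <- function(y, i, num.train, prob, flagged) {
  stopifnot(i > num.train)
  window <- (i - num.train):(i - 1)
  train <- y[window][!flagged[window]]  # exclude previously flagged anomalies
  threshold <- quantile(train, probs = prob, names = FALSE)
  y[i] > threshold
}

y <- c(1:10, 9)
flagged <- rep(FALSE, length(y))
is_candidate(y, 11, num.train = 10, prob = 0.9, flagged = flagged)  # FALSE: 9 < 9.1
flagged[10] <- TRUE
is_candidate(y, 11, num.train = 10, prob = 0.9, flagged = flagged)  # TRUE: quantile drops to 8.2
```

The second call shows why flagged points are excluded: a past spike left in the window would inflate the quantile and hide the next candidate.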
For each anomaly candidate, we construct a robust linear regression model (M-estimation with inverse variance weighting) using the predictors defined in formula. We assume that the normalized residual follows the dist distribution, and the test point is marked as an anomaly if its p-value is smaller than the predefined threshold p. To minimize the effect of anomalies, points marked as anomalies are excluded from both the sample quantile calculation and the robust linear regression model fitting.
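A minimal sketch of the per-candidate test, assuming MASS::rlm() as the M-estimator. candidate_pvalue() is hypothetical and is not the package's internal code. rlm() can take prior weights with wt.method = "inv.var" to treat them as inverse variances; this sketch passes none, so it is plain M-estimation. It also assumes the residual is normalized by rlm's robust scale estimate fit$s and compared with a t-distribution with df degrees of freedom; the package's own normalization may differ.

```r
# Hypothetical helper (not the package's internal code). Fits a robust
# linear model by M-estimation on the training rows, which the caller has
# already stripped of flagged anomalies, and returns a one-sided p-value
# for the single test row.
# Assumption: residual / fit$s follows a t-distribution with df d.o.f.
candidate_pvalue <- function(formula, train, test, df = 10) {
  fit <- MASS::rlm(formula, data = train)
  response <- all.vars(formula)[1]                 # left-hand side variable
  resid <- test[[response]] - predict(fit, newdata = test)
  z <- resid / fit$s                               # normalized residual
  unname(pt(z, df = df, lower.tail = FALSE))       # upper tail ('pos' direction)
}

set.seed(1)
train <- data.frame(x = 1:50)
train$y <- 2 * train$x + rnorm(50)
candidate_pvalue(y ~ x, train, data.frame(x = 51, y = 122))  # spike of about +20: tiny p-value
candidate_pvalue(y ~ x, train, data.frame(x = 51, y = 102))  # close to the fitted line: large p-value
```

The test point is then marked as an anomaly when the returned value is below p.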
We believe that spikes lasting only a short time are of less concern; therefore, the 1st-tier warning is fired only when more than min.anom of the past adjacent points are all anomalies. This decreases the false positive rate by suppressing warnings for short-lived, self-healing incidents.
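The run-length rule for 1st-tier warnings can be sketched as below. first_tier_warning() is a hypothetical helper; it assumes a warning fires once the run of consecutive anomalies ending at the current point (inclusive) is at least min.anom long. The text above says "more than", so the package may use a strict comparison instead.

```r
# Hypothetical sketch (not the package's API): 1st-tier warnings from a
# logical vector of per-point anomaly flags.
# Assumption: a warning fires at index i when the run of consecutive
# anomalies ending at i (inclusive) is at least min.anom long.
first_tier_warning <- function(anomaly, min.anom) {
  run <- integer(length(anomaly))
  for (i in seq_along(anomaly)) {
    prev <- if (i > 1) run[i - 1] else 0L
    run[i] <- if (anomaly[i]) prev + 1L else 0L  # reset on a normal point
  }
  run >= min.anom
}

anomaly <- c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
first_tier_warning(anomaly, min.anom = 3)
# FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE: the 2-point spike is suppressed
```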
A two-tiered approach is used to differentiate incidents of different severity. When a warning is triggered, the weighted average score of all previous adjacent anomalies is calculated as

(s[0] + 0.5*s[-1] + 0.5^2*s[-2] + ...) / (1 + 0.5 + 0.5^2 + ...),

where s[0] is the score of the current point and s[-k] is the score of the anomaly k steps earlier. If the weighted average is larger than the predefined threshold min.score, the 2nd-tier alert is fired.
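The weighted average above can be computed directly. weighted_score() and second_tier_alert() are hypothetical helpers; how each anomaly's score s is computed is not stated here, so the scores are taken as an input vector in time order, with the last element playing the role of s[0].

```r
# Hypothetical sketch (not the package's API) of the 2nd-tier rule.
# 'scores' holds the scores of the current run of adjacent anomalies in
# time order: the last element is s[0] (the current point), the one
# before it s[-1], and so on.
weighted_score <- function(scores, decay = 0.5) {
  stopifnot(length(scores) > 0)
  k <- length(scores)
  w <- decay^((k - 1):0)  # current point gets weight 1, older points 0.5, 0.25, ...
  sum(w * scores) / sum(w)
}

second_tier_alert <- function(scores, warning, min.score) {
  # Only evaluated once the 1st-tier warning has been triggered
  warning && weighted_score(scores) > min.score
}

weighted_score(c(2, 4, 8))  # (8 + 0.5*4 + 0.25*2) / (1 + 0.5 + 0.25) = 10.5 / 1.75 = 6
second_tier_alert(c(2, 4, 8), warning = TRUE, min.score = 5)  # TRUE
```

Because the weights halve at each step back, recent anomalies dominate the score, so a long run of mild anomalies escalates only if the latest points remain severe.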
Arguments
| Argument | Description |
|---|---|
| data | data.frame from which the variables specified in formula are taken. |
| formula | a robust linear regression formula specifying the response and the predictors, e.g. ProcessorTimeBN2 ~ QPS + Weekdays + Hours in the Examples. |
| num.train | an integer specifying the number of training points. |
| prob | probability threshold with values in [0, 1]. Each test point is compared with the sample quantile calculated using this probability. Depends on the direction. |
| direction | directionality of the anomalies to be detected. Options are 'pos', 'neg' and 'both'. Defaults to 'pos'. |
| min.anom | the minimum number of adjacent anomalies detected previously to fire the 1st-tier warning about the current point. |
| min.score | the minimum weighted average score required to fire the 2nd-tier alert. |
| p | p-value threshold with values in [0, 1]. |
| dist | a string specifying the distribution to fit. Options are t-distribution (default: …). |
Value
The returned value is a list with the following five items:
alert: a logical vector specifying whether the point at each index should fire the 2nd-tier alert;
warning: a logical vector specifying whether the point at each index should fire the 1st-tier warning;
residuals: a numeric vector of the residuals from the robust linear regression model at each point;
mads: a numeric vector of the mean absolute deviation of the training residuals at each point;
hists: a list of static plots, each containing a histogram of the training points overlaid with the stats function used for fitting.
Examples
```r
# Response: processor time of BN2; predictor: queries per second of BN2
ProcessorTimeBN2 <- ProcessorTime$BN2
QPS <- QueryPerSecond$BN2
time <- QueryPerSecond$time
# Calendar predictors: day of week and time of day in units of 20 minutes
Weekdays <- weekdays(time)
Hours <- (lubridate::hour(time) * 60 + lubridate::minute(time)) / 20  # in 20-min intervals
data <- data.frame(ProcessorTimeBN2, QPS, Weekdays, Hours)

# Detect anomalies with the default settings
result <- DetectAnomalies_RLM(data, ProcessorTimeBN2 ~ QPS + Weekdays + Hours)
dt <- data.table::data.table(ProcessorTimeBN2, time,
                             residuals = result$residuals, mads = result$mads)
# Plot the series, then shade warnings in yellow and alerts in red
p <- PlotTimeSeries(dt, "time")
p <- AddShadedRegion(p, result$warning, "yellow")
p <- AddShadedRegion(p, result$alert, "red")
p
# Histogram plot for the point at index 21860
result$hists[[21860]]

# Same analysis with explicit settings: t-distribution with 10 degrees of freedom
result <- DetectAnomalies_RLM(data, ProcessorTimeBN2 ~ QPS + Weekdays + Hours,
                              prob = 0.9, direction = "pos", p = 1e-5,
                              dist = "t", df = 10)
dt <- data.table::data.table(ProcessorTimeBN2, time,
                             residuals = result$residuals, mads = result$mads)
p <- PlotTimeSeries(dt, "time")
p <- AddShadedRegion(p, result$warning, "yellow")
p <- AddShadedRegion(p, result$alert, "red")
p
result$hists[[21860]]
```