DetectAnomalies_RLM: Tiered Anomaly Detection using Robust Linear Regression


View source: R/DetectAnomalies_RLM.R

Description

Detects whether the current observation in a multivariate time series is an anomaly, given all observations up to and including the current time.

The algorithm works as follows. Define a moving window of a fixed length; the num.train points that fall within this window form the training set. At each sample time, first check whether the response variable of the test point lies above the sample quantile that corresponds to the probability prob, calculated from the training set. Only points above that sample quantile are considered anomaly candidates (for direction = "neg" or "both", points below the quantile or outside the quantile range, respectively).
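
The quantile screen described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's own code: the helper is_candidate, the simulated training window and the test value are all hypothetical.

```r
# Hypothetical sketch of the quantile screen (not the package's implementation).
set.seed(1)
train <- rnorm(1008)      # simulated response values inside the moving window
test.point <- 2.5         # simulated response value at the current time
prob <- c(0.1, 0.9)       # lower and upper probabilities for direction = "both"

# Sample quantile(s) computed from the training window only
bounds <- stats::quantile(train, probs = prob, names = FALSE)

# TRUE if the point passes the screen and becomes an anomaly candidate
is_candidate <- function(y, bounds, direction = "both") {
  switch(direction,
    pos  = y > bounds[length(bounds)],
    neg  = y < bounds[1],
    both = y < bounds[1] || y > bounds[2],
    stop("unknown direction"))
}

is_candidate(test.point, bounds)
```

Points that fail the screen are never passed to the regression step, which keeps the per-sample cost low.
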
For each anomaly candidate, a robust linear regression model (M-estimation with inverse-variance weighting) is fitted using the predictors defined in formula. The normalized residual is assumed to follow the distribution given by dist. The test point is marked as an anomaly if its p-value is smaller than the predefined threshold p. To minimize the effect of anomalies, points marked as anomalies are excluded from both the sample quantile calculation and the robust linear regression fit.
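
A minimal sketch of the regression test for a single candidate, using MASS::rlm on simulated data. How the package normalizes residuals is not stated here; the sketch assumes the residual is divided by the robust scale estimate returned by rlm and that the p-value is two-sided under a normal distribution. The data and variable names are hypothetical.

```r
# Hypothetical sketch of the robust-regression test (not the package's implementation).
library(MASS)

set.seed(42)
n <- 200
train <- data.frame(x = runif(n, 0, 10))
train$y <- 2 + 3 * train$x + rnorm(n)     # simulated training window
test <- data.frame(x = 5, y = 30)         # candidate far above the trend line

# M-estimation; "inv.var" is rlm's default wt.method
fit <- MASS::rlm(y ~ x, data = train, method = "M")

# Assumption: normalize the test residual by rlm's robust scale estimate
z <- unname((test$y - predict(fit, newdata = test)) / fit$s)

# Two-sided p-value under a normal distribution (dist = "normal")
p.value <- 2 * stats::pnorm(-abs(z))
is.anomaly <- p.value < 0.01              # p = 0.01 as in Usage
is.anomaly
```
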
Spikes that last only a short time are of less concern, so a 1st-tier warning is fired only when more than min.anom of the most recent adjacent points are all anomalies. This reduces the false positive rate by suppressing warnings for short-lived, self-healing incidents.
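
The 1st-tier rule can be illustrated with a run-length check. The sketch assumes that the run of adjacent anomalies includes the current point and that the warning fires when its length exceeds min.anom; the helper names run_length and fires_warning are hypothetical.

```r
# Hypothetical sketch of the 1st-tier warning rule (not the package's implementation).
# flags: logical vector of per-point anomaly marks, oldest first, current point last.
run_length <- function(flags) {
  if (length(flags) == 0) return(0L)
  r <- rle(flags)
  # Length of the trailing run if it consists of anomalies, otherwise 0
  if (tail(r$values, 1)) tail(r$lengths, 1) else 0L
}

fires_warning <- function(flags, min.anom = 3) {
  run_length(flags) > min.anom
}

fires_warning(c(FALSE, TRUE, TRUE, TRUE, TRUE))   # run of 4 anomalies
fires_warning(c(TRUE, TRUE, FALSE, TRUE))         # run of 1 anomaly
```
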
A two-tiered approach is used to differentiate incidents of different severity. When a warning is triggered, the weighted average score of all previous adjacent anomalies is calculated as (s[0] + 0.5*s[-1] + 0.5^2*s[-2] + ...)/(1 + 0.5 + 0.5^2 + ...), where s[0] is the score of the current point and s[-k] is the score k points earlier. If the weighted average is larger than the predefined threshold min.score, a 2nd-tier alert is fired.
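
The weighted average above halves the weight of each earlier anomaly. A short sketch, assuming the scores of the adjacent anomalies are supplied oldest first with the current point last; the function name weighted_score is hypothetical.

```r
# Hypothetical sketch of the 2nd-tier score (not the package's implementation).
# s: scores of the adjacent anomalies, oldest first, current point last.
weighted_score <- function(s) {
  # Weights 1, 0.5, 0.5^2, ... counted backwards from the current point
  w <- 0.5^(rev(seq_along(s)) - 1)
  sum(w * s) / sum(w)
}

# Three adjacent anomalies with scores 4, 8 and 16 (current):
# (16 + 0.5 * 8 + 0.25 * 4) / (1 + 0.5 + 0.25) = 21 / 1.75 = 12
score <- weighted_score(c(4, 8, 16))
score > 10    # compare with min.score = 10 from Usage
```

Because the weights decay geometrically, the most recent anomalies dominate the score, so an incident that is getting worse reaches min.score sooner than one that is fading.
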

Usage

DetectAnomalies_RLM(data, formula, num.train = 1008, prob = c(0.1, 0.9),
  direction = "both", min.anom = 3, min.score = 10, p = 0.01,
  dist = "normal", ...)

Arguments

data

data.frame from which variables specified in formula are preferentially to be taken.

formula

a robust linear regression formula of the form response ~ predictor1 + predictor2 + ....

num.train

an integer specifying the number of training points.

prob

probability threshold with values in [0, 1]. Each test point is compared with the sample quantile calculated using this probability. The expected form depends on direction:

  • direction = "pos": prob should be a single numeric value. Only points above the sample quantile are considered anomaly candidates;

  • direction = "neg": prob should be a single numeric value. Only points below the sample quantile are considered anomaly candidates;

  • direction = "both": prob should be a numeric vector of two values in [0, 1]. Only points outside the calculated quantile range are considered anomaly candidates.

direction

Directionality of the anomalies to be detected. Options are: 'pos', 'neg' and 'both'. Defaults to 'both'.

min.anom

the minimum number of adjacent anomalies detected previously that is required to fire the 1st-tier warning about the current point.

min.score

the minimum weighted average score required to fire the 2nd-tier alert.

p

p-value threshold with values in [0, 1].

dist

A string specifying the distribution to fit. Options are the normal distribution (dist = "normal", the default) and the t-distribution (dist = "t"). If the t-distribution is selected, also pass its degrees of freedom through the df argument.
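
To see how the choice of dist affects the test, the sketch below compares two-sided p-values for the same normalized residual under both distributions. The residual value is hypothetical, and the two-sided form of the p-value is an assumption.

```r
# Hypothetical comparison of p-values under the two dist options.
z <- 4                                        # example normalized residual

p.norm <- 2 * stats::pnorm(-abs(z))           # dist = "normal"
p.t    <- 2 * stats::pt(-abs(z), df = 10)     # dist = "t" with df = 10

c(normal = p.norm, t = p.t)
```

The t-distribution has heavier tails, so the same residual yields a larger p-value than under the normal distribution.
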

Value

The returned value is a list with five items. The examples below access the components residuals, mads, warning, alert and hists.

See Also

rlm

Examples

# Response (processor time) and predictor (queries per second) for BN2
ProcessorTimeBN2 <- ProcessorTime$BN2
QPS <- QueryPerSecond$BN2
time <- QueryPerSecond$time
Weekdays <- weekdays(time)
# Time of day expressed in 20-minute intervals
Hours <- (lubridate::hour(time) * 60 + lubridate::minute(time)) / 20
data <- data.frame(ProcessorTimeBN2, QPS, Weekdays, Hours)

# Default settings, as listed in Usage
result <- DetectAnomalies_RLM(data, ProcessorTimeBN2 ~ QPS + Weekdays + Hours)
dt <- data.table::data.table(ProcessorTimeBN2, time,
                             residuals = result$residuals, mads = result$mads)
# Plot the series, shading warnings in yellow and alerts in red
p <- PlotTimeSeries(dt, "time")
p <- AddShadedRegion(p, result$warning, "yellow")
p <- AddShadedRegion(p, result$alert, "red")
p
# Inspect the hists entry at index 21860
result$hists[[21860]]

# Positive direction only, t-distribution with 10 degrees of freedom,
# and a p-value threshold of 1e-5
result <- DetectAnomalies_RLM(data, ProcessorTimeBN2 ~ QPS + Weekdays + Hours,
                              prob = 0.9, direction = "pos", p = 1e-5,
                              dist = "t", df = 10)
dt <- data.table::data.table(ProcessorTimeBN2, time,
                             residuals = result$residuals, mads = result$mads)
p <- PlotTimeSeries(dt, "time")
p <- AddShadedRegion(p, result$warning, "yellow")
p <- AddShadedRegion(p, result$alert, "red")
p
result$hists[[21860]]

jingjin1018/anetimeseries documentation built on May 19, 2019, 10:35 a.m.