DetectAnomalies_RLM: Tiered Anomaly Detection using Robust Linear Regression
In jingjin1018/anetimeseries: An Implementation of Functions Used For Time Series Analysis Within A&E

Description Usage Arguments Value See Also Examples

A technique for detecting whether the current observation in a multivariate time series is an anomaly, given all observations up to and including the current time.

The algorithm works as follows. Define a moving window of fixed length; the num.train points that fall within this window form the training set. At each sample time, we first check whether the response variable of the test point lies above the sample quantile that corresponds to the probability prob, computed from the training set. Only points above this sample quantile are considered anomaly candidates.
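The quantile screening step can be sketched as follows. This is a minimal illustration, not the package's code: `is_candidate` is a hypothetical helper, and the handling of `direction` follows the argument descriptions on this page.

```r
# Hypothetical helper (not part of the package): screen one test point
# against sample quantiles of the training responses.
is_candidate <- function(train_y, test_y, prob, direction = "both") {
  q <- stats::quantile(train_y, probs = prob, names = FALSE)
  switch(direction,
    pos  = test_y > q[1],                        # above the upper quantile
    neg  = test_y < q[1],                        # below the lower quantile
    both = test_y < min(q) | test_y > max(q),    # outside the quantile range
    stop("direction must be 'pos', 'neg' or 'both'")
  )
}

train_y <- 1:100
is_candidate(train_y, 95, prob = 0.9, direction = "pos")            # candidate
is_candidate(train_y, 50, prob = c(0.1, 0.9), direction = "both")   # not a candidate
```

Only points that pass this cheap check go on to the regression step, so a model does not need to be fitted for every sample.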
For each anomaly candidate, fit a robust linear regression model (M-estimation with inverse variance weighting) using the predictors defined in formula. We assume the normalized residual follows the distribution given by dist. The test point is marked as an anomaly if its p-value is smaller than the predefined threshold p. To minimize the effect of anomalies, points marked as anomalies are excluded from both the sample quantile calculation and the robust linear regression fit.
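A rough sketch of the regression test for a single candidate, under stated assumptions: it uses `MASS::rlm` with its default Huber M-estimator, normalizes the test residual by the fit's robust scale estimate (`fit$s`), and computes a two-sided p-value. The helper name `residual_p_value` is hypothetical, and the package's own weighting and normalization may differ.

```r
library(MASS)  # provides rlm()

# Hypothetical helper: p-value of one test point under a robust fit.
residual_p_value <- function(formula, train, test, dist = "normal", df = NULL) {
  fit <- rlm(formula, data = train)            # M-estimation (Huber psi by default)
  response <- all.vars(formula)[1]
  res <- test[[response]] - predict(fit, newdata = test)
  z <- as.numeric(res) / fit$s                 # normalize by the robust scale estimate
  # Two-sided tail probability (an assumption; a one-sided test is also plausible)
  if (dist == "t") 2 * stats::pt(-abs(z), df = df) else 2 * stats::pnorm(-abs(z))
}

set.seed(1)
train <- data.frame(x = 1:200)
train$y <- 2 * train$x + rnorm(200)
test <- data.frame(x = 201, y = 2 * 201 + 15)  # response far above the trend
p_val <- residual_p_value(y ~ x, train, test)
p_val < 0.01                                   # flagged at threshold p = 0.01
```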
Spikes that last only a short time are of less concern, so a 1st-tier warning is fired only when more than min.anom of the preceding adjacent points are all anomalies. This lowers the false positive rate by suppressing warnings for short-lived, self-healing incidents.
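The adjacency rule can be illustrated by tracking the length of the current run of consecutive anomalies. This is a sketch only: `run_length` is a hypothetical helper, and it assumes the run includes the current point and that a warning fires once the run reaches `min.anom`; the package may count the boundary differently.

```r
# Hypothetical helper: length of the run of consecutive TRUEs ending at each index.
run_length <- function(is_anom) {
  run <- integer(length(is_anom))
  for (i in seq_along(is_anom)) {
    previous <- if (i > 1) run[i - 1] else 0L
    run[i] <- if (is_anom[i]) previous + 1L else 0L   # reset on a normal point
  }
  run
}

is_anom <- c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
min.anom <- 3
warning_flag <- run_length(is_anom) >= min.anom       # assumed boundary: >=
warning_flag
```

In this toy series the two-point spike at positions 2 and 3 never raises a warning, while the longer run at the end does.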
A two-tiered approach is used to differentiate incidents of different severity. When a warning is triggered, the weighted average score of all previous adjacent anomalies is calculated as (s[0] + 0.5*s[-1] + 0.5^2*s[-2] + ...)/(1 + 0.5 + 0.5^2 + ...). If the weighted average is larger than the predefined threshold min.score, a 2nd-tier alert is fired.
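The weighted average is a direct transcription of the formula above. In this sketch `weighted_score` is a hypothetical helper, the scores are made-up numbers, and `s` is ordered most recent first (s[0], s[-1], s[-2], ...); how the per-point score itself is defined is not stated on this page.

```r
# Hypothetical helper: geometrically decaying weighted average of scores,
# with s[1] the current point, s[2] the previous one, and so on.
weighted_score <- function(s, decay = 0.5) {
  w <- decay^(seq_along(s) - 1)   # weights 1, 0.5, 0.25, ...
  sum(w * s) / sum(w)
}

s <- c(12, 8, 4)                  # illustrative scores, most recent first
score <- weighted_score(s)        # (12 + 0.5*8 + 0.25*4) / (1 + 0.5 + 0.25)
min.score <- 10
score > min.score                 # TRUE would fire the 2nd-tier alert
```

Because the weights halve at each step back in time, the most recent anomalies dominate the score.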

DetectAnomalies_RLM(data, formula, num.train = 1008, prob = c(0.1, 0.9),
  direction = "both", min.anom = 3, min.score = 10, p = 0.01,
  dist = "normal", ...)

`data`	data.frame from which variables specified in `formula` are preferentially to be taken.
`formula`	a robust linear regression formula of the form `response ~ predictor1 + predictor2 + ...`.
`num.train`	an integer specifying the number of training points.
`prob`	probability threshold(s) with values in [0, 1]. Each test point is compared with the sample quantile calculated using this probability. The expected form depends on `direction`: if `direction = "pos"`, `prob` should be a single numeric value, and only points above the sample quantile are considered anomaly candidates; if `direction = "neg"`, `prob` should be a single numeric value, and only points below the sample quantile are considered anomaly candidates; if `direction = "both"`, `prob` should be a numeric vector of two values in [0, 1], and only points outside the calculated quantile range are considered anomaly candidates.
`direction`	directionality of the anomalies to be detected. Options are 'pos', 'neg' and 'both'. Defaults to 'both'.
`min.anom`	the minimum number of adjacent anomalies detected previously to fire the 1st-tier warning about the current point.
`min.score`	the minimum weighted average score required to fire the 2nd-tier alert.
`p`	p-value threshold with values in [0, 1].
`dist`	a string specifying the distribution to fit. Options are the t-distribution (`dist = "t"`) and the normal distribution (`dist = "normal"`, the default). If the t-distribution is selected, also pass the degrees of freedom via `df`.

The returned value is a list with the following five items:

alert a logical vector specifying whether the point at each index should fire the 2nd-tier alert;
warning a logical vector specifying whether the point at each index should fire the 1st-tier warning;
residuals a numeric vector specifying the residuals predicted using the robust linear regression model at each point;
mads a numeric vector specifying the mean absolute deviation of the training residuals at each point;
hists a list of static plots, each of which contains a histogram of the training points overlaid with the stats function used for the fit.

rlm

 ProcessorTimeBN2 <- ProcessorTime$BN2
 QPS <- QueryPerSecond$BN2
 time <- QueryPerSecond$time
 Weekdays <- weekdays(time)
 Hours <- (lubridate::hour(time) * 60 + lubridate::minute(time))/20 # time of day in 20-minute intervals
 data <- data.frame(ProcessorTimeBN2, QPS, Weekdays, Hours)
 result <- DetectAnomalies_RLM(data, ProcessorTimeBN2 ~ QPS + Weekdays + Hours)
 dt <- data.table::data.table(ProcessorTimeBN2, time, residuals = result$residuals, mads = result$mads)
 p <- PlotTimeSeries(dt, "time")
 p <- AddShadedRegion(p, result$warning, "yellow")
 p <- AddShadedRegion(p, result$alert, "red")
 p
 result$hists[[21860]]

 result <- DetectAnomalies_RLM(data, ProcessorTimeBN2 ~ QPS + Weekdays + Hours, prob = 0.9, direction = "pos", p = 1e-5, dist = "t", df = 10)
 dt <- data.table::data.table(ProcessorTimeBN2, time, residuals = result$residuals, mads = result$mads)
 p <- PlotTimeSeries(dt, "time")
 p <- AddShadedRegion(p, result$warning, "yellow")
 p <- AddShadedRegion(p, result$alert, "red")
 p
 result$hists[[21860]]

jingjin1018/anetimeseries documentation built on May 19, 2019, 10:35 a.m.
