knitr::opts_chunk$set( collapse = TRUE, comment = "##", fig.path = "README-", fig.width = 8, fig.height = 6, fig.retina = 2 )
Anomaly Detection Using Seasonal Hybrid Extreme Studentized Deviate Test
A technique for detecting anomalies in seasonal univariate time series. The methods uses are robust, from a statistical standpoint, in the presence of seasonality and an underlying trend. These methods can be used in wide variety of contexts. For example, detecting anomalies in system metrics after a new software release, user engagement post an 'A/B' test, or for problems in econometrics, financial engineering, political and social sciences.
Twitterfolks launched this package in 2014. Many coding and package standards have changed. The package now conforms to CRAN standards.
The plots were nice and all but terribly unnecessary. The two core functions have been modified to only return tidy data frames (tibbles, actually). This makes it easier to chain them without having to deal with list element dereferencing.
Shorter, snake-case aliases have also been provided:
ad_ts
for AnomalyDetectionTs
ad_vec
for AnomalyDetectionVec
The original names are still in the package but the README
and examples
all use the newer, shorter versions.
The following outstanding PRs from the original repo are included:
If those authors find this repo, please add yourselves to the DESCRIPTION
as
contirbutors.
The following functions are implemented:
ad_ts
: Anomaly Detection Using Seasonal Hybrid ESD Testad_vec
: Anomaly Detection Using Seasonal Hybrid ESD TestThe underlying algorithm – referred to as Seasonal Hybrid ESD (S-H-ESD) builds upon the Generalized ESD test for detecting anomalies. Note that S-H-ESD can be used to detect both global as well as local anomalies. This is achieved by employing time series decomposition and using robust statistical metrics, viz., median together with ESD. In addition, for long time series (say, 6 months of minutely data), the algorithm employs piecewise approximation - this is rooted to the fact that trend extraction in the presence of anomalies in non-trivial - for anomaly detection.
Besides time series, the package can also be used to detect anomalies in a vector of numerical values. We have found this very useful as many times the corresponding timestamps are not available. The package provides rich visualization support. The user can specify the direction of anomalies, the window of interest (such as last day, last hour), enable/disable piecewise approximation; additionally, the x- and y-axis are annotated in a way to assist visual data analysis.
You can install AnomalyDetection from github with:
# install.packages("devtools") devtools::install_github("hrbrmstr/AnomalyDetection")
options(width=120)
library(AnomalyDetection) library(hrbrthemes) library(tidyverse)
data(raw_data) res <- ad_ts(raw_data, max_anoms=0.02, direction='both') glimpse(res) # for ggplot2 raw_data$timestamp <- as.POSIXct(raw_data$timestamp) ggplot() + geom_line( data=raw_data, aes(timestamp, count), size=0.125, color="lightslategray" ) + geom_point( data=res, aes(timestamp, anoms), color="#cb181d", alpha=1/3 ) + scale_x_datetime(date_labels="%b\n%Y") + scale_y_comma() + theme_ipsum_rc(grid="XY")
From the plot, we observe that the input time series experiences both positive
and negative anomalies. Furthermore, many of the anomalies in the time series
are local anomalies within the bounds of the time series’ seasonality (hence,
cannot be detected using the traditional approaches). The anomalies detected
using the proposed technique are annotated on the plot. In case the timestamps
for the plot above were not available, anomaly detection could then carried
out using the AnomalyDetectionVec function; specifically, one can use the
AnomalyDetectionVec()
method. The equivalent call to the above would be:
ad_vec(raw_data[,2], max_anoms=0.02, period=1440, direction='both')
Often, anomaly detection is carried out on a periodic basis. For instance, at times, one may be interested in determining whether there was any anomaly yesterday. To this end, we support a flag only_last whereby one can subset the anomalies that occurred during the last day or last hour.
data(raw_data) res <- ad_ts(raw_data, max_anoms=0.02, direction='both', only_last="day") glimpse(res) # for ggplot2 raw_data$timestamp <- as.POSIXct(raw_data$timestamp) ggplot() + geom_line( data=raw_data, aes(timestamp, count), size=0.125, color="lightslategray" ) + geom_point( data=res, aes(timestamp, anoms), color="#cb181d", alpha=1/3 ) + scale_x_datetime(date_labels="%b\n%Y") + scale_y_comma() + theme_ipsum_rc(grid="XY")
Anomaly detection for long duration time series can be carried out by setting
the longterm argument to TRUE
.
Copyright © 2015 Twitter, Inc. and other contributors
Licensed under the GPLv3
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.