detect_outlr | R Documentation |
Applies one or more outlier detection methods to a given signal variable, and optionally aggregates the outputs to create a consensus result. See the outliers vignette for examples.
detect_outlr_rm() detects outliers based on a distance from the rolling median, specified in terms of multiples of the rolling interquartile range (IQR).
detect_outlr_stl() detects outliers based on a seasonal-trend decomposition using LOESS (STL).
detect_outlr(
  x = seq_along(y),
  y,
  methods = tibble::tibble(method = "rm", args = list(list()), abbr = "rm"),
  combiner = c("median", "mean", "none")
)
detect_outlr_rm(
  x = seq_along(y),
  y,
  n = 21,
  log_transform = FALSE,
  detect_negatives = FALSE,
  detection_multiplier = 2,
  min_radius = 0,
  replacement_multiplier = 0
)
detect_outlr_stl(
  x = seq_along(y),
  y,
  n_trend = 21,
  n_seasonal = 21,
  n_threshold = 21,
  seasonal_period,
  seasonal_as_residual = FALSE,
  log_transform = FALSE,
  detect_negatives = FALSE,
  detection_multiplier = 2,
  min_radius = 0,
  replacement_multiplier = 0
)
x |
Design points corresponding to the signal values. |
y |
Signal values. |
methods |
A tibble specifying the method(s) to use for outlier detection, with one row per method, and the columns method, args, and abbr (as in the default value shown in the usage). |
combiner |
String, one of "median", "mean", or "none", specifying how to
combine results from different outlier detection methods for the thresholds
determining whether a particular observation is classified as an outlier,
as well as a replacement value for any outliers. If "none", then no
summarized results are calculated. Note that if the number of |
n |
Number of time steps to use in the rolling window. Default is 21.
This value is centrally aligned. When |
log_transform |
Should a log transform be applied before running outlier detection? Default is FALSE. |
detect_negatives |
Should negative values automatically count as outliers? Default is FALSE. |
detection_multiplier |
Value determining how far the outlier detection thresholds are from the rolling median, which are calculated as (rolling median) +/- (detection multiplier) * (rolling IQR). Default is 2. |
min_radius |
Minimum distance between rolling median and threshold, on transformed scale. Default is 0. |
replacement_multiplier |
Value determining how far the replacement values are from the rolling median. The replacement is the original value if it is within the detection thresholds, or otherwise it is rounded to the nearest (rolling median) +/- (replacement multiplier) * (rolling IQR). Default is 0. |
n_trend |
Number of time steps to use in the rolling window for trend. Default is 21. |
n_seasonal |
Number of time steps to use in the rolling window for
seasonality. Default is 21. Can also be the string "periodic". See
|
n_threshold |
Number of time steps to use in rolling window for the IQR outlier thresholds. |
seasonal_period |
Integer specifying period of "seasonality". For
example, for daily data, a period 7 means weekly seasonality. It must be
strictly larger than 1. Also impacts the size of the low-pass filter
window; see |
seasonal_as_residual |
Boolean specifying whether the seasonal(/weekly)
component should be treated as part of the residual component instead of as
part of the predictions. The default, FALSE, treats them as part of the
predictions, so large seasonal(/weekly) components will not lead to
flagging points as outliers. |
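The threshold and replacement rules described for detection_multiplier, min_radius, and replacement_multiplier can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helpers roll_stat() and rm_thresholds() are hypothetical names, the window is simply truncated at the ends of the series, and log_transform and detect_negatives are left out.

```r
# Hypothetical helper: centered rolling statistic over a window of n points,
# truncated at the edges of the series (an assumption about edge handling).
roll_stat <- function(v, n, f) {
  half <- (n - 1) %/% 2
  vapply(seq_along(v), function(i) {
    idx <- max(1, i - half):min(length(v), i + half)
    f(v[idx])
  }, numeric(1))
}

# Hypothetical sketch of the thresholding rule from the argument descriptions.
rm_thresholds <- function(y, n = 21, detection_multiplier = 2,
                          min_radius = 0, replacement_multiplier = 0) {
  med <- roll_stat(y, n, stats::median)
  iqr <- roll_stat(y, n, stats::IQR)
  # Thresholds: (rolling median) +/- (detection multiplier) * (rolling IQR),
  # but never closer to the median than min_radius.
  radius <- pmax(min_radius, detection_multiplier * iqr)
  lower <- med - radius
  upper <- med + radius
  # Replacement: original value inside the thresholds, otherwise the nearer of
  # (rolling median) +/- (replacement multiplier) * (rolling IQR).
  repl_radius <- replacement_multiplier * iqr
  replacement <- ifelse(y < lower, med - repl_radius,
                        ifelse(y > upper, med + repl_radius, y))
  data.frame(lower = lower, upper = upper, replacement = replacement)
}

y <- c(10, 11, 9, 10, 50, 10, 11, 9, 10, 11, 10)
out <- rm_thresholds(y, n = 5)
flagged <- which(y < out$lower | y > out$upper)
```

With the default replacement_multiplier of 0, a flagged value is replaced by the rolling median itself, which is why the spike in this toy series collapses back to the level of its neighbours.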
Each outlier detection method, one per row of the passed methods tibble, is a function that must take as its first two arguments x and y, and then any number of additional arguments. The function must return a tibble with the number of rows equal to length(y), and with columns lower, upper, and replacement, representing lower and upper bounds for what would be considered an outlier, and a posited replacement value, respectively.
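As an illustration of that contract, here is a minimal sketch of a user-defined method. The function name detect_outlr_sd and its argument k are hypothetical and not part of the package; the rule it applies (a fixed band of k standard deviations around the overall mean) is chosen only because it is short.

```r
library(tibble)

# Hypothetical custom method: x and y come first, then any extra arguments.
# Returns one row per element of y with columns lower, upper, replacement.
detect_outlr_sd <- function(x = seq_along(y), y, k = 3) {
  center <- mean(y, na.rm = TRUE)
  spread <- stats::sd(y, na.rm = TRUE)
  lower <- rep(center - k * spread, length(y))
  upper <- rep(center + k * spread, length(y))
  tibble(
    lower = lower,
    upper = upper,
    # Values outside the band are clamped to the nearest bound.
    replacement = pmin(pmax(y, lower), upper)
  )
}

res <- detect_outlr_sd(y = c(1, 2, 3, 2, 1, 100), k = 2)
```

The sketch only checks the shape of the return value; how such a function is placed in the method column of the methods tibble is not shown here.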
For convenience, the outlier detection method can be specified (in the method column of methods) by a string "rm", shorthand for detect_outlr_rm(), which detects outliers via a rolling median; or by "stl", shorthand for detect_outlr_stl(), which detects outliers via an STL decomposition.
The STL decomposition is computed using stats::stl(). Once computed, the outlier detection method is analogous to the rolling median method in detect_outlr_rm(), except with the fitted values and residuals from the STL decomposition taking the place of the rolling median and residuals to the rolling median, respectively.
The last set of arguments, log_transform through replacement_multiplier, are exactly as in detect_outlr_rm().
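The STL-based variant can be sketched with stats::stl() directly. The following is an approximation under stated assumptions rather than the package's code: the data are synthetic, the helper stl_thresholds() is a hypothetical name, and a single global IQR of the residuals stands in for the rolling window controlled by n_threshold.

```r
# Synthetic daily series: weekly cycle plus noise, with one injected spike.
set.seed(1)
n_days <- 70
y <- 100 + 10 * sin(2 * pi * seq_len(n_days) / 7) + rnorm(n_days)
y[35] <- y[35] + 60

# STL decomposition with a weekly period (the role of seasonal_period = 7).
fit <- stats::stl(stats::ts(y, frequency = 7), s.window = "periodic")
comp <- fit$time.series
trend <- as.numeric(comp[, "trend"])
seasonal <- as.numeric(comp[, "seasonal"])
remainder <- as.numeric(comp[, "remainder"])

# Hypothetical helper mirroring the seasonal_as_residual switch.
stl_thresholds <- function(seasonal_as_residual = FALSE,
                           detection_multiplier = 2) {
  if (seasonal_as_residual) {
    # Seasonal component counts as residual, so only the trend is "fitted".
    fitted_vals <- trend
    resid_vals <- seasonal + remainder
  } else {
    # Seasonal component is part of the prediction.
    fitted_vals <- trend + seasonal
    resid_vals <- remainder
  }
  # Simplification: one global IQR instead of a rolling window.
  radius <- detection_multiplier * stats::IQR(resid_vals)
  data.frame(lower = fitted_vals - radius, upper = fitted_vals + radius)
}

bounds <- stl_thresholds()
flagged <- which(y < bounds$lower | y > bounds$upper)
```

Setting seasonal_as_residual = TRUE in this sketch widens the band, because the weekly swing is then counted as residual variation instead of being absorbed into the fitted values.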
A tibble with number of rows equal to length(y) and columns giving the outlier detection thresholds (lower and upper) and replacement values from each detection method (replacement).
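To make the combiner argument concrete, the sketch below combines per-method thresholds row by row with a median or mean. The threshold values and the helper combine_rows() are hypothetical; the sketch assumes the consensus is computed independently for each observation across methods, and says nothing about how the package names the resulting columns.

```r
# Hypothetical per-method thresholds for four observations (one column per method).
lower <- cbind(rm = c(8, 8, 9, 8), stl_a = c(7, 9, 9, 6), stl_b = c(9, 7, 8, 7))
upper <- cbind(rm = c(12, 12, 11, 12), stl_a = c(13, 11, 12, 14), stl_b = c(11, 13, 12, 13))

# Hypothetical helper: combine across methods, one observation (row) at a time.
combine_rows <- function(m, combiner = c("median", "mean")) {
  combiner <- match.arg(combiner)
  f <- switch(combiner, median = stats::median, mean = mean)
  unname(apply(m, 1, f))
}

combined_lower <- combine_rows(lower, "median")
combined_upper <- combine_rows(upper, "median")
```

A median consensus of this kind lets a single method with unusually wide or narrow bounds be outvoted by the others, whereas a mean moves with every method.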
library(epiprocess) # as_epi_df(), detect_outlr() and friends
library(dplyr)      # %>%, group_by(), mutate()
library(tidyr)      # unnest()

# Three methods: rolling median, STL, and STL with the seasonal component
# treated as part of the residual
detection_methods <- dplyr::bind_rows(
  dplyr::tibble(
    method = "rm",
    args = list(list(
      detect_negatives = TRUE,
      detection_multiplier = 2.5
    )),
    abbr = "rm"
  ),
  dplyr::tibble(
    method = "stl",
    args = list(list(
      detect_negatives = TRUE,
      detection_multiplier = 2.5,
      seasonal_period = 7
    )),
    abbr = "stl_seasonal"
  ),
  dplyr::tibble(
    method = "stl",
    args = list(list(
      detect_negatives = TRUE,
      detection_multiplier = 2.5,
      seasonal_period = 7,
      seasonal_as_residual = TRUE
    )),
    abbr = "stl_reseasonal"
  )
)

# Run all three methods per geo_value and combine them with the median
x <- covid_incidence_outliers %>%
  dplyr::select(geo_value, time_value, cases) %>%
  as_epi_df() %>%
  group_by(geo_value) %>%
  mutate(outlier_info = detect_outlr(
    x = time_value, y = cases,
    methods = detection_methods,
    combiner = "median"
  )) %>%
  unnest(outlier_info)

# Detect outliers based on a rolling median
covid_incidence_outliers %>%
  dplyr::select(geo_value, time_value, cases) %>%
  as_epi_df() %>%
  group_by(geo_value) %>%
  mutate(outlier_info = detect_outlr_rm(
    x = time_value, y = cases
  ))

# Detect outliers based on a seasonal-trend decomposition using LOESS
covid_incidence_outliers %>%
  dplyr::select(geo_value, time_value, cases) %>%
  as_epi_df() %>%
  group_by(geo_value) %>%
  mutate(outlier_info = detect_outlr_stl(
    x = time_value, y = cases,
    seasonal_period = 7 # weekly seasonality for daily data
  ))