pat_outliers: Detect and replace time series outliers

View source: R/pat_outliers.R

Description

Outlier detection using a Median Absolute Deviation (MAD) "Hampel" filter. This function applies a rolling Hampel filter to find points that lie very far out in the tails of the distribution of values within the window.

The thresholdMin level is similar to a sigma value for normally distributed data. The default threshold setting thresholdMin = 8 identifies points that are extremely unlikely to be part of a normal distribution and are therefore very likely to be outliers. Choosing a relatively large value for thresholdMin makes false positives less likely.
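The rolling test described above can be sketched in base R. This is only an illustration of the general Hampel idea, not the code used by AirSensor or seismicRoll: the helper name hampel_flags is hypothetical, the first and last few points are simply left unflagged, and scaling the MAD with stats::mad() (which multiplies by 1.4826 so that it estimates sigma for normal data) is an assumption.

```r
# Hypothetical helper for illustration; not part of AirSensor or seismicRoll.
hampel_flags <- function(x, windowSize = 15, thresholdMin = 8) {
  stopifnot(windowSize %% 2 == 1, length(x) >= windowSize)
  half <- (windowSize - 1) / 2
  score <- rep(NA_real_, length(x))
  for (i in seq(half + 1, length(x) - half)) {
    w <- x[(i - half):(i + half)]      # centered window around point i
    med <- median(w)
    # mad() scales by 1.4826, so the score reads roughly like a sigma count
    score[i] <- abs(x[i] - med) / mad(w, center = med)
  }
  # Edge points (NA) and zero-spread windows (Inf/NaN) are not flagged
  is.finite(score) & score > thresholdMin
}

# Synthetic series with a single spike at position 20
x <- 10 + sin(seq_len(40))
x[20] <- 60
flags <- hampel_flags(x)
which(flags)
```

Because the median and the MAD are both robust statistics, the spike barely affects the scores of its neighbors, which is why a single extreme value does not cause nearby points to be flagged.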

The default setting of the window size windowSize = 15 means that 15 samples from a single channel are used to determine the distribution of values for which a median is calculated. Each PurpleAir channel makes a measurement approximately every 120 seconds so the temporal window is 15 * 120 sec or approximately 30 minutes. This seems like a reasonable period of time over which to evaluate PM2.5 measurements.
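The window arithmetic can be checked directly. The variable names below are illustrative only, and the 120-second interval is the approximate value quoted above rather than a guaranteed sampling rate.

```r
windowSize <- 15            # samples per window (the default)
sample_interval_sec <- 120  # approximate seconds between measurements

# Temporal span covered by one window, in minutes
window_minutes <- windowSize * sample_interval_sec / 60
window_minutes
```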

Specifying replace = TRUE allows you to perform smoothing by replacing outliers with the window median value. Using this technique, you can create a highly smoothed, artificial dataset by setting thresholdMin = 1 or lower (but always above zero).
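The effect of replacement, and of lowering the threshold, can be illustrated with a small base R sketch. As before, this is an assumption-laden illustration rather than the package's implementation: hampel_replace is a hypothetical helper, edge points are left untouched, and the MAD is scaled with stats::mad().

```r
# Hypothetical helper for illustration; not part of AirSensor or seismicRoll.
hampel_replace <- function(x, windowSize = 15, thresholdMin = 8) {
  stopifnot(windowSize %% 2 == 1, length(x) >= windowSize, thresholdMin > 0)
  half <- (windowSize - 1) / 2
  out <- x
  for (i in seq(half + 1, length(x) - half)) {
    w <- x[(i - half):(i + half)]      # always read from the original series
    med <- median(w)
    score <- abs(x[i] - med) / mad(w, center = med)
    # Replace the point with the window median when it exceeds the threshold
    if (is.finite(score) && score > thresholdMin) out[i] <- med
  }
  out
}

x <- 10 + sin(seq_len(40))
x[20] <- 60

cleaned  <- hampel_replace(x, thresholdMin = 8)    # only the spike is replaced
smoothed <- hampel_replace(x, thresholdMin = 0.5)  # many ordinary points are replaced too

sum(cleaned != x)
sum(smoothed != x)
```

With the high threshold only the spike is pulled back to the window median; with the low threshold a large share of ordinary points is replaced as well, which is what produces the heavily smoothed, artificial series.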

Usage

pat_outliers(
  pat = NULL,
  windowSize = 15,
  thresholdMin = 8,
  replace = FALSE,
  showPlot = TRUE,
  data_shape = 18,
  data_size = 1,
  data_color = "black",
  data_alpha = 0.5,
  outlier_shape = 8,
  outlier_size = 1,
  outlier_color = "red",
  outlier_alpha = 1
)

Arguments

pat

PurpleAir Timeseries pat object.

windowSize

Integer window size for outlier detection.

thresholdMin

Threshold value for outlier detection.

replace

Logical specifying whether to replace outliers with the window median value.

showPlot

Logical specifying whether to generate outlier detection plots.

data_shape

Symbol to use for data points.

data_size

Size of data points.

data_color

Color of data points.

data_alpha

Opacity of data points.

outlier_shape

Symbol to use for outlier points.

outlier_size

Size of outlier points.

outlier_color

Color of outlier points.

outlier_alpha

Opacity of outlier points.

Value

A pat object with outliers replaced by median values.

Note

Additional documentation on the algorithm is available in seismicRoll::findOutliers().

Examples

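library(AirSensor)

# Restrict the example data to the first half of August 2018, then
# replace detected outliers with the window median and plot the result
example_pat %>%
  pat_filterDate(20180801, 20180815) %>%
  pat_outliers(replace = TRUE, showPlot = TRUE)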

AirSensor documentation built on March 13, 2021, 1:07 a.m.