is_outlier: Identify Outliers


View source: R/is_outlier.R

Description

is_outlier flags observations that fall outside a range of valid values defined by user-set limits. Limits can be set in absolute terms (the units of measurement), in median absolute deviations, in standard deviations, or in any combination of these.

Usage

is_outlier(measure, abs_lim = NULL, mad_lim = NULL, sd_lim = NULL)

Arguments

measure

A numeric vector.

abs_lim

Two-element numeric vector c(lower, upper) giving the absolute lower and upper limits of the range of valid values, in measurement units. Set this range so that values falling outside it are implausible or impossible.

mad_lim

Single numeric value giving the range of valid values as a number of median absolute deviations from the median; observations beyond this distance are flagged.

sd_lim

Single numeric value giving the range of valid values as a number of standard deviations from the mean; observations beyond this distance are flagged.

Details

If more than one type of limit is specified, is_outlier first applies the absolute limits (if given), so that outright impossible values do not influence the deviation statistics, and then applies the median-absolute-deviation (MAD) test and/or the standard-deviation test.
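The ordering described above can be illustrated with a small sketch. It is not the itrak source: flag_abs_then_sd() is a hypothetical helper, and it assumes no missing values and a strict "greater than" comparison against the limits. It shows the key idea that the mean and standard deviation are computed only from values that passed the absolute limits.

```r
# Hypothetical sketch (not itrak's implementation): absolute limits first,
# then a standard-deviation test computed on the remaining values only.
# Assumes `measure` contains no NA values.
flag_abs_then_sd <- function(measure, abs_lim = c(-Inf, Inf), sd_lim = NULL) {
  # Step 1: values outside c(lower, upper) are flagged outright
  out <- measure < abs_lim[1] | measure > abs_lim[2]
  if (!is.null(sd_lim)) {
    # Step 2: mean and SD exclude the values already flagged as impossible,
    # so those values cannot distort the centre or spread
    kept <- measure[!out]
    m <- mean(kept)
    s <- sd(kept)
    # Flag anything more than sd_lim standard deviations from the mean
    out <- out | abs(measure - m) > sd_lim * s
  }
  out
}

x <- c(-5, rep(10, 20), 100)
flag_abs_then_sd(x, abs_lim = c(0, Inf), sd_lim = 3)
```

Here -5 is removed by the absolute limit before the mean and SD are computed, and 100 is then flagged by the SD test.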

The mad_lim argument is evaluated using the double MAD, which computes separate MADs for the values below and above the median. This provides robust identification of outliers even when the underlying distribution is non-normal and/or asymmetric. See Peter Rosenmai's blog post (Rosenmai 2013) for more information.
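A minimal sketch of the double-MAD test as described in Rosenmai's post may help. double_mad_flags() is a hypothetical helper, not the itrak source. It uses the raw (unscaled) MAD as the post does; note that R's stats::mad() multiplies by 1.4826 by default, and whether is_outlier applies that constant is not stated here. The sketch assumes no NA values and that neither side's MAD is zero.

```r
# Hypothetical sketch of the double-MAD outlier test (after Rosenmai 2013).
# Assumes no NA values and non-zero MADs on both sides of the median.
double_mad_flags <- function(x, mad_lim) {
  m <- median(x)
  abs_dev <- abs(x - m)
  # Separate (unscaled) MADs for the lower and upper halves of the data
  left_mad  <- median(abs_dev[x <= m])
  right_mad <- median(abs_dev[x >= m])
  # Each observation is scaled by the MAD of its own side of the median
  side_mad <- ifelse(x > m, right_mad, left_mad)
  mad_dist <- abs_dev / side_mad
  # The median itself is at distance zero
  mad_dist[x == m] <- 0
  mad_dist > mad_lim
}

double_mad_flags(c(1, 2, 3, 4, 5, 6, 50), mad_lim = 3)
```

Because the two sides are scaled separately, a long right tail does not widen the acceptance band for values below the median, and vice versa.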

Value

A logical vector in which TRUE marks an observation identified as an outlier.

References

Rosenmai P. 2013. Using the median absolute deviation to find outliers.

Examples

# Create an example series of 10 reaction times (ms), two of which fall
# outside the bounds of validity:
set.seed(7)
rt <- c(0, 10000, rnorm(8, mean = 1000, sd = 250))
rt
# Check for trials that are less than 250 ms with no upper bound:
is_outlier(rt, abs_lim = c(250, Inf))

# Check for trials that are more than 3 standard deviations from the mean
is_outlier(rt, sd_lim = 3)

# Check for trials that are more than 3 median absolute deviations from the
# median
is_outlier(rt, mad_lim = 3)

# Check for trials that are less than 250 ms, more than 2500 ms, or more than
# 2.5 MADs from the median:
is_outlier(rt, abs_lim = c(250, 2500), mad_lim = 2.5)

jashu/itrak documentation built on May 9, 2020, 1:57 p.m.