outlier_tukey: Flag outliers based on Tukey's rule

View source: R/outliers.R

Description

Returns a logical vector where TRUE indicates outliers.

Usage

outlier_tukey(
  x,
  k = 1.5,
  ignore_lwr = FALSE,
  apply_log = FALSE,
  ignore_zero = FALSE
)

outlier_tukey_top(x, k = 1.5, apply_log = FALSE, ignore_zero = FALSE)

Arguments

x

Input values to check

k

The IQR multiplier that determines the fence level. Increasing k makes outlier identification less strict (and vice versa)

ignore_lwr

If TRUE, don't use the lower fence for identifying outliers

apply_log

If TRUE, log-transform the input values before applying Tukey's rule. Useful because distributions often have a log-normal shape (e.g., spending)

ignore_zero

If TRUE, exclude zero values from both the IQR calculation and outlier flagging. Note that zeroes are automatically ignored if apply_log = TRUE
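
To illustrate how k sets the fence level, here is a minimal sketch of Tukey's rule in base R. The helper tukey_sketch() is hypothetical and is not part of sastats. It assumes the conventional fences of Q1 - k * IQR and Q3 + k * IQR with R's default quantile method, and it leaves out the ignore_lwr, apply_log, and ignore_zero options, so the package's own results may differ.

```r
# Hypothetical helper for illustration only (not the sastats implementation).
# Flags values that fall outside the Tukey fences.
tukey_sketch <- function(x, k = 1.5) {
  # first and third quartiles (default quantile method, type 7)
  q <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr  # lower fence
  upper <- q[2] + k * iqr  # upper fence
  x < lower | x > upper
}

days <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
tukey_sketch(days)          # only the last value (100) is flagged
tukey_sketch(days, k = 30)  # a much wider fence flags nothing
```

A larger k widens both fences, which is why raising it makes identification less strict.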

Functions

See Also

Other functions for identifying outliers: outlier_mean_compare(), outlier_pct(), outlier_plot()

Examples

library(dplyr)
library(sastats) # assumed here to provide the svy data and the outlier_* functions
data(svy)

# take a look at the days variable
outlier_plot(svy$act, days, act)
outlier_plot(svy$act, days, act, apply_log = TRUE)

activity <- group_by(svy$act, act) %>% mutate(
    is_outlier = outlier_tukey(days, ignore_zero = TRUE, apply_log = TRUE),
    # in case we want to topcode the outliers:
    topcode_value = outlier_tukey_top(days, apply_log = TRUE),
    days_cleaned = ifelse(is_outlier, NA, days)
) %>% ungroup()

# summarize
outlier_plot(activity, days, act, apply_log = TRUE, show_outliers = TRUE)
outlier_pct(activity, act)
outlier_mean_compare(activity, days, days_cleaned, act)

southwick-associates/sastats documentation built on March 27, 2020, 9:39 p.m.