identify_outliers (R Documentation)
Detect outliers using boxplot methods. Boxplots are a popular and easy method for identifying outliers. There are two categories of outliers: (1) outliers and (2) extreme points.
Values above Q3 + 1.5 × IQR or below Q1 - 1.5 × IQR are considered outliers. Values above Q3 + 3 × IQR or below Q1 - 3 × IQR are considered extreme points (or extreme outliers). Because the 3 × IQR fences lie outside the 1.5 × IQR fences, every extreme point is also an outlier.
Q1 and Q3 are the first and third quartiles, respectively. IQR is the interquartile range (IQR = Q3 - Q1).
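As a worked illustration of these fences, the base-R sketch below computes them for a small vector. It is not the package's own code: `boxplot_fences()` is a hypothetical helper, and it assumes R's default quantile definition (type 7); other quantile definitions would shift the fences slightly.

```r
# Hypothetical helper: boxplot (Tukey) fences, assuming quantile() type 7.
boxplot_fences <- function(x, coef = 1.5) {
  q1 <- quantile(x, 0.25, na.rm = TRUE)[[1]]
  q3 <- quantile(x, 0.75, na.rm = TRUE)[[1]]
  iqr <- q3 - q1  # interquartile range, IQR = Q3 - Q1
  c(lower = q1 - coef * iqr, upper = q3 + coef * iqr)
}

x <- c(1:9, 100)
# Here Q1 = 3.25, Q3 = 7.75 and IQR = 4.5
outlier_fences <- boxplot_fences(x)            # lower = -3.5, upper = 14.5
extreme_fences <- boxplot_fences(x, coef = 3)  # lower = -10.25, upper = 21.25
print(outlier_fences)
print(extreme_fences)
```

The value 100 lies above both upper fences, so under this rule it counts as an outlier and as an extreme point.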
Generally speaking, data points labelled as outliers in boxplots are not considered as troublesome as extreme points and might even be ignored. Note that any NA and NaN values are automatically removed before the quantiles are computed.
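To illustrate what removing NA and NaN before computing the quantiles means, the plain base-R sketch below (not the package's code) shows that appending missing values leaves the quartiles unchanged when `na.rm = TRUE`. In this sketch the missing positions themselves come back as NA in the flag vector; this page does not say how the package reports them, so treat that part as an assumption.

```r
x_clean   <- c(1:9, 100)
x_missing <- c(x_clean, NA, NaN)

# With na.rm = TRUE, NA and NaN are dropped before the quartiles are computed.
q_clean   <- quantile(x_clean, c(0.25, 0.75))
q_missing <- quantile(x_missing, c(0.25, 0.75), na.rm = TRUE)
print(identical(q_clean, q_missing))  # TRUE

# Without na.rm, quantile() stops with an error instead.
err <- tryCatch(quantile(x_missing, 0.25), error = function(e) conditionMessage(e))
print(err)

# Flagging with the 1.5 x IQR rule (sketch): comparisons on NA/NaN give NA.
iqr   <- q_missing[[2]] - q_missing[[1]]
flags <- x_missing < q_missing[[1]] - 1.5 * iqr | x_missing > q_missing[[2]] + 1.5 * iqr
print(flags)
```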
identify_outliers(data, ..., variable = NULL)
is_outlier(x, coef = 1.5)
is_extreme(x)
data: a data frame.
...: one unquoted expression (or variable name) used to select the variable of interest. Alternative to the argument variable.
variable: variable name for detecting outliers.
x: a numeric vector.
coef: coefficient specifying how far the outlier should be from the edge of its box. Possible values are 1.5 (for outliers) and 3 (for extreme points only). Default is 1.5.
identify_outliers(): returns the rows of the input data frame that are flagged as outliers, with two additional columns, "is.outlier" and "is.extreme", which hold logical values.
is_outlier() and is_extreme(): return logical vectors.
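Following the description of identify_outliers() in this page (it adds "is.outlier" and "is.extreme" and extracts the rows suspected as outliers), a minimal base-R sketch might look like the one below. It assumes the column is given as a string; `identify_outliers_sketch()` and `flag_beyond_fences()` are hypothetical names, and this is not the rstatix source (it skips tidy selection and grouping).

```r
# Hypothetical helper: TRUE where x lies beyond the coef x IQR fences.
flag_beyond_fences <- function(x, coef) {
  q1 <- quantile(x, 0.25, na.rm = TRUE)[[1]]
  q3 <- quantile(x, 0.75, na.rm = TRUE)[[1]]
  iqr <- q3 - q1
  x < q1 - coef * iqr | x > q3 + coef * iqr
}

# Hypothetical sketch of the data-frame version (column name given as a string).
identify_outliers_sketch <- function(data, variable) {
  values <- data[[variable]]
  if (!is.numeric(values)) stop("The variable ", variable, " should be numeric")
  data$is.outlier <- flag_beyond_fences(values, coef = 1.5)
  data$is.extreme <- flag_beyond_fences(values, coef = 3)
  # Keep only rows flagged as outliers; rows with NA flags are dropped.
  data[!is.na(data$is.outlier) & data$is.outlier, , drop = FALSE]
}

df <- data.frame(sample = 1:10, score = c(1:9, 100))
res <- identify_outliers_sketch(df, "score")
print(res)
```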
identify_outliers(): takes a data frame and extracts the rows suspected to be outliers according to a numeric column. Two columns are added: "is.outlier" and "is.extreme".
is_outlier(): detects outliers in a numeric vector. Returns a logical vector.
is_extreme(): detects extreme points in a numeric vector. An alias of is_outlier() with coef = 3. Returns a logical vector.
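The alias relationship can be sketched in base R as follows. `is_outlier_sketch()` and `is_extreme_sketch()` are hypothetical stand-ins, not the package functions; the example vector is chosen so that one value is only an outlier while another is also an extreme point.

```r
# Hypothetical stand-in for is_outlier(): TRUE beyond the coef x IQR fences.
is_outlier_sketch <- function(x, coef = 1.5) {
  q1 <- quantile(x, 0.25, na.rm = TRUE)[[1]]
  q3 <- quantile(x, 0.75, na.rm = TRUE)[[1]]
  iqr <- q3 - q1
  x < q1 - coef * iqr | x > q3 + coef * iqr
}

# Hypothetical stand-in for is_extreme(): the same rule with coef = 3.
is_extreme_sketch <- function(x) is_outlier_sketch(x, coef = 3)

x <- c(1:9, 20, 100)
# Q1 = 3.5, Q3 = 8.5, IQR = 5: outlier fences (-4, 16), extreme fences (-11.5, 23.5)
out <- is_outlier_sketch(x)
ext <- is_extreme_sketch(x)
print(x[out])  # 20 100
print(x[ext])  # 100
```

Every value flagged by the extreme rule is also flagged by the outlier rule, matching the nesting of the fences.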
# Load the packages used in the examples
library(rstatix)
library(dplyr)

# Generate a demo data set
set.seed(123)
demo.data <- data.frame(
  sample = 1:20,
  score = c(rnorm(19, mean = 5, sd = 2), 50),
  gender = rep(c("Male", "Female"), each = 10)
)

# Identify outliers according to the variable score
demo.data %>% identify_outliers(score)

# Identify outliers by groups
demo.data %>%
  group_by(gender) %>%
  identify_outliers("score")
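Grouping matters because the fences are recomputed within each group. The base-R sketch below uses made-up deterministic data and a hypothetical `is_outlier_sketch()` helper (not the package's code) to flag a value that is an outlier within its own group but not in the pooled data. It assumes that grouped detection applies the rule separately to each group, as the grouped example above suggests.

```r
# Hypothetical helper: TRUE beyond the coef x IQR fences.
is_outlier_sketch <- function(x, coef = 1.5) {
  q1 <- quantile(x, 0.25, na.rm = TRUE)[[1]]
  q3 <- quantile(x, 0.75, na.rm = TRUE)[[1]]
  iqr <- q3 - q1
  x < q1 - coef * iqr | x > q3 + coef * iqr
}

# Made-up data: the last Female score (1) is far from the other Female scores.
scores <- data.frame(
  score  = c(1:10, 41:49, 1),
  gender = rep(c("Male", "Female"), each = 10)
)

# Pooled fences: the Female score of 1 lies within them.
pooled <- is_outlier_sketch(scores$score)

# Per-group fences: split by gender, flag within each group, then reassemble.
by_group <- unsplit(lapply(split(scores$score, scores$gender), is_outlier_sketch),
                    scores$gender)

print(any(pooled))         # FALSE
print(scores[by_group, ])  # row 20: score 1, gender Female
```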