identify_outliers: Identify Outliers
In RobbyLankford/tidytest: Tidy Statistical Modeling Tests

identify_outliers

R Documentation

Identify Outliers

Description

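A data point flagged as an outlier has an extreme value in its response (y) variable relative to what the model predicts, i.e. a large residual. Outliers can be influential, meaning that they have an outsized effect on the fitted regression. A large residual alone does not guarantee influence, however: how much a point moves the fit also depends on its leverage (how unusual its predictor values are).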
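Whether a flagged point actually changes the fit can be checked by refitting the model without it. The following is a base-R sketch (it does not call tidytest) that drops the observation with the largest absolute standardized residual from a simple model and reports how much the coefficients move, along with that point's Cook's distance. The `mpg ~ wt` formula and the names `fit_full`, `fit_drop`, `worst`, and `coef_shift` are illustrative choices, not part of the package.

```r
# Base-R sketch, not tidytest code: measure how much one extreme
# observation moves a simple regression.
fit_full <- lm(mpg ~ wt, data = mtcars)

# Index of the observation with the largest absolute standardized residual
worst <- which.max(abs(rstandard(fit_full)))

# Refit the same model without that observation
fit_drop <- lm(mpg ~ wt, data = mtcars[-worst, ])

# Change in each coefficient caused by dropping the point
coef_shift <- coef(fit_full) - coef(fit_drop)
print(coef_shift)

# Cook's distance for the same point (combines residual size and leverage)
print(cooks.distance(fit_full)[worst])
```

A small coefficient shift means the point is an outlier without being very influential; a large shift means both.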

Usage

identify_outliers(object, id = NULL, .cutoff = 3)

## S3 method for class 'lm'
identify_outliers(object, id = NULL, .cutoff = 3)

Arguments

`object`	A model object (such as a fitted `lm` object).
`id`	(Optional) A vector of values, the same length as the number of observations, used as an identifier for each data point. If left as `NULL`, the row numbers are used as the ID column.
`.cutoff`	(Optional) The value that a standardized residual must exceed for its data point to be flagged as an outlier. The default is the rule-of-thumb value of 3 (see Details).
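The documented `id` behaviour (one value per observation, falling back to row numbers when `NULL`) can be sketched as a small helper. `resolve_ids()` is hypothetical and is not tidytest's internal code; the length check is an assumption about how a mismatched `id` might reasonably be rejected.

```r
# Hypothetical helper (not tidytest's internal code) mirroring the documented
# `id` argument: use row numbers when `id` is NULL, otherwise require one
# identifier per observation.
resolve_ids <- function(object, id = NULL) {
  n <- nobs(object)
  if (is.null(id)) {
    return(seq_len(n))  # row numbers 1..n
  }
  if (length(id) != n) {
    stop("`id` must have one value per observation (", n, ").")
  }
  id
}

fit <- lm(mpg ~ disp + wt + hp, data = mtcars)
head(resolve_ids(fit))                        # 1, 2, 3, ...
head(resolve_ids(fit, id = rownames(mtcars))) # car names
```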

Details

Outliers are defined as data points whose standardized residual is greater than a cutoff value (`.cutoff`). A traditional rule of thumb sets this cutoff at 3.
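The rule above can be sketched in base R. This assumes that "standardized residual" means R's internally studentized residual, as returned by `rstandard()`, i.e. e_i / (sigma * sqrt(1 - h_ii)), and that the cutoff is applied to its absolute value so that points far below the fit are flagged as well as points far above it. `flag_outliers()` is a hypothetical name, and it returns a plain `data.frame` rather than a tibble to stay dependency-free; it is not the package's implementation.

```r
# Hypothetical sketch of the Details rule (not tidytest's implementation).
# Assumes "standardized residual" = rstandard() and an absolute-value cutoff.
flag_outliers <- function(fit, id = NULL, cutoff = 3) {
  std_resid <- rstandard(fit)
  if (is.null(id)) id <- seq_along(std_resid)  # default: row numbers
  out <- data.frame(id = id, std_resid = unname(std_resid))
  out[abs(out$std_resid) > cutoff, , drop = FALSE]
}

fit <- lm(mpg ~ disp + wt + hp, data = mtcars)

# The same quantity by hand: e_i / (sigma * sqrt(1 - h_ii))
manual <- residuals(fit) / (sigma(fit) * sqrt(1 - hatvalues(fit)))
all.equal(manual, rstandard(fit))  # TRUE

flag_outliers(fit)              # rule-of-thumb cutoff of 3
flag_outliers(fit, cutoff = 2)  # a looser cutoff can only flag more points
```

Lowering the cutoff never removes a flagged point, so comparing a few cutoffs is a quick way to see how borderline the flagged observations are.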

Value

A tibble.

References

Kutner, M., Nachtsheim, C., Neter, J. and Li, W. (2005). Applied Linear Statistical Models. ISBN: 0-07-238688-6. McGraw-Hill/Irwin.

Examples

library(tidytest)

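# `lm` method
mod_lm_fit <- lm(mpg ~ disp + wt + hp, data = mtcars)

# Row numbers as IDs (the default)
identify_outliers(mod_lm_fit)
# Car names as IDs
identify_outliers(mod_lm_fit, id = rownames(mtcars))
# A looser cutoff than the default of 3
identify_outliers(mod_lm_fit, .cutoff = 2)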

RobbyLankford/tidytest documentation built on Jan. 27, 2024, 7:36 a.m.