identify_influential_obs: Identify Influential Observations (Using Cook's Distance)
In RobbyLankford/tidytest: Tidy Statistical Modeling Tests

Identify Influential Observations (Using Cook's Distance)

Description

An influential observation is a data point that strongly affects the fitted values of a regression, taking into account both its x and y values. This function flags such observations using Cook's distance.

Usage

identify_influential_obs(object, id = NULL, .cutoff = 0.5)

## S3 method for class 'lm'
identify_influential_obs(object, id = NULL, .cutoff = 0.5)

Arguments

`object`	A model object (such as a fitted `lm` object).
`id`	(Optional) A vector of values, the same length as the number of observations, used as an identifier for each data point. If `NULL` (the default), the row number is used as the ID column.
`.cutoff`	(Optional) The threshold above which a Cook's distance is treated as indicating an influential observation. The default is the rule-of-thumb value 0.5 (see Details).

Details

Cook's distance is often used to determine whether observations are influential. This function first calculates Cook's distance for each observation and then keeps only the observations whose distance is above `.cutoff`. A traditional rule of thumb is to set that cutoff to 0.5.
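
The calculation described above can be sketched in base R. This is a minimal illustration and not the package's own implementation: it assumes the distances match those returned by `stats::cooks.distance()`, and the output column names `id` and `cooks_distance` are hypothetical.

```r
# Fit the same model used in the Examples section
fit <- lm(mpg ~ disp + wt + hp, data = mtcars)

# Cook's distance for every observation (stats package, base R)
cooks_d <- cooks.distance(fit)

# Equivalent manual computation from residuals and leverages
e   <- residuals(fit)
h   <- hatvalues(fit)
p   <- length(coef(fit))              # number of estimated coefficients
mse <- sum(e^2) / df.residual(fit)    # residual mean square
cooks_manual <- (e^2 / (p * mse)) * h / (1 - h)^2

# Keep only observations above the cutoff
# (column names here are hypothetical, chosen for illustration)
cutoff <- 0.5
influential <- data.frame(
  id = names(cooks_d),
  cooks_distance = unname(cooks_d),
  stringsAsFactors = FALSE
)
influential <- influential[influential$cooks_distance > cutoff, ]
```

The manual formula shows why both x and y matter: the residual `e` reflects how far the response lies from the fit, while the leverage `h` reflects how unusual the predictor values are.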

Value

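A tibble containing the observations whose Cook's distance is above `.cutoff`, each identified by `id` (or by row number when `id` is `NULL`).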

References

Kutner, M., Nachtsheim, C., Neter, J. and Li, W. (2005). Applied Linear Statistical Models. ISBN: 0-07-238688-6. McGraw-Hill/Irwin.

Examples

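library(tidytest)

# `lm` method: fit a linear model on the built-in mtcars data
mod_lm_fit <- lm(mpg ~ disp + wt + hp, data = mtcars)

# Default: observations are identified by row number, cutoff is 0.5
identify_influential_obs(mod_lm_fit)

# Use the car names as identifiers instead of row numbers
identify_influential_obs(mod_lm_fit, id = rownames(mtcars))

# Use a stricter cutoff than the default
identify_influential_obs(mod_lm_fit, id = rownames(mtcars), .cutoff = 1)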

RobbyLankford/tidytest documentation built on Jan. 27, 2024, 7:36 a.m.