influence_observation: Influence Observation


View source: R/outlier.R

Description

This function computes several of the regression (leave-one-out deletion) diagnostics for linear models discussed in Belsley, Kuh and Welsch (1980) and Cook and Weisberg (1982).

Usage

influence_observation(X, e)

Arguments

X

a data frame of input variables (regressors).

e

a single numeric vector of data values.

Details

Hat values (leverage values) measure the effect of a particular observation on the regression predictions due to the position of that observation in the space of the inputs. In general, the farther a point is from the center of the input space, the more leverage it has. The leverage values sum to p, the number of estimated coefficients, so their mean is p/n, where n is the number of observations. An observation i can be considered a high-leverage point (a potential outlier in the input space) if its leverage substantially exceeds this mean, for example if it is larger than 2*p/n.
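The 2*p/n rule can be checked with base R alone. The sketch below does not call influence_observation; it fits an ordinary lm on simulated data (the variable names dat, x1, x2 and y are made up for illustration) and uses stats::hatvalues, on the assumption that the leverage values returned by the package follow the same definition.

```r
# Simulated data; all names here are illustrative, not part of the package.
set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(50)

fit <- lm(y ~ x1 + x2, data = dat)
h <- hatvalues(fit)        # diagonal of the hat matrix, one value per observation

p <- length(coef(fit))     # number of estimated coefficients, intercept included
n <- nrow(dat)             # number of observations

sum(h)                     # equals p up to floating-point error
high_leverage <- which(h > 2 * p / n)   # indices exceeding twice the mean leverage
high_leverage
```

The cutoff 2*p/n is a rule of thumb, so flagged points are candidates for inspection rather than automatic removals.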

Cook's distance measures the influence of a single observation on a linear regression fit. Observations with a large influence may be outliers, and a dataset with a large number of highly influential points may not be well suited to a linear model.
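As a rough illustration, Cook's distance can be obtained from stats::cooks.distance and reproduced from the leverage and the standardized residual of each observation. This is a sketch on simulated data using base R only; whether influence_observation computes the quantity in exactly this way is an assumption.

```r
# Simulated data; names are illustrative only.
set.seed(2)
x <- rnorm(40)
y <- 0.5 + 1.5 * x + rnorm(40)
fit <- lm(y ~ x)

d <- cooks.distance(fit)

# Same quantity from its textbook form:
# D_i = (r_i^2 / p) * h_i / (1 - h_i), with r_i the standardized residual
h <- hatvalues(fit)
r <- rstandard(fit)
p <- length(coef(fit))
d_manual <- r^2 / p * h / (1 - h)

all.equal(unname(d), unname(d_manual))   # the two computations agree

# Observations ranked by influence, largest first
head(sort(d, decreasing = TRUE))
```

The formula shows that an observation is influential only when it combines a sizeable residual with noticeable leverage.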

A standardized residual is the residual divided by its standard error. Standardization transforms data so that its mean is zero and its standard deviation is one. If the distribution of the residuals is approximately normal, then about 95% of the standardized residuals should lie between -2 and +2, and about 5% are expected to fall outside this region. If many more residuals lie outside -2 to +2, they might be considered unusual.

Studentized residuals take into account that the variance of the predicted value used in calculating residuals is not constant. Predicted values for cases close to the sample mean of an independent variable have smaller variance than those for cases farther from the mean. The studentized residual accounts for this change in variability by dividing the observed residual by an estimate of the standard deviation of the residual at that point. Norusis argues that this adjustment makes violations of the regression assumptions more visible, so studentized residuals are preferred to standardized residuals.
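The two kinds of residuals can be compared using base R. In the sketch below, stats::rstandard scales each residual by its own estimated standard error, and stats::rstudent additionally re-estimates the error variance with the observation left out. How these map onto the standardized.residuals and studentized.residuals components returned by this package is an assumption; the data and names are simulated for illustration.

```r
# Simulated data; names are illustrative only.
set.seed(3)
x <- rnorm(60)
y <- 2 - x + rnorm(60)
fit <- lm(y ~ x)

e <- residuals(fit)
h <- hatvalues(fit)
s <- summary(fit)$sigma            # residual standard error

# Residual divided by its estimated standard error, s * sqrt(1 - h_i)
r_std <- e / (s * sqrt(1 - h))
all.equal(unname(r_std), unname(rstandard(fit)))   # matches base R

# Leave-one-out version: sigma is re-estimated without observation i
r_stu <- rstudent(fit)

# Share of observations outside the -2 to +2 band
mean(abs(r_std) > 2)
mean(abs(r_stu) > 2)
```

With approximately normal errors, both shares should be in the neighborhood of 5%; a much larger share suggests outliers or a misspecified model.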

Value

a list with the following elements:

- Hat values / leverage values (leverage.value)

- Cook's distance (cooks.distance)

- Standardized residuals (standardized.residuals)

- Studentized residuals (studentized.residuals)

References

Belsley, D. A., Kuh, E. and Welsch, R. E. (1980) Regression Diagnostics. Wiley.

Cook, R. D. and Weisberg, S. (1982) Residuals and Influence in Regression.

Fox, J. (1997) Applied Regression, Linear Models, and Related Methods. Sage.

Williams, D. A. (1987) Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics 36, 181–191.

Stevenson, Wiliam B. (2008) Analyzing Residuals.

Examples

## Not run: 
X <- data.frame(matrix(rnorm(1000), nrow = 100))  # 100 observations, 10 inputs
resid <- rnorm(100)                               # stand-in residual vector
olsdiagnosticR:::influence_observation(X = X, e = resid)

## End(Not run)

Kale-S/isnormalr documentation built on Sept. 23, 2019, 5:48 a.m.