extrapolation_check: A crude check for extrapolation



Description

This function computes the Mahalanobis distance of points as a check for potential extrapolation.

Usage

extrapolation_check(M, newdata)

Arguments

M

A fitted model that uses only quantitative variables

newdata

Data frame with exactly the same columns as the predictors used to fit the model M, containing the points whose Mahalanobis distances are to be calculated.

Details

This function computes the shape of the predictor data cloud and calculates the distances of points from the center (with respect to the shape of the data cloud). Extrapolation occurs at a combination of predictors that is far from combinations used to build the model. An observation with a large Mahalanobis distance MAY be far from the observations used to build the model and thus MAY require extrapolation.
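The idea can be sketched in base R with stats::mahalanobis. This is a minimal illustration under stated assumptions, not necessarily how extrapolation_check is implemented internally: the built-in mtcars data stands in for the predictors of a fitted model, and the two new points are hypothetical.

```r
# Predictors assumed to have been used to fit a model (built-in mtcars data)
X <- mtcars[, c("wt", "hp", "disp")]

center <- colMeans(X)  # center of the predictor data cloud
S <- cov(X)            # covariance matrix describing the cloud's shape

# Two hypothetical new points: one near the center, one with an
# unusual combination (very heavy car, small and weak engine)
newdata <- data.frame(wt = c(3.2, 6.0), hp = c(150, 60), disp = c(230, 80))

# stats::mahalanobis() returns SQUARED distances; columns of newdata
# must be in the same order as the columns of X
d2_new <- mahalanobis(newdata, center, S)
sqrt(d2_new)  # the second point is much farther from the center
```

Note that the second point is unusual mainly because of the combination of values, which is what the Mahalanobis distance is designed to pick up.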

Note: the analysis assumes the predictor data cloud is roughly elliptical (this may not be a good assumption).

The function reports the percentile of the Mahalanobis distance of each point in newdata. A percentile is the percentage of observations used to fit the model that are CLOSER to the center than the point in question. Larger percentiles indicate a greater risk of extrapolation. If Percentile is about 99 you may be extrapolating.
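A percentile of this kind can be computed by comparing each new point's distance with the distances of the observations used to fit the model. The sketch below is an assumption about the calculation rather than the package's actual code; it uses the built-in mtcars data and two hypothetical new points.

```r
# Predictors assumed to have been used to fit a model (built-in mtcars data)
X <- mtcars[, c("wt", "hp", "disp")]
center <- colMeans(X)
S <- cov(X)

# Squared Mahalanobis distances of the observations used in the model
d2_train <- mahalanobis(X, center, S)

# Hypothetical new points
newdata <- data.frame(wt = c(3.2, 6.0), hp = c(150, 60), disp = c(230, 80))
d2_new <- mahalanobis(newdata, center, S)

# Percentage of model observations that are closer to the center
percentile <- sapply(d2_new, function(d) 100 * mean(d2_train < d))

data.frame(Observation = seq_along(percentile),
           Percentile = round(percentile, 1))
```

Comparing squared distances gives the same ranking as comparing the distances themselves, so no square root is needed here.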

The method is sensitive to outliers and clusters of outliers, and gives only a crude idea of the potential for extrapolation.
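The sensitivity to clusters of outliers can be illustrated with a small base R sketch. The query point and the two added outlying rows are hypothetical, and mtcars again stands in for the model's predictor data. A small cluster of unusual observations pulls the center toward itself and stretches the estimated shape of the cloud, so a nearby point looks less extreme than it otherwise would.

```r
# Two predictors from the built-in mtcars data
X <- mtcars[, c("wt", "hp")]

# Hypothetical query point: a very heavy car with little horsepower
point <- c(wt = 5.5, hp = 60)

# Squared distance using the original data
d2_clean <- mahalanobis(point, colMeans(X), cov(X))

# Add a hypothetical cluster of two outliers close to the query point
X_out <- rbind(X, data.frame(wt = c(5.6, 5.8), hp = c(55, 65)))

# Squared distance once the outlier cluster is part of the data
d2_contaminated <- mahalanobis(point, colMeans(X_out), cov(X_out))

c(clean = d2_clean, contaminated = d2_contaminated)
```

The distance computed with the outlier cluster included is smaller, which would understate the risk of extrapolation at that point.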

Author(s)

Adam Petrie

References

Introduction to Regression and Modeling

See Also

mahalanobis

Examples

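  library(regclass)  # provides extrapolation_check() and the SALARY data
  data(SALARY)
  # Fit a model using only quantitative predictors
  M <- lm(Salary ~ Education * Experience + Months, data = SALARY)
  # Three new individuals who differ only in Education
  newdata <- data.frame(Education = c(0, 5, 10), Experience = c(15, 15, 15), Months = c(0, 0, 0))
  extrapolation_check(M, newdata)
  # Individuals 1 and 3 are rather unusual (though not terribly) while individual 2 is typical.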

Example output

Loading required package: bestglm
Loading required package: leaps
Loading required package: VGAM
Loading required package: stats4
Loading required package: splines
Loading required package: rpart
Loading required package: randomForest
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
Important regclass change from 1.3:
All functions that had a . in the name now have an _
all.correlations -> all_correlations, cor.demo -> cor_demo, etc.

  Observation Percentile
1           1       94.6
2           2       74.2
3           3       97.8

regclass documentation built on March 26, 2020, 8:02 p.m.