vif: Variance Inflation Factor and test for multicollinearity


Description

Calculates the variance inflation factor (VIF) for a set of variables and excludes the highly correlated variables from the set through a stepwise procedure. This method can be used to deal with multicollinearity problems when fitting statistical models.

Usage

vif(x, ...)
vifcor(x,th=0.9, ...)
vifstep(x,th=10, ...)

Arguments

x

explanatory variables (predictors), defined as a raster object (RasterStack or RasterBrick), or as a matrix, or as a data.frame.

th

a number specifying the correlation threshold for vifcor and the VIF threshold for vifstep (see Details).

...

additional arguments (see Details).

Details

VIF can be used to detect collinearity (strong correlation between two or more predictor variables). Collinearity causes instability in parameter estimation in regression-type models. The VIF is based on the square of the multiple correlation coefficient resulting from regressing a predictor variable against all other predictor variables. If a variable has a strong linear relationship with at least one other variable, the correlation coefficient will be close to 1, and the VIF for that variable will be large. A VIF greater than 10 is commonly taken as a signal that the model has a collinearity problem.

The vif function calculates this statistic for all variables in x. vifcor and vifstep use two different strategies to exclude highly collinear variables through a stepwise procedure.

vifcor first finds the pair of variables with the maximum linear correlation (greater than th) and excludes the one with the greater VIF. The procedure is repeated until no pair of variables with a correlation coefficient greater than th remains.

vifstep calculates the VIF for all variables, excludes the one with the highest VIF (greater than th), and repeats the procedure until no variable with a VIF greater than th remains.
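The statistic described above is usually written as VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all other predictors. The following base R sketch illustrates that formula and the two stepwise strategies on a plain data.frame. The helpers manual_vif, step_by_vif and step_by_cor are hypothetical names introduced here for illustration; they are not usdm functions and are not claimed to match the package's internal implementation.

```r
# Hypothetical helpers for illustration only; none of these are usdm functions.

# VIF of every column: 1 / (1 - R^2) from regressing it on all other columns.
manual_vif <- function(d) {
  sapply(names(d), function(v) {
    fit <- lm(reformulate(setdiff(names(d), v), response = v), data = d)
    1 / (1 - summary(fit)$r.squared)
  })
}

# vifstep-like loop: drop the variable with the highest VIF
# until no remaining VIF exceeds th.
step_by_vif <- function(d, th = 10) {
  while (ncol(d) > 1) {
    v <- manual_vif(d)
    if (max(v) <= th) break
    d <- d[, names(d) != names(which.max(v)), drop = FALSE]
  }
  names(d)
}

# vifcor-like loop: take the most correlated pair; if its absolute
# correlation exceeds th, drop the member with the larger VIF, then repeat.
step_by_cor <- function(d, th = 0.9) {
  while (ncol(d) > 1) {
    r <- abs(cor(d))
    diag(r) <- 0
    if (max(r) <= th) break
    pair <- names(d)[which(r == max(r), arr.ind = TRUE)[1, ]]
    v <- manual_vif(d)[pair]
    d <- d[, names(d) != names(which.max(v)), drop = FALSE]
  }
  names(d)
}

# Simulated data (an assumption for the demo): x2 is nearly a copy of x1,
# x3 is independent of both.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
d  <- data.frame(x1 = x1, x2 = x1 + rnorm(n, sd = 0.2), x3 = rnorm(n))

manual_vif(d)             # x1 and x2 are inflated; x3 stays close to 1
step_by_vif(d, th = 10)   # names of the variables kept
step_by_cor(d, th = 0.9)  # names of the variables kept
```

In this toy data set both loops keep x3 and one of the two near-duplicate variables. On real data the two strategies can disagree, as the example output for vifcor and vifstep on this page shows.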

Additional arguments:

maxobservations: a number (default=5000) specifying the maximum number of observations used in the calculation of VIF. When the number of observations (cells in a raster or rows in a data.frame/matrix) is greater than maxobservations, a random sample of size maxobservations is drawn to keep the calculation efficient.
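The subsampling idea can be sketched in base R as follows. The helper cap_observations is a hypothetical name used only for illustration; how usdm draws its sample internally is not documented here, so treat this as one plausible reading of the description rather than the package's code.

```r
# Hypothetical helper (not a usdm function): keep at most
# maxobservations rows, chosen at random without replacement.
cap_observations <- function(d, maxobservations = 5000) {
  if (nrow(d) > maxobservations) {
    d <- d[sample(nrow(d), maxobservations), , drop = FALSE]
  }
  d
}

set.seed(42)
big   <- data.frame(x = rnorm(20000), y = rnorm(20000))
small <- cap_observations(big)
nrow(small)  # 5000
```

Because the sample is random, VIF values computed on large inputs may differ slightly between runs unless a seed is set beforehand.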

Value

an object of class VIF

Author(s)

Babak Naimi

http://r-gis.net

References

Chatterjee, S. and Hadi, A. S. 2006. Regression analysis by example. John Wiley and Sons.

Dormann, C. F. et al. 2012. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography 35: 001-020.

————–

If you use this method, please cite the following article, for which this package was developed:

Naimi, B., Hamm, N.A.S., Groen, T.A., Skidmore, A.K., and Toxopeus, A.G. 2014. Where is positional uncertainty a problem for species distribution modelling? Ecography 37 (2): 191-203.

See Also

exclude

Examples

## Not run: 
library(usdm) # also loads the sp and raster packages

file <- system.file("external/spain.grd", package="usdm")

r <- brick(file) # read a RasterBrick object with 10 raster layers in Spain

r # print a summary of the RasterBrick

vif(r) # calculate the VIF for each variable in r

v1 <- vifcor(r, th=0.9) # identify collinear variables to exclude (correlation threshold 0.9)

v1

v2 <- vifstep(r, th=10) # identify collinear variables to exclude (VIF threshold 10)

v2

## End(Not run)

Example output

Loading required package: sp
Loading required package: raster
class       : RasterBrick 
dimensions  : 30, 29, 870, 10  (nrow, ncol, ncell, nlayers)
resolution  : 10000, 10000  (x, y)
extent      : 319375, 609375, 4449936, 4749936  (xmin, xmax, ymin, ymax)
coord. ref. : NA 
data source : /usr/local/lib/R/site-library/usdm/external/spain.grd 
names       :       Bio1,       Bio2,       Bio3,       Bio4,       Bio5,       Bio6,       Bio7,       Bio8,       Bio9,      Bio10 
min values  :   65.40278,   83.90278,   34.09028, 4884.11816,  228.18750,  -47.90972,  221.13889,   36.33333,   31.68056,  144.34723 
max values  :  145.16667,  120.17361,   39.94444, 6740.22900,  320.09723,   21.56944,  310.95834,  156.18750,  234.34723,  234.34723 

   Variables          VIF
1       Bio1 7.767314e+02
2       Bio2 2.458951e+02
3       Bio3 5.511014e+01
4       Bio4 1.759985e+02
5       Bio5 2.558863e+12
6       Bio6 1.381049e+12
7       Bio7 2.316071e+12
8       Bio8 1.581807e+00
9       Bio9 3.009865e+00
10     Bio10 1.520138e+03
2 variables from the 10 input variables have collinearity problem: 
 
Bio5 Bio10 

After excluding the collinear variables, the linear correlation coefficients ranges between: 
min correlation ( Bio2 ~ Bio1 ):  0.03838531 
max correlation ( Bio7 ~ Bio4 ):  0.8909937 

---------- VIFs of the remained variables -------- 
  Variables        VIF
1      Bio1  46.440583
2      Bio2 236.664027
3      Bio3  54.930047
4      Bio4  13.868554
5      Bio6  58.667824
6      Bio7 316.648968
7      Bio8   1.472454
8      Bio9   3.002529
5 variables from the 10 input variables have collinearity problem: 
 
Bio5 Bio10 Bio7 Bio6 Bio4 

After excluding the collinear variables, the linear correlation coefficients ranges between: 
min correlation ( Bio2 ~ Bio1 ):  0.03838531 
max correlation ( Bio9 ~ Bio1 ):  0.7101681 

---------- VIFs of the remained variables -------- 
  Variables      VIF
1      Bio1 2.086186
2      Bio2 1.370264
3      Bio3 1.253408
4      Bio8 1.267217
5      Bio9 2.309479

usdm documentation built on June 26, 2017, 3 a.m.