# vif: Variance Inflation Factor and test for multicollinearity (usdm: Uncertainty Analysis for Species Distribution Models)

## Description

Calculates the variance inflation factor (VIF) for a set of variables and excludes highly correlated variables from the set through a stepwise procedure. This method can be used to deal with multicollinearity problems when fitting statistical models.

## Usage

```
vif(x, ...)
vifcor(x, th=0.9, ...)
vifstep(x, th=10, ...)
```

## Arguments

`x` explanatory variables (predictors), given as a raster object (`RasterStack` or `RasterBrick`), a `matrix`, or a `data.frame`.

`th` a number specifying the correlation threshold for `vifcor` and the VIF threshold for `vifstep` (see Details).

`...` additional arguments (see Details).

## Details

VIF can be used to detect collinearity (strong correlation between two or more predictor variables). Collinearity causes instability in parameter estimation in regression-type models. The VIF is based on the square of the multiple correlation coefficient obtained by regressing one predictor variable against all other predictor variables (VIF = 1 / (1 - R²)). If a variable has a strong linear relationship with at least one other variable, this multiple correlation coefficient will be close to 1 and the VIF for that variable will be large. A VIF greater than 10 is commonly taken as a signal that the model has a collinearity problem. The `vif` function calculates this statistic for every variable in `x`.

`vifcor` and `vifstep` use two different strategies to exclude highly collinear variables through a stepwise procedure. `vifcor` first finds the pair of variables with the maximum linear correlation (greater than `th`) and excludes the one with the greater VIF; the procedure is repeated until no variable remains whose correlation coefficient with another variable is greater than `th`. `vifstep` calculates the VIF for all variables, excludes the one with the highest VIF (if it is greater than `th`), and repeats the procedure until no variable with a VIF greater than `th` remains.
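The VIF computation and the two stepwise strategies can be sketched in base R as follows. This is a minimal illustration, not the usdm implementation: the helper names `vif_sketch`, `vifstep_sketch` and `vifcor_sketch` are hypothetical, each VIF is computed as VIF_j = 1 / (1 - R_j²) from an ordinary least-squares regression (with intercept) of predictor j on all other predictors, and `vifcor_sketch` assumes the absolute Pearson correlation is what gets compared with `th`.

```r
# Minimal base-R sketch (hypothetical helpers, NOT the usdm API).

# VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared of an OLS regression
# (with intercept) of column j on all other columns of x.
vif_sketch <- function(x) {
  x <- as.matrix(x)
  idx <- setNames(seq_len(ncol(x)), colnames(x))
  sapply(idx, function(j) {
    y <- x[, j]
    X <- cbind(1, x[, -j, drop = FALSE])        # intercept + other predictors
    res <- lm.fit(X, y)$residuals
    r2 <- 1 - sum(res^2) / sum((y - mean(y))^2)
    1 / (1 - r2)
  })
}

# vifstep-like strategy: while the largest VIF exceeds th, drop that variable
# and recompute the VIFs of the remaining variables.
vifstep_sketch <- function(x, th = 10) {
  x <- as.matrix(x)
  excluded <- character(0)
  while (ncol(x) > 1) {
    v <- vif_sketch(x)
    if (max(v) <= th) break
    worst <- which.max(v)
    excluded <- c(excluded, colnames(x)[worst])
    x <- x[, -worst, drop = FALSE]
  }
  list(excluded = excluded, kept = colnames(x))
}

# vifcor-like strategy: find the most strongly correlated pair (assumption:
# absolute Pearson correlation); if it exceeds th, drop the member of the
# pair with the larger VIF, then repeat.
vifcor_sketch <- function(x, th = 0.9) {
  x <- as.matrix(x)
  excluded <- character(0)
  while (ncol(x) > 1) {
    cm <- abs(cor(x))
    diag(cm) <- 0
    if (max(cm) <= th) break
    pair <- which(cm == max(cm), arr.ind = TRUE)[1, ]  # row and column index
    v <- vif_sketch(x)
    out <- pair[which.max(v[pair])]
    excluded <- c(excluded, colnames(x)[out])
    x <- x[, -out, drop = FALSE]
  }
  list(excluded = excluded, kept = colnames(x))
}

# Synthetic data: a_plus_b is almost a + b (a three-variable dependency),
# e_copy is almost a copy of e (a strongly correlated pair).
set.seed(1)
n <- 200
a <- rnorm(n); b <- rnorm(n); e <- rnorm(n)
d <- data.frame(a = a, b = b, a_plus_b = a + b + rnorm(n, sd = 0.01),
                e = e, e_copy = e + rnorm(n, sd = 0.01))

round(vif_sketch(d), 1)
s  <- vifstep_sketch(d, th = 10)
cc <- vifcor_sketch(d, th = 0.9)
s$excluded
cc$excluded
```

In this sketch, `vifstep_sketch` removes one variable from the `a`/`b`/`a_plus_b` group and one from the `e`/`e_copy` pair, while `vifcor_sketch` removes only one of `e`/`e_copy`: the pairwise correlations involving `a_plus_b` are around 0.7, below the 0.9 threshold, even though that variable is almost an exact linear combination of two others.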

`maxobservations` a number (default=5000) specifying the maximum number of observations that contribute to the VIF calculation. When the number of observations (cells in a raster, or rows in a data.frame/matrix) is greater than `maxobservations`, a random sample of size `maxobservations` is drawn to keep the calculation efficient.
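The subsampling rule can be illustrated with a short sketch. `subsample_rows` is a hypothetical helper, not a usdm function, and sampling without replacement is an assumption; the sketch only shows when a sample is drawn and how large it is.

```r
# Hypothetical helper (not part of usdm) mirroring the described behaviour of
# maxobservations: if x has more rows than maxobservations, a random subset of
# maxobservations rows (assumed: sampled without replacement) is used instead.
subsample_rows <- function(x, maxobservations = 5000) {
  x <- as.data.frame(x)
  if (nrow(x) > maxobservations) {
    x <- x[sample(nrow(x), maxobservations), , drop = FALSE]
  }
  x
}

set.seed(42)
big <- data.frame(p1 = rnorm(12000), p2 = rnorm(12000))
nrow(subsample_rows(big))             # 5000: sampled down
nrow(subsample_rows(big[1:100, ]))    # 100: left unchanged
```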

## Value

An object of class `VIF`.

## Author(s)

Babak Naimi

## References

Chatterjee, S. and Hadi, A. S. 2006. Regression Analysis by Example. John Wiley and Sons.

Dormann, C. F. et al. 2012. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography 35: 001-020.


If you use this method, please cite the following article, for which this package was developed:

Naimi, B., Hamm, N.A.S., Groen, T.A., Skidmore, A.K., and Toxopeus, A.G. 2014. Where is positional uncertainty a problem for species distribution modelling? Ecography 37 (2): 191-203.

## See Also

`exclude`

## Examples

```
## Not run:
library(usdm)
library(raster) # provides brick()

file <- system.file("external/spain.grd", package="usdm")

r <- brick(file) # reading a RasterBrick object including 10 raster layers in Spain

r

vif(r) # calculates vif for the variables in r

v1 <- vifcor(r, th=0.9) # identify collinear variables that should be excluded

v1

v2 <- vifstep(r, th=10) # identify collinear variables that should be excluded

v2
## End(Not run)
```

### Example output

```Loading required package: sp
class       : RasterBrick
dimensions  : 30, 29, 870, 10  (nrow, ncol, ncell, nlayers)
resolution  : 10000, 10000  (x, y)
extent      : 319375, 609375, 4449936, 4749936  (xmin, xmax, ymin, ymax)
coord. ref. : NA
data source : /usr/local/lib/R/site-library/usdm/external/spain.grd
names       :       Bio1,       Bio2,       Bio3,       Bio4,       Bio5,       Bio6,       Bio7,       Bio8,       Bio9,      Bio10
min values  :   65.40278,   83.90278,   34.09028, 4884.11816,  228.18750,  -47.90972,  221.13889,   36.33333,   31.68056,  144.34723
max values  :  145.16667,  120.17361,   39.94444, 6740.22900,  320.09723,   21.56944,  310.95834,  156.18750,  234.34723,  234.34723

Variables          VIF
1       Bio1 7.767314e+02
2       Bio2 2.458951e+02
3       Bio3 5.511014e+01
4       Bio4 1.759985e+02
5       Bio5 2.558863e+12
6       Bio6 1.381049e+12
7       Bio7 2.316071e+12
8       Bio8 1.581807e+00
9       Bio9 3.009865e+00
10     Bio10 1.520138e+03
2 variables from the 10 input variables have collinearity problem:

Bio5 Bio10

After excluding the collinear variables, the linear correlation coefficients ranges between:
min correlation ( Bio2 ~ Bio1 ):  0.03838531
max correlation ( Bio7 ~ Bio4 ):  0.8909937

---------- VIFs of the remained variables --------
Variables        VIF
1      Bio1  46.440583
2      Bio2 236.664027
3      Bio3  54.930047
4      Bio4  13.868554
5      Bio6  58.667824
6      Bio7 316.648968
7      Bio8   1.472454
8      Bio9   3.002529
5 variables from the 10 input variables have collinearity problem:

Bio5 Bio10 Bio7 Bio6 Bio4

After excluding the collinear variables, the linear correlation coefficients ranges between:
min correlation ( Bio2 ~ Bio1 ):  0.03838531
max correlation ( Bio9 ~ Bio1 ):  0.7101681

---------- VIFs of the remained variables --------
Variables      VIF
1      Bio1 2.086186
2      Bio2 1.370264
3      Bio3 1.253408
4      Bio8 1.267217
5      Bio9 2.309479
```

usdm documentation built on June 26, 2017, 3 a.m.