autoVIF: Automatic variable selection with variance inflation factor...


View source: R/autoVIF.R

Description

Selects the variables within a data frame that are not highly correlated with each other, or with linear combinations of other variables, using the variance inflation factor (VIF) criterion implemented in the vif function (Heiberger and Holland 2004).
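For a numeric column x_j, the VIF is 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination of a linear regression of x_j on all the remaining columns. Large values mean that x_j is close to a linear combination of the other variables. The minimal base R sketch below computes VIFs this way. `vif_sketch()` and the `toy` data are hypothetical illustrations, not the vif function that autoVIF calls, and the sketch assumes complete numeric columns.

```r
# Hypothetical helper (not part of SDMworkshop): VIF of each column of a
# numeric data frame, computed as 1 / (1 - R^2) from regressing that
# column on all the other columns.
vif_sketch <- function(x) {
  x <- as.data.frame(x)
  vapply(names(x), function(v) {
    y <- x[[v]]                                   # response: column v
    others <- as.matrix(x[setdiff(names(x), v)])  # predictors: the rest
    r2 <- summary(lm(y ~ others))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

# Toy data: z is almost a linear combination of a and b,
# so all three columns get large VIFs
set.seed(1)
a <- rnorm(100)
b <- rnorm(100)
toy <- data.frame(a = a, b = b, z = a + b + rnorm(100, sd = 0.1))
round(vif_sketch(toy), 1)
```

The same values can be obtained as the diagonal of the inverse correlation matrix, `diag(solve(cor(toy)))`, which avoids fitting one regression per column.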

Usage

autoVIF(
  x,
  try.to.keep = NULL,
  verbose = TRUE
)

Arguments

x

A data frame with numeric columns.

try.to.keep

A character vector with the names of the variables the user would like to keep, in order of preference. If this argument is not NULL, the function first applies vif to the variables in x that are not in try.to.keep, then to the variables in try.to.keep, and finally to the combined outcome of both vif analyses, always preferring to remove variables that are not in try.to.keep. A good choice for this argument is the variable column of the output of biserialCorrelation, which is ordered from higher to lower biserial correlation.
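The preference rule behind try.to.keep can be illustrated with a simplified, single-stage sketch: repeatedly compute the VIFs and, while any of them exceeds a threshold, drop the highest-VIF variable that is not listed in try.to.keep or, if only listed variables remain above the threshold, drop the least preferred of them. `select_by_vif_sketch()`, the threshold of 5, and the `toy` data are assumptions for illustration only. This is not the three-stage procedure that autoVIF implements, and it assumes there are no perfectly collinear columns (otherwise `solve()` fails).

```r
# Hypothetical sketch (not SDMworkshop's code): drop one variable at a time
# until every remaining VIF is below `threshold` (5 is an assumed value).
# VIFs are the diagonal of the inverse correlation matrix, 1 / (1 - R^2).
select_by_vif_sketch <- function(x, try.to.keep = NULL, threshold = 5) {
  vars <- names(x)
  repeat {
    if (length(vars) < 2) break
    vifs <- diag(solve(cor(x[vars])))
    names(vifs) <- vars
    if (max(vifs) < threshold) break
    # Candidates for removal: variables at or above the threshold
    high <- vars[vifs >= threshold]
    expendable <- setdiff(high, try.to.keep)
    if (length(expendable) > 0) {
      # Prefer dropping a variable the user did not ask to keep,
      # choosing the one with the highest VIF
      drop <- expendable[which.max(vifs[expendable])]
    } else {
      # Only preferred variables are left above the threshold:
      # drop the least preferred one (latest in try.to.keep)
      drop <- high[which.max(match(high, try.to.keep))]
    }
    vars <- setdiff(vars, drop)
  }
  vars
}

# Toy data: z is almost a + b
set.seed(1)
a <- rnorm(100)
b <- rnorm(100)
toy <- data.frame(a = a, b = b, z = a + b + rnorm(100, sd = 0.1))

select_by_vif_sketch(toy)
select_by_vif_sketch(toy, try.to.keep = c("z", "a"))
```

Without preferences, the sketch drops z (the largest VIF) and keeps a and b; with try.to.keep = c("z", "a") it drops b instead and keeps a and z.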

verbose

Logical, defaults to TRUE. If TRUE, prints messages describing which variables are being removed.

Value

A character vector with the names of the selected variables.

Author(s)

Blas Benito <blasbenito@gmail.com>. The function vif is authored by Richard M. Heiberger <rmh@temple.edu>.

References

Heiberger, Richard M. and Holland, Burt (2004). Statistical Analysis and Data Display: An Intermediate Course with Examples in S-Plus, R, and SAS. Springer Texts in Statistics. Springer. ISBN 0-387-40270-5.

Examples

## Not run: 
data("europe2000", package = "SDMworkshop")

# convert a subset of the raster layers into a data frame
df <- raster::as.data.frame(europe2000[[c("bio1", "bio5", "bio6", "bio11", "bio12")]])

selected.vars <- SDMworkshop::autoVIF(
  x = df,
  try.to.keep = c("bio5", "bio6", "bio1"),
  verbose = TRUE
)
selected.vars

# autoVIF can also take the output of SDMworkshop::biserialCorrelation
# as the try.to.keep argument, as follows:
data("virtualSpeciesPB", package = "SDMworkshop")

cPB <- SDMworkshop::biserialCorrelation(
  x = virtualSpeciesPB,
  presence.column = "presence",
  variables = c("bio1", "bio5", "bio6")
)

# cPB$df$variable is ordered from higher to lower biserial correlation,
# and a higher biserial correlation is linked to higher predictive importance
selected.vars <- SDMworkshop::autoVIF(
  x = df,
  try.to.keep = cPB$df$variable,
  verbose = TRUE
)
selected.vars


## End(Not run)

BlasBenito/SDMworkshop documentation built on March 4, 2020, 4:16 a.m.