remove_bycorrvif: Removing variables using ViF and correlation


View source: R/avoid_multicollinearity.R

Description

Removes variables from a candidate set of main effects, first by pairwise correlation and then by Variance Inflation Factor (ViF).

Usage

remove_bycorrvif(fmla, data, corrthresh, vifthresh, centrescalemains = FALSE)

Arguments

fmla

A model formula that specifies a possible set of main effects.

data

A data frame from which to extract the main effects.

corrthresh

A threshold. Of the pair of variables with the highest correlation, the one appearing later in the model matrix is removed. This is repeated until there are no pairwise correlations above corrthresh.

vifthresh

A threshold. The variable with the highest ViF is removed until no variables have ViF above vifthresh.

centrescalemains

If TRUE then prep.designmatprocess() and apply.designmatprocess() are used to centre and scale main effects (after any logarithms).

Details

The function first removes variables based on pairwise correlation, and then based on ViF. Variables are removed one at a time. First a variable is removed due to having high correlation, then pairwise correlation is recomputed. This is repeated until no pairwise correlations are above the threshold corrthresh. Then generalised Variance Inflation Factors (ViF) are computed using car::vif(). The variable with the highest ViF is removed and ViFs are recomputed. This is repeated until there are no ViFs higher than vifthresh.
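The two-stage procedure described above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: sketch_remove_bycorrvif() is a hypothetical helper, it assumes a numeric matrix of main effects rather than a formula and data frame, it assumes correlations are compared in absolute value, and it uses the ordinary ViF (the diagonal of the inverse correlation matrix) in place of the generalised ViF from car::vif().

```r
# Hypothetical helper for illustration; not part of the package.
sketch_remove_bycorrvif <- function(X, corrthresh, vifthresh) {
  X <- as.matrix(X)

  # Stage 1: remove one variable at a time by pairwise correlation.
  repeat {
    if (ncol(X) < 2) break
    cmat <- abs(cor(X))                      # assumption: absolute correlation
    cmat[lower.tri(cmat, diag = TRUE)] <- 0  # keep each pair once (row < column)
    if (max(cmat) <= corrthresh) break
    pair <- which(cmat == max(cmat), arr.ind = TRUE)[1, ]
    later <- max(pair)                       # the variable appearing later
    X <- X[, -later, drop = FALSE]           # correlations are recomputed next pass
  }

  # Stage 2: remove one variable at a time by ViF.
  repeat {
    if (ncol(X) < 2) break
    vifs <- diag(solve(cor(X)))              # ordinary ViF, not car::vif()
    if (max(vifs) <= vifthresh) break
    X <- X[, -which.max(vifs), drop = FALSE] # ViFs are recomputed next pass
  }

  colnames(X)
}

# Synthetic data, made up for this sketch.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)       # nearly duplicates x1
x3 <- rnorm(n)
x4 <- rnorm(n)
x5 <- x3 + x4 + rnorm(n, sd = 0.05)  # nearly a linear combination of x3 and x4
X  <- cbind(x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5)

kept <- sketch_remove_bycorrvif(X, corrthresh = 0.9, vifthresh = 30)
print(kept)
```

In this synthetic data, x2 is designed to be caught by the correlation stage, while x5 is only moderately correlated with x3 and x4 individually and is designed to be caught by the ViF stage. This illustrates why the two stages are run in sequence.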

Examples

# Load the input data from a local RDS file
indata <- readRDS("./private/data/clean/7_2_10_input_data.rds")
# Remove main effects with pairwise correlation above 0.9, then with ViF above 30
remove_bycorrvif("~ AnnMeanTemp + AnnPrec + MaxTWarmMonth + PrecWarmQ + 
                   MinTColdMonth + PrecColdQ + PrecSeasonality + longitude * latitude",
                 data = indata$insampledata$Xocc,
                 corrthresh = 0.9,
                 vifthresh = 30)

sustainablefarms/linking-data documentation built on Oct. 28, 2020, 2:41 a.m.