auto_vif: Multicollinearity reduction via Variance Inflation Factor

View source: R/auto_vif.R

Description

Filters predictors by evaluating variance inflation factors (VIF) sequentially. Predictors are ranked by user preference (or by column order) and considered one at a time. Each candidate is added to the selected pool only if the maximum VIF across the pool (the candidate plus the predictors already selected) does not exceed vif.threshold.
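For reference, the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The base-R sketch below computes it that way; the helper name `vif_sketch` is hypothetical and this is not spatialRF's internal code. The same values can also be read off the diagonal of the inverse correlation matrix, `diag(solve(cor(x)))`, which is a standard identity.

```r
# Hypothetical helper (not part of spatialRF): VIF of each column of a
# numeric data frame, computed as 1 / (1 - R^2) from regressing that
# column on all the remaining columns.
vif_sketch <- function(df) {
  df <- as.data.frame(df)
  vapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    if (length(others) == 0) return(1)  # a lone predictor has VIF = 1
    r2 <- summary(lm(df[[v]] ~ as.matrix(df[others])))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

# Usage with a built-in dataset
vif_sketch(mtcars[, c("disp", "hp", "wt", "drat")])
```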

Usage

auto_vif(x = NULL, preference.order = NULL, vif.threshold = 5, verbose = TRUE)

Arguments

x

Data frame with predictors, or a variable_selection object from auto_cor(). Default: NULL.

preference.order

Character vector of predictor names, ordered from most to least preferred. It does not need to include all variables in x. If NULL, column order is used. Default: NULL.

vif.threshold

Numeric (recommended: 2.5 to 10). Maximum allowed VIF among selected variables. Higher values allow more collinearity. Default: 5.

verbose

Logical. If TRUE, prints messages about operations and removed variables. Default: TRUE.
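Since VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing predictor j on the other predictors, a VIF threshold τ (vif.threshold) corresponds to a maximum share of variance that one predictor may have in common with the others. This is the standard definition of VIF, not something specific to spatialRF:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2} \le \tau
\quad\Longleftrightarrow\quad
R_j^2 \le 1 - \frac{1}{\tau}
```

For the recommended range, τ = 2.5 gives R_j^2 ≤ 0.6, the default τ = 5 gives R_j^2 ≤ 0.8, and τ = 10 gives R_j^2 ≤ 0.9.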

Details

The algorithm follows these steps:

  1. Rank predictors by preference.order (or use column order if NULL).

  2. Initialize selection pool with first predictor.

  3. For each remaining candidate:

    • Compute VIF for candidate plus all selected predictors.

    • If the maximum VIF is equal to or lower than vif.threshold, add the candidate to the selected pool.

    • Otherwise, skip candidate.

  4. Return selected predictors with their VIF values.
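The steps above can be sketched in base R as follows. This is a minimal illustration, not spatialRF's source: the names `vif_of` and `select_by_vif_sketch` are hypothetical, VIF is taken from the diagonal of the inverse correlation matrix, and two details are assumptions of the sketch, namely that predictors missing from preference.order are appended in column order and that an exactly singular correlation matrix counts as infinite VIF.

```r
# Hypothetical helper: VIF of every column via diag(solve(cor(df))).
# An exactly singular correlation matrix makes solve() fail; this sketch
# treats that case as infinite VIF (an assumption).
vif_of <- function(df) {
  if (ncol(df) == 1) return(setNames(1, names(df)))
  tryCatch(
    diag(solve(cor(df))),
    error = function(e) setNames(rep(Inf, ncol(df)), names(df))
  )
}

# Hypothetical sketch of the sequential VIF filter described in Details.
select_by_vif_sketch <- function(x, preference.order = NULL, vif.threshold = 5) {
  x <- as.data.frame(x)
  # 1. Rank predictors: preferred names found in x first, then the remaining
  #    columns in column order (assumed placement for unlisted predictors).
  ranked <- c(intersect(preference.order, names(x)),
              setdiff(names(x), preference.order))
  # 2. Initialize the pool with the top-ranked predictor.
  selected <- ranked[1]
  # 3. Keep each candidate only if the maximum VIF of the enlarged pool
  #    stays at or below the threshold; otherwise skip it.
  for (candidate in ranked[-1]) {
    trial <- c(selected, candidate)
    if (max(vif_of(x[, trial, drop = FALSE])) <= vif.threshold) {
      selected <- trial
    }
  }
  # 4. Return the selection and its VIF values (fields mirror the Value section).
  vif <- vif_of(x[, selected, drop = FALSE])
  list(
    vif = data.frame(variable = names(vif), vif = unname(vif),
                     stringsAsFactors = FALSE),
    selected.variables = selected,
    selected.variables.df = x[, selected, drop = FALSE]
  )
}
```

Because the pool only grows, an early (preferred) predictor is never displaced by a later one; a later candidate that is collinear with the pool is simply skipped.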

Data cleaning: Variables in preference.order not found in colnames(x) are silently removed. Non-numeric columns are removed with a warning. Rows with NA values are removed via na.omit(). Zero-variance columns trigger a warning but are not removed.
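The cleaning rules can be sketched as below. This is a hypothetical illustration (`clean_predictors_sketch` is not a spatialRF function), and the order of the steps, with non-numeric columns dropped before NA rows are removed, is an assumption of the sketch.

```r
# Hypothetical sketch of the data-cleaning rules (not spatialRF's code).
clean_predictors_sketch <- function(x, preference.order = NULL) {
  x <- as.data.frame(x)
  # Silently drop preference.order entries that are not columns of x.
  preference.order <- preference.order[preference.order %in% colnames(x)]
  # Remove non-numeric columns, with a warning.
  is_num <- vapply(x, is.numeric, logical(1))
  if (any(!is_num)) {
    warning("Removing non-numeric columns: ",
            paste(names(x)[!is_num], collapse = ", "))
    x <- x[, is_num, drop = FALSE]
  }
  # Remove rows that contain NA values.
  x <- na.omit(x)
  # Warn about zero-variance columns, but keep them.
  zero_var <- vapply(x, function(col) isTRUE(var(col) == 0), logical(1))
  if (any(zero_var)) {
    warning("Zero-variance columns: ",
            paste(names(x)[zero_var], collapse = ", "))
  }
  list(x = x, preference.order = preference.order)
}
```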

This function can be chained after auto_cor() with a pipe, because x also accepts the variable_selection object returned by auto_cor() (see examples).

Value

List with class variable_selection containing:

  • vif: Data frame with selected variable names and their VIF scores.

  • selected.variables: Character vector of selected variable names.

  • selected.variables.df: Data frame containing selected variables.

See Also

auto_cor()

Other preprocessing: auto_cor(), case_weights(), default_distance_thresholds(), double_center_distance_matrix(), is_binary(), make_spatial_fold(), make_spatial_folds(), the_feature_engineer(), weights_from_distance_matrix()

Examples

library(spatialRF)

data(
  plants_df,
  plants_predictors
)

y <- auto_vif(
  x = plants_df[, plants_predictors]
)

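y$selected.variables
y$vif
head(y$selected.variables.df)

# chain auto_cor() and auto_vif() with the native pipe (R >= 4.1)
y <- plants_df[, plants_predictors] |>
  auto_cor() |>
  auto_vif()

y$selected.variables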


spatialRF documentation built on Dec. 20, 2025, 1:07 a.m.