vif_filter    R Documentation

Filter SpatRaster Layers based on Variance Inflation Factor (VIF)

Description

This function iteratively filters layers from a SpatRaster object by removing the one with the highest Variance Inflation Factor (VIF) that exceeds a specified threshold (th).

Usage

vif_filter(x, th = 5)

Arguments

x

A SpatRaster object containing the layers (variables) to filter. Must contain two or more layers.

th

A numeric value specifying the Variance Inflation Factor (VIF) threshold. Layers whose VIF exceeds this threshold are candidates for removal in each iteration (default: 5).

Details

This function implements a common iterative procedure to reduce multicollinearity among raster layers by removing variables with high Variance Inflation Factor (VIF). The VIF for a specific predictor indicates how much the variance of its estimated coefficient is inflated due to its linear relationships with all other predictors in the model. Conceptually, it is based on the proportion of variance that predictor shares with the other independent variables. A high VIF value suggests a high degree of collinearity with other predictors (values exceeding 5 or 10 are often considered problematic; see O'Brien, 2007). In this context, the function also provides the Pearson correlation matrix between all initial variables.
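As an illustration of the definition above, the VIF of predictor j can be written as 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination obtained by regressing predictor j on all the other predictors. The following is a minimal base R sketch of that formula on a plain data.frame. The helper name vif_by_regression and the toy variables x1, x2 and x3 are hypothetical and are not part of ClimaRep.

```r
# Hypothetical helper (not ClimaRep code): VIF of each column of a data.frame,
# computed from the R^2 of regressing that column on all the other columns.
vif_by_regression <- function(df) {
  vapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    fit <- stats::lm(stats::reformulate(others, response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

set.seed(1)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.1)  # nearly a copy of x1, so strongly collinear
x3 <- rnorm(200)                 # generated independently of x1 and x2
vifs <- vif_by_regression(data.frame(x1 = x1, x2 = x2, x3 = x3))
print(round(vifs, 2))
```

In this toy data x1 and x2 should receive VIFs far above the usual thresholds of 5 or 10, while x3 should stay close to 1.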

Key steps:

  1. Validate inputs: Ensures x is a SpatRaster with at least two layers and th is a valid numeric value.

  2. Convert the input SpatRaster (x) to a data.frame, retaining only unique rows if x has many cells and few unique climate values.

  3. Remove rows containing any NA values across all variables from the data.frame.

  4. In each iteration, calculate the VIF for all variables currently remaining in the dataset.

  5. Identify the variable with the highest VIF among the remaining variables.

  6. If this highest VIF value is greater than the threshold (th), remove the variable with the highest VIF from the dataset, and the loop continues with the remaining variables.

  7. This iterative process repeats until the highest VIF among the remaining variables is less than or equal to th, or until only one variable remains in the dataset.
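The steps above can be sketched in base R on an ordinary data.frame, leaving out the SpatRaster conversion and the input validation. This is an assumption-laden illustration, not the package's implementation: the names vif_values and iterative_vif_filter are hypothetical, the VIFs are taken from the diagonal of the inverse correlation matrix, and the sketch assumes no column is constant or an exact linear combination of the others.

```r
# Hypothetical sketch of the iterative procedure; not ClimaRep code.
vif_values <- function(df) {
  v <- diag(solve(stats::cor(df)))  # VIFs are the diagonal of the inverse correlation matrix
  names(v) <- names(df)
  v
}

iterative_vif_filter <- function(df, th = 5) {
  df <- stats::na.omit(unique(df))      # steps 2-3: unique rows, then drop rows with NA
  removed <- character(0)
  while (ncol(df) > 1) {
    v <- vif_values(df)                 # step 4: VIF of every remaining variable
    worst <- names(v)[which.max(v)]     # step 5: variable with the highest VIF
    if (v[[worst]] <= th) break         # step 7: stop once the highest VIF is <= th
    removed <- c(removed, worst)        # step 6: drop it and iterate again
    df <- df[, names(df) != worst, drop = FALSE]
  }
  list(kept = names(df), removed = removed,
       vif = if (ncol(df) > 1) vif_values(df) else NULL)
}

set.seed(42)
n <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)  # almost identical to x1
x3 <- rnorm(n)
x4 <- x3 + rnorm(n, sd = 2)     # only weakly related to x3
res <- iterative_vif_filter(data.frame(x1, x2, x3, x4), th = 5)
print(res)
```

With this toy data one of x1 or x2 should be removed in the first iteration, after which all remaining VIFs fall below the threshold and the loop stops.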

vif_filter returns a list containing the filtered SpatRaster object and a statistics summary.

The SpatRaster object contains only the variables that were kept. The summary, which is also printed to the console, includes:

  • The original Pearson's correlation matrix between all initial variables.

  • The names of the variables that were kept and of those that were excluded.

  • The final VIF values for the variables retained after the process.

The internal VIF calculation includes checks to handle potential numerical instability, such as columns with zero or near-zero variance and cases of perfect collinearity among variables, which could otherwise lead to errors (e.g., infinite VIFs or issues with matrix inversion). Variables identified as having infinite VIF due to perfect collinearity are prioritized for removal.
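One way such safeguards might look is sketched below in base R. This is a hypothetical illustration of the idea, not the checks that vif_filter actually performs: the function name safe_vif, the tolerance tol, and the choice to report NA for constant columns are all assumptions. Columns with (near-)zero standard deviation are skipped, and a regression R^2 that is numerically equal to 1 is reported as an infinite VIF instead of triggering a division by zero.

```r
# Hypothetical sketch (not ClimaRep code) of a VIF calculation with guards
# against constant columns and perfect collinearity.
safe_vif <- function(df, tol = 1e-8) {
  sds <- vapply(df, stats::sd, numeric(1))
  out <- stats::setNames(rep(NA_real_, ncol(df)), names(df))
  usable <- names(df)[sds > tol]        # constant columns keep NA
  for (v in usable) {
    others <- setdiff(usable, v)
    if (length(others) == 0) {          # nothing to be collinear with
      out[v] <- 1
      next
    }
    fit <- stats::lm(stats::reformulate(others, response = v), data = df)
    y <- df[[v]]
    r2 <- 1 - sum(stats::residuals(fit)^2) / sum((y - mean(y))^2)
    out[v] <- if (r2 >= 1 - tol) Inf else 1 / (1 - r2)
  }
  out
}

set.seed(7)
x1 <- rnorm(100)
d <- data.frame(x1 = x1, x2 = 2 * x1, x3 = rnorm(100), k = 5)
v <- safe_vif(d)
print(v)
```

Because Inf compares greater than any finite value, selecting the maximum VIF with which.max() would pick a perfectly collinear variable first, which matches the prioritization described above.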

References: O'Brien (2007) A Caution Regarding Rules of Thumb for Variance Inflation Factors. Quality & Quantity, 41: 673–690. doi:10.1007/s11135-006-9018-6

Value

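A list with two elements: filtered_raster, a SpatRaster object containing only the layers retained by the VIF filtering process, and summary, the summary statistics described in Details.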

Examples

library(terra)
library(sf)

# Simulate a 7-layer raster whose layers follow row and column gradients,
# so that several layers are correlated with one another.
set.seed(2458)
n_cells <- 100 * 100
r_clim <- terra::rast(ncols = 100, nrows = 100, nlyrs = 7)
values(r_clim) <- c(
   # varA and varB: row gradients with different amounts of noise
   (rowFromCell(r_clim, 1:n_cells) * 0.2 + rnorm(n_cells, 0, 3)),
   (rowFromCell(r_clim, 1:n_cells) * 0.9 + rnorm(n_cells, 0, 0.2)),
   # varC: column gradient
   (colFromCell(r_clim, 1:n_cells) * 0.15 + rnorm(n_cells, 0, 2.5)),
   # varD to varG: combinations of row and column position
   (colFromCell(r_clim, 1:n_cells) +
     (rowFromCell(r_clim, 1:n_cells)) * 0.1 + rnorm(n_cells, 0, 4)),
   (colFromCell(r_clim, 1:n_cells) /
     (rowFromCell(r_clim, 1:n_cells)) * 0.1 + rnorm(n_cells, 0, 4)),
   (colFromCell(r_clim, 1:n_cells) *
     (rowFromCell(r_clim, 1:n_cells) + 0.1 + rnorm(n_cells, 0, 4))),
   (colFromCell(r_clim, 1:n_cells) *
     (colFromCell(r_clim, 1:n_cells) + 0.1 + rnorm(n_cells, 0, 4))))
names(r_clim) <- c("varA", "varB", "varC", "varD", "varE", "varF", "varG")
terra::crs(r_clim) <- "EPSG:4326"
terra::plot(r_clim)

# Filter the layers with a VIF threshold of 5, then inspect the summary
# and plot the retained layers.
vif_result <- ClimaRep::vif_filter(r_clim, th = 5)
print(vif_result$summary)
r_clim_filtered <- vif_result$filtered_raster
terra::plot(r_clim_filtered)

ClimaRep documentation built on June 28, 2025, 1:07 a.m.