vif_filter | R Documentation |
This function iteratively filters layers from a SpatRaster object: at each step it removes the layer with the highest Variance Inflation Factor (VIF), as long as that VIF exceeds a specified threshold (th).
vif_filter(x, th = 5)
x |
A |
th |
A |
This function implements a common iterative procedure to reduce multicollinearity among raster layers by removing variables with high Variance Inflation Factor (VIF).
The VIF for a specific predictor indicates how much the variance of its estimated coefficient is inflated due to its linear relationships with all other predictors in the model.
Conceptually, it is based on the proportion of variance that predictor shares with the other independent variables.
A high VIF value suggests a high degree of collinearity with other predictors (values exceeding 5 or 10 are often considered problematic; see O'Brien, 2007).
In this context, the function also provides the Pearson correlation matrix between all initial variables.
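As an illustration of the definition above, the VIF of predictor j is commonly computed as 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination obtained by regressing predictor j on all other predictors. The following is a minimal base R sketch of that formula. The helper name vif_values and the simulated data are assumptions made for illustration and are not part of ClimaRep, whose internal implementation may differ.

```r
# Hypothetical helper for illustration; not part of ClimaRep.
# Returns the VIF of every column of a numeric data.frame.
vif_values <- function(df) {
  vapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    # Regress predictor v on all remaining predictors
    fit <- stats::lm(stats::reformulate(others, response = v), data = df)
    # VIF = 1 / (1 - R^2) of that auxiliary regression
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

set.seed(1)
n <- 500
a <- rnorm(n)
b <- rnorm(n)
demo <- data.frame(
  a  = a,
  b  = b,
  ab = a + b + rnorm(n, sd = 0.1),  # almost a linear combination of a and b
  z  = rnorm(n)                     # unrelated to the other columns
)
vifs <- vif_values(demo)
print(round(vifs, 2))
```

In this simulated case a, b and ab share most of their variance and receive VIF values far above the usual thresholds, while the independent column z stays close to 1.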
Key steps:
1. Validate inputs: ensure x is a SpatRaster with at least two layers and th is a valid numeric value.
2. Convert the input SpatRaster (x) to a data.frame, retaining only unique rows if x has many cells and few unique climate values.
3. Remove rows containing NA values in any variable from the data.frame.
4. In each iteration, calculate the VIF for all variables currently remaining in the dataset.
5. Identify the variable with the highest VIF among the remaining variables.
6. If this highest VIF value is greater than the threshold (th), remove that variable from the dataset and continue the loop with the remaining variables.
7. Repeat until the highest VIF among the remaining variables is less than or equal to th, or until only one variable remains in the dataset.
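The steps above can be sketched in base R on a plain data.frame. This is only an approximation of the procedure under stated assumptions: the function names vif_values and vif_filter_df are hypothetical, the raster-to-data.frame conversion and input validation are omitted, and the actual ClimaRep code may be organised differently.

```r
# Hypothetical helpers for illustration; the ClimaRep internals may differ.
vif_values <- function(df) {
  vapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    fit <- stats::lm(stats::reformulate(others, response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

vif_filter_df <- function(df, th = 5) {
  df <- stats::na.omit(unique(df))   # keep unique rows, then drop rows with NA
  excluded <- character(0)
  while (ncol(df) > 1) {             # stop when a single variable remains
    vifs <- vif_values(df)
    worst <- names(vifs)[which.max(vifs)]
    if (vifs[[worst]] <= th) break   # highest VIF is within the threshold
    excluded <- c(excluded, worst)
    df <- df[, setdiff(names(df), worst), drop = FALSE]
  }
  list(kept = names(df), excluded = excluded)
}

set.seed(42)
n <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
demo <- data.frame(
  x1 = x1,
  x2 = x2,
  x3 = x1 + x2 + rnorm(n, sd = 0.05),  # nearly redundant with x1 and x2
  x4 = rnorm(n)
)
res <- vif_filter_df(demo, th = 5)
print(res)
```

Removing one variable per iteration, rather than all variables above the threshold at once, matters because the VIF values of the remaining variables change after every removal.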
vif_filter returns a list containing a filtered SpatRaster object and a statistics summary.
The SpatRaster object contains only the variables that were kept. A comprehensive summary is also printed to the console.
The summary includes:
The original Pearson correlation matrix between all initial variables.
The names of the variables that were kept and of those that were excluded.
The final VIF values for the variables retained after the process.
The internal VIF calculation includes checks to handle potential numerical instability, such as columns with zero or near-zero variance and cases of perfect collinearity among variables, which could otherwise lead to errors (e.g., infinite VIFs or issues with matrix inversion). Variables identified as having infinite VIF due to perfect collinearity are prioritized for removal.
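One way such guards could look is sketched below in base R. This is not the package's code: the helper name safe_vif, the tolerance tol, and the choice to report near-constant columns as infinite VIF are all assumptions made for illustration.

```r
# Hypothetical guarded VIF helper; name, tolerance and behaviour are assumptions.
safe_vif <- function(df, tol = 1e-8) {
  sds <- vapply(df, stats::sd, numeric(1))
  vapply(names(df), function(v) {
    # Assumption: flag (near-)constant columns with Inf so they are removed first
    if (sds[[v]] < tol) return(Inf)
    others <- setdiff(names(df), v)
    fit <- stats::lm(stats::reformulate(others, response = v), data = df)
    # summary() warns on essentially perfect fits; silence that here
    r2 <- suppressWarnings(summary(fit)$r.squared)
    # Perfect collinearity: R^2 is (numerically) 1, so report Inf
    # instead of dividing by a value at or near zero
    if (is.na(r2) || r2 >= 1 - tol) Inf else 1 / (1 - r2)
  }, numeric(1))
}

set.seed(7)
n <- 200
p <- rnorm(n)
demo <- data.frame(
  p = p,
  q = 2 * p + 1,     # exact linear function of p
  r = rnorm(n),      # independent column
  k = rep(3, n)      # constant column
)
v <- safe_vif(demo)
print(v)
```

Because which.max() treats Inf as the largest value, variables flagged this way would be the first candidates for removal in an iterative filter.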
References: O'Brien, R. M. (2007). A Caution Regarding Rules of Thumb for Variance Inflation Factors. Quality & Quantity, 41: 673–690. doi:10.1007/s11135-006-9018-6
A list with two elements: filtered_raster, a SpatRaster object containing only the layers retained by the VIF filtering process, and summary, the statistics summary described above.
library(terra)
library(sf)
# Reproducible synthetic example: a 100 x 100 raster with seven layers
set.seed(2458)
n_cells <- 100 * 100
r_clim <- terra::rast(ncols = 100, nrows = 100, nlyrs = 7)
# Each expression below supplies n_cells values built from row/column indices plus Gaussian noise
values(r_clim) <- c(
  (rowFromCell(r_clim, 1:n_cells) * 0.2 + rnorm(n_cells, 0, 3)),
  (rowFromCell(r_clim, 1:n_cells) * 0.9 + rnorm(n_cells, 0, 0.2)),
  (colFromCell(r_clim, 1:n_cells) * 0.15 + rnorm(n_cells, 0, 2.5)),
  (colFromCell(r_clim, 1:n_cells) +
    (rowFromCell(r_clim, 1:n_cells)) * 0.1 + rnorm(n_cells, 0, 4)),
  (colFromCell(r_clim, 1:n_cells) /
    (rowFromCell(r_clim, 1:n_cells)) * 0.1 + rnorm(n_cells, 0, 4)),
  (colFromCell(r_clim, 1:n_cells) *
    (rowFromCell(r_clim, 1:n_cells) + 0.1 + rnorm(n_cells, 0, 4))),
  (colFromCell(r_clim, 1:n_cells) *
    (colFromCell(r_clim, 1:n_cells) + 0.1 + rnorm(n_cells, 0, 4))))
names(r_clim) <- c("varA", "varB", "varC", "varD", "varE", "varF", "varG")
terra::crs(r_clim) <- "EPSG:4326"
terra::plot(r_clim)
# Filter the layers using a VIF threshold of 5
vif_result <- ClimaRep::vif_filter(r_clim, th = 5)
# Inspect the summary, then extract and plot the retained layers
print(vif_result$summary)
r_clim_filtered <- vif_result$filtered_raster
terra::plot(r_clim_filtered)