vif_filter | R Documentation |
This function iteratively filters layers from a SpatRaster
object by removing the one with the highest Variance Inflation Factor (VIF) that exceeds a specified threshold (th
).
vif_filter(x, th = 5)
x |
A |
th |
A |
This function implements a common iterative procedure to reduce multicollinearity among raster layers by removing variables with a high Variance Inflation Factor (VIF).
The VIF for a specific predictor indicates how much the variance of its estimated coefficient is inflated due to its linear relationships with all other predictors in the model. A high VIF value suggests a high degree of collinearity with other predictors (values exceeding 5
or 10
are often considered problematic; see O'Brien, 2007; Legendre & Legendre, 2012).
The filtering process is fully automated and robust:
Validates the input and converts the SpatRaster
to a data.frame
for calculations.
In each step, the function attempts to calculate VIF efficiently using matrix inversion. If perfect collinearity is detected (resulting in a singular matrix that cannot be inverted), the function automatically switches to a more robust method based on linear regressions to handle the situation without an error.
The function identifies the variable with the highest VIF among the remaining variables. If its VIF is greater than the threshold (th
), that variable is removed. The process repeats until all remaining variables are below the threshold or until only one variable remains.
The output is a list
containing two main components:
SpatRaster
object with the variables that were retained after the filtering process.
A list with a detailed summary of the process, including the names of the kept and excluded variables, the original Pearson's correlation matrix, and the final VIF values for the retained variables.
The internal VIF calculation includes checks to handle potential numerical instability, such as columns with zero or near-zero variance and cases of perfect collinearity among variables, which could otherwise lead to errors (e.g., infinite VIFs). Variables identified as having infinite VIF due to perfect collinearity are prioritized for removal.
References: O’Brien (2007) A caution regarding rules of thumb for variance inflation factors. Quality & Quantity, 41(5), 673–690. https://doi.org/10.1007/s11135-006-9018-6 Legendre & Legendre (2012) Interpretation of ecological structures. In P. Legendre & L. Legendre (Eds.), Developments in Environmental Modelling (Vol. 24, pp. 521-624). Elsevier. https://doi.org/10.1016/B978-0-444-53868-0.50010-1
A list
object containing the filtered SpatRaster
and a summary of the filtering process.
library(terra)
set.seed(2458)
n_cells <- 100 * 100
r_clim <- terra::rast(ncols = 100, nrows = 100, nlyrs = 7)
values(r_clim) <- c(
(rowFromCell(r_clim, 1:n_cells) * 0.2 + rnorm(n_cells, 0, 3)),
(rowFromCell(r_clim, 1:n_cells) * 0.9 + rnorm(n_cells, 0, 0.2)),
(colFromCell(r_clim, 1:n_cells) * 0.15 + rnorm(n_cells, 0, 2.5)),
(colFromCell(r_clim, 1:n_cells) +
(rowFromCell(r_clim, 1:n_cells)) * 0.1 + rnorm(n_cells, 0, 4)),
(colFromCell(r_clim, 1:n_cells) /
(rowFromCell(r_clim, 1:n_cells)) * 0.1 + rnorm(n_cells, 0, 4)),
(colFromCell(r_clim, 1:n_cells) *
(rowFromCell(r_clim, 1:n_cells) + 0.1 + rnorm(n_cells, 0, 4))),
(colFromCell(r_clim, 1:n_cells) *
(colFromCell(r_clim, 1:n_cells) + 0.1 + rnorm(n_cells, 0, 4))))
names(r_clim) <- c("varA", "varB", "varC", "varD", "varE", "varF", "varG")
terra::crs(r_clim) <- "EPSG:4326"
terra::plot(r_clim)
vif_result <- ClimaRep::vif_filter(r_clim, th = 5)
print(vif_result$summary)
r_clim_filtered <- vif_result$filtered_raster
terra::plot(r_clim_filtered)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.