removeCollinearity: Remove collinearity among variables of a raster stack

View source: R/removeCollinearity.R

removeCollinearity    R Documentation

Remove collinearity among variables of a raster stack

Description

This function analyses the correlation among the variables of the provided stack of environmental variables (using Pearson's r by default), and returns either a vector containing the names of variables that are not collinear, or a list grouping variables according to their degree of collinearity.

Usage

removeCollinearity(
  raster.stack,
  multicollinearity.cutoff = 0.7,
  select.variables = FALSE,
  sample.points = FALSE,
  nb.points = 10000,
  plot = FALSE,
  method = "pearson"
)

Arguments

raster.stack

a SpatRaster object, in which each layer represents an environmental variable.

multicollinearity.cutoff

a numeric value corresponding to the cutoff of correlation above which to group variables.

select.variables

TRUE or FALSE. If TRUE, the function will choose one variable from each group and return a vector of non-correlated variables (see Details). If FALSE, the function will return a list containing the groups of correlated variables.

sample.points

TRUE or FALSE. If you have a large raster file then use this parameter to sample a number of points equal to nb.points.

nb.points

a numeric value. Only used if sample.points = TRUE. The number of points sampled from the raster to compute the correlations. Too small a value may not be representative of the environmental conditions in your raster.

plot

TRUE or FALSE. If TRUE, the hierarchical ascendant classification used to group variables will be plotted.

method

"pearson", "spearman" or "kendall". The correlation method to be used. If your variables are skewed or have outliers (e.g. when working with precipitation variables) you should favour the Spearman or Kendall methods.
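
As an illustration of why the choice of method matters for skewed variables, here is a minimal base R sketch. The vectors x and y are made-up data, not part of the package; y is a monotonic but strongly skewed transformation of x.

```r
# Hypothetical data: y increases strictly with x but is heavily right-skewed
set.seed(1)
x <- rnorm(500)
y <- exp(3 * x)

# Pearson measures linear association and is pulled down by the skew
r_pearson <- cor(x, y, method = "pearson")

# Spearman works on ranks, so a strictly monotonic relationship is preserved
r_spearman <- cor(x, y, method = "spearman")

c(pearson = r_pearson, spearman = r_spearman)
```

With a rank-based method the two variables are detected as perfectly associated, whereas Pearson's coefficient falls below that value, so the same cutoff can lead to different groupings.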

Details

This function uses the correlation coefficient selected with method (Pearson's by default) to analyse correlation among variables. This coefficient is then used to compute a distance matrix, which in turn is used to compute an ascendant hierarchical classification, with the 'complete' method (see hclust). If at least one correlation above the multicollinearity.cutoff is detected, then the variables will be grouped according to their degree of correlation.
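
The grouping procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's own code: it assumes that the distance is computed as 1 - |r| and that the tree is cut at a height of 1 - cutoff. The data frame env_df and its columns are made up for the example.

```r
# Hypothetical data: Var1 and Var2 are strongly correlated, Var3 is independent
set.seed(1)
x <- rnorm(200)
env_df <- data.frame(Var1 = x,
                     Var2 = x + rnorm(200, sd = 0.1),
                     Var3 = rnorm(200))

cutoff <- 0.7

# Correlation matrix among variables
cor_mat <- cor(env_df, method = "pearson")

# Assumption: correlations are converted to distances as 1 - |r|
dist_mat <- as.dist(1 - abs(cor_mat))

# Ascendant hierarchical classification with complete linkage
tree <- hclust(dist_mat, method = "complete")

# Assumption: variables joined below height 1 - cutoff form one group
groups <- cutree(tree, h = 1 - cutoff)

# List of groups of intercorrelated variables
split(names(groups), groups)
```

With complete linkage, two variables end up in the same group only if every pairwise distance within the group is below the cut height, which is why it suits a "all correlations above the cutoff" grouping.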

If select.variables = TRUE, then the function will return a vector containing variables that are not collinear. The variables not correlated to any other variables are automatically included in this vector. For each group of collinear variables, one variable will be randomly chosen and included in this vector.
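
The random selection step can be pictured as follows. This is a minimal sketch, not the package's implementation; the list groups is a hypothetical grouping of variable names used only for illustration.

```r
# Hypothetical grouping: two groups of collinear variables and one lone variable
groups <- list(c("Var1", "Var2", "Var5"), "Var3", c("Var4", "Var6"))

# The choice is random, so fix the seed if the selection must be reproducible
set.seed(42)

# Pick one variable per group; a group of one is always kept as is
selected <- vapply(groups,
                   function(g) g[sample.int(length(g), 1)],
                   character(1))
selected
```

Because the choice is random, repeated calls can return different variables from the same group unless a seed is set beforehand.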

Value

a vector of non-correlated variables, or a list where each element is a group of correlated variables.

Author(s)

Boris Leroy leroy.boris@gmail.com

with help from C. N. Meynard, C. Bellard & F. Courchamp

Examples

library(virtualspecies)
library(terra)  # provides rast()

# Create an example stack with six environmental variables
a <- matrix(rep(dnorm(1:100, 50, sd = 25)), 
            nrow = 100, ncol = 100, byrow = TRUE)
env <- c(rast(a * dnorm(1:100, 50, sd = 25)),
         rast(a * 1:100),
         rast(a * logisticFun(1:100, alpha = 10, beta = 70)),
         rast(t(a)),
         rast(exp(a)),
         rast(log(a)))
names(env) <- paste("Var", 1:6, sep = "")

# Default settings: cutoff at 0.7
removeCollinearity(env, plot = TRUE)

# Changing cutoff to 0.5
removeCollinearity(env, plot = TRUE, multicollinearity.cutoff = 0.5)

# Automatic selection of variables that are not intercorrelated
# (one variable is chosen at random within each group)
removeCollinearity(env, plot = TRUE, select.variables = TRUE)

# Assuming a very large raster file: selecting a subset of points
removeCollinearity(env, plot = TRUE, select.variables = TRUE,
                   sample.points = TRUE, nb.points = 5000)



virtualspecies documentation built on Sept. 27, 2023, 1:06 a.m.