envCorrAnalysis_SWD: Exploratory Correlation Analysis of Environmental Data for...

View source: R/corrAnalysis.R

envCorrAnalysis_SWD    R Documentation

Exploratory Correlation Analysis of Environmental Data for SWD-formatted data

Description

Explore the correlations between environmental predictor values at occurrence locations to help manage multicollinearity when the data are available in an SWD-formatted file.

Usage

envCorrAnalysis_SWD(
  taxon = "",
  titleText = NULL,
  swdData = NULL,
  threshold = 0.7,
  outFile = NULL,
  outPath = NULL
)

Arguments

taxon

Character. The name of the taxon whose occurrence records are being analysed.

titleText

Character. A title to be used in graphical output.

swdData

Data.frame. Environmental data at occurrence locations in an SWD-formatted data frame.

threshold

Numeric. A correlation value (i.e. between 0 and 1) used to determine which variables in swdData will be recommended for removal. Correlations whose absolute value is greater than or equal to threshold will be listed.

outFile

Character. A non-NULL value is used as the file name for saving the graphical output as a PNG file. By default (NULL), the output is plotted to the default graphics device.

outPath

Character. Path used by ggsave in combination with outFile to save the plot.

Details

Multicollinearity (high correlation between predictor variables, or covariates) is a major issue for correlative models such as ecological niche models (ENMs). Since the early days of modern statistical analysis, multicollinearity has been a major concern when fitting linear models such as ANOVAs and linear regressions, because perfect or near-perfect correlations make the fitting computations numerically unstable or cause them to fail outright. Machine learning methods such as MaxEnt are unlikely to fail in the same (numerically spectacular) way, but they are nevertheless prone to adverse effects caused by high levels of correlation between covariates.

Impacts include increased model complexity, model instability, and incorrect inferences about variable (feature) importance.
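One common way to put a number on the instability described above is the variance inflation factor (VIF), a standard linear-model diagnostic: regress each predictor on all the others and compute 1 / (1 - R^2). The sketch below is illustrative only. VIF is a swapped-in technique that envCorrAnalysis_SWD is not documented as reporting, and the function names vifAll and vifTwo and the toy variables bio1, bio5 and bio12 are hypothetical.

```r
# Hypothetical helper (not part of fitMaxnet): variance inflation factor for
# each column of a numeric data frame, computed as 1 / (1 - R^2) from
# regressing that column on all the remaining columns.
vifAll <- function(env) {
  env <- as.data.frame(env)
  vapply(names(env), function(v) {
    others <- setdiff(names(env), v)
    fit <- lm(reformulate(others, response = v), data = env)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

# With exactly two predictors the VIF reduces to 1 / (1 - r^2),
# where r is their correlation coefficient
vifTwo <- function(r) 1 / (1 - r^2)
round(vifTwo(c(0, 0.7, 0.9, 0.99)), 2)
# returns 1.00 1.96 5.26 50.25

# Toy predictors (hypothetical): bio5 is nearly a copy of bio1
set.seed(1)
n <- 100
bio1 <- rnorm(n)
bio5 <- bio1 + rnorm(n, sd = 0.1)  # strongly correlated with bio1
bio12 <- rnorm(n)                  # roughly independent of both
vifs <- vifAll(data.frame(bio1, bio5, bio12))
vifs  # large values for bio1 and bio5, close to 1 for bio12
```

The two-predictor case shows why a cut-off such as the default threshold = 0.7 is a moderate one: at r = 0.7 the coefficient variance roughly doubles, whereas at r = 0.99 it is inflated about fiftyfold.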

Value

A character matrix listing the names of variables involved in correlations whose absolute value is greater than threshold (and which may therefore be candidates for removal), together with the number of threshold-exceeding correlations in which each listed variable appears.
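The kind of result described above can be sketched in base R as follows. This is a hypothetical reimplementation for illustration, not the package's source code. It assumes that the first three columns of the SWD data frame hold the species name, longitude and latitude (the usual MaxEnt SWD layout) and that every remaining column is a numeric predictor. Following the description of the threshold argument, it flags pairs whose absolute correlation is greater than or equal to threshold. The name flagCorrelatedVars and the toy data are invented for the example.

```r
# Hypothetical helper (not part of fitMaxnet): list the variables involved in
# pairwise correlations with |r| >= threshold, and count how many such
# correlations each variable takes part in.
flagCorrelatedVars <- function(swdData, threshold = 0.7) {
  # Assumption: columns 1-3 are species, longitude, latitude (SWD layout)
  env <- swdData[, -(1:3), drop = FALSE]
  corMat <- cor(env, use = "pairwise.complete.obs")
  # Upper triangle only, so each pair is counted once and the diagonal is ignored
  hits <- which(abs(corMat) >= threshold & upper.tri(corMat), arr.ind = TRUE)
  vars <- c(rownames(corMat)[hits[, "row"]], colnames(corMat)[hits[, "col"]])
  if (length(vars) == 0) {
    return(matrix(character(0), ncol = 2,
                  dimnames = list(NULL, c("variable", "count"))))
  }
  counts <- sort(table(vars), decreasing = TRUE)
  # Character matrix: variable name and its number of threshold-exceeding correlations
  cbind(variable = names(counts), count = as.character(counts))
}

# Toy SWD-style data frame (hypothetical values)
set.seed(1)
n <- 100
bio1 <- rnorm(n)
bio5 <- bio1 + rnorm(n, sd = 0.1)  # strongly correlated with bio1
bio12 <- rnorm(n)                  # roughly independent
swd <- data.frame(species = "Taxon_x",
                  longitude = runif(n, 140, 150),
                  latitude = runif(n, -35, -25),
                  bio1, bio5, bio12)

res <- flagCorrelatedVars(swd, threshold = 0.7)
res  # bio1 and bio5, each with a count of 1

# With the correlated variable removed, nothing is flagged
none <- flagCorrelatedVars(swd[, c("species", "longitude", "latitude", "bio1", "bio12")])
```

Counting how often each variable appears gives a simple heuristic for which variable to drop first: removing the one involved in the most threshold-exceeding correlations typically resolves the most pairs at once.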


peterbat1/fitMaxnet documentation built on Sept. 17, 2024, 10:50 p.m.