envCorrAnalysis | R Documentation |
Explore correlation relationships between environmental predictor values at occurrence locations to aid in managing multicollinearity
envCorrAnalysis(
taxon = "",
titleText = NULL,
envDataPath,
occData,
xVar = NULL,
yVar = NULL,
threshold = 0.7,
outFile = NULL,
outPath = NULL
)
taxon |
Character. The name of the taxon whose occurrence records are being analysed. |
titleText |
Character. A title to be used in graphical output. |
envDataPath |
Character. Path to the environmental data layers to be used in the analysis. |
occData |
Data.frame or matrix. At least two columns must be present to provide longitude/X and latitude/Y coordinates of occurrence locations. |
xVar |
Character. Name of a variable in occData which is interpreted as the x-coordinate. If NULL (default), a search is made for the nearest match to 'longitude' or 'X'. |
yVar |
Character. Name of a variable in occData which is interpreted as the y-coordinate. If NULL (default), a search is made for the nearest match to 'latitude' or 'Y'. |
threshold |
Numeric. A correlation value (i.e. between 0 and 1) used to determine which variables in the environmental data layers will be recommended for removal. Correlations greater than or equal to threshold will be listed. |
outFile |
Character. A non-NULL value is used as a file name to save the graphical output as a PNG file. By default, the output is plotted to the default graphics device. |
outPath |
Character. Path used by ggsave in combination with outFile to save the plot. |
Multicollinearity (high correlation between predictor variables or covariates) is a major issue for correlative models such as ecological niche models (ENMs). From the earliest days of modern statistical analysis, multicollinearity has been a major concern when fitting linear models such as ANOVAs and linear regressions, because very high correlations cause complete numerical failure of the model-fitting process. Machine learning methods such as MaxEnt are not likely to fail in the same (numerically spectacular) way, but are nevertheless prone to some adverse impacts caused by high levels of correlation between covariates.
Although the question is still subject to research for machine learning methods such as MaxEnt, the impacts of multicollinearity could include high model complexity and model instability (e.g. unstable indications of variable importance/contribution), possibly leading to incorrect inferences about variable/feature importance.
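The effect on linear models described above can be seen in a small base R sketch. It is an illustration on simulated data only and is not part of envCorrAnalysis; the variable names (x1, x2_exact, x2_near, x2_indep) are made up for the example. With an exactly collinear predictor, lm() cannot estimate a separate coefficient and reports NA; with a nearly collinear predictor, the fit succeeds but the standard error of the coefficient is inflated.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
y  <- 1 + x1 + rnorm(n)

# Case 1: x2_exact is an exact linear function of x1 (perfect collinearity).
# lm() detects the rank deficiency and returns NA for the aliased coefficient.
x2_exact  <- 2 * x1
fit_exact <- lm(y ~ x1 + x2_exact)
print(coef(fit_exact))

# Case 2: compare a nearly collinear second predictor with an independent one.
x2_near  <- x1 + rnorm(n, sd = 0.01)  # almost a copy of x1
x2_indep <- rnorm(n)                  # unrelated to x1

se_near  <- summary(lm(y ~ x1 + x2_near))$coefficients["x1", "Std. Error"]
se_indep <- summary(lm(y ~ x1 + x2_indep))$coefficients["x1", "Std. Error"]

# The standard error of the x1 coefficient is much larger in the
# nearly collinear fit, i.e. the estimate is far less stable.
print(c(near = se_near, independent = se_indep))
```
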
A character matrix listing the names of variables with absolute value of correlations greater than threshold which may be candidates for removal, and the number of threshold-exceeding correlations in which a listed variable has been found.
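The kind of summary described here can be approximated in base R. The following is a minimal sketch under stated assumptions, not the function's actual implementation: the data frame env and its columns (temp, temp2, precip, elev) are simulated stand-ins for environmental values extracted at occurrence locations, Pearson correlation is assumed, and the comparison uses "greater than or equal to" as in the description of the threshold argument.

```r
set.seed(42)
n    <- 100
temp <- rnorm(n)

# Hypothetical predictor values at n occurrence locations.
env <- data.frame(
  temp   = temp,
  temp2  = temp + rnorm(n, sd = 0.1),   # strongly positively correlated with temp
  precip = rnorm(n),                    # generated independently of temp
  elev   = -temp + rnorm(n, sd = 0.1)   # strongly negatively correlated with temp
)

threshold <- 0.7

# Absolute pairwise correlations; the diagonal (self-correlation) is ignored.
corMat <- abs(cor(env))
diag(corMat) <- 0

# Keep each pair once by looking only at the upper triangle.
hits <- which(corMat >= threshold & upper.tri(corMat), arr.ind = TRUE)

pairs <- data.frame(
  var1   = rownames(corMat)[hits[, "row"]],
  var2   = colnames(corMat)[hits[, "col"]],
  absCor = corMat[hits]
)

# Number of threshold-exceeding correlations each variable takes part in.
counts <- sort(table(c(pairs$var1, pairs$var2)), decreasing = TRUE)

print(pairs)
print(counts)
```

Variables that appear in many threshold-exceeding pairs are the natural first candidates for removal, since dropping one of them resolves several high correlations at once.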
## Not run: #