envCorrAnalysis: Exploratory Correlation Analysis of Environmental Data

View source: R/corrAnalysis.R

envCorrAnalysis    R Documentation

Exploratory Correlation Analysis of Environmental Data

Description

Explore correlations between environmental predictor values at occurrence locations to aid in managing multicollinearity.

Usage

envCorrAnalysis(
  taxon = "",
  titleText = NULL,
  envDataPath,
  occData,
  xVar = NULL,
  yVar = NULL,
  threshold = 0.7,
  outFile = NULL,
  outPath = NULL
)

Arguments

taxon

Character. The name of the taxon whose occurrence records are being analysed.

titleText

Character. A title to be used in graphical output.

envDataPath

Character. Path to the environmental data layers to be used in the analysis.

occData

Data.frame or matrix. At least two columns must be present to provide longitude/X and latitude/Y coordinates of occurrence locations.

xVar

Character. Name of a variable in occData which is interpreted as the x-coordinate. If NULL (default), a search is made for the nearest match to 'longitude' or 'X'.

yVar

Character. Name of a variable in occData which is interpreted as the y-coordinate. If NULL (default), a search is made for the nearest match to 'latitude' or 'Y'.

threshold

Numeric. A correlation value (i.e. between 0 and 1) used to determine which variables in envData will be recommended for removal. Correlations greater than or equal to threshold will be listed.

outFile

Character. A non-NULL value is used as a file name to save the graphical output as a PNG file. By default, the output is plotted to the default graphics device.

outPath

Character. Path used by ggsave in combination with outFile to save the plot.

Details

Multicollinearity (high correlation between predictor variables or covariates) is a major issue for correlative models such as ecological niche models (ENMs). From the earliest days of modern statistical analysis, multicollinearity has been a major concern when fitting linear models such as ANOVAs and linear regressions, since very high correlations can cause complete numerical failure of the model-fitting process. Machine learning methods such as MaxEnt are unlikely to fail in the same (numerically spectacular) way, but are nevertheless prone to some adverse impacts of high correlation between covariates.
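As a minimal illustration of the numerical problem for linear models (not part of envCorrAnalysis itself), the following base R sketch uses simulated data in which one predictor is an exact multiple of another. All variable names are hypothetical. R's lm() detects the rank deficiency and reports the aliased coefficient as NA rather than stopping with an error.

```r
# Simulated predictors; names are hypothetical and for illustration only
set.seed(1)
n <- 50
temp_mean <- rnorm(n, mean = 15, sd = 3)
temp_max  <- 2 * temp_mean             # exactly collinear with temp_mean
rain      <- rnorm(n, mean = 800, sd = 100)
y <- 1 + 0.5 * temp_mean + 0.01 * rain + rnorm(n)

# Correlation between the two temperature variables
r <- cor(temp_mean, temp_max)

# lm() cannot estimate separate effects for perfectly collinear predictors:
# the coefficient of the aliased variable (temp_max) is returned as NA
fit <- lm(y ~ temp_mean + temp_max + rain)
coefs <- coef(fit)
print(coefs)
```

With correlations that are very high but not exact, the fit usually completes, but the affected coefficient estimates tend to have inflated standard errors.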

Although the effects on machine learning methods such as MaxEnt are still the subject of research, the impacts of multicollinearity could include high model complexity and model instability (e.g. unstable indications of variable importance/contribution), possibly leading to incorrect inferences about variable/feature importance.

Value

A character matrix listing the names of variables whose absolute correlations exceed threshold, and which may therefore be candidates for removal, together with the number of threshold-exceeding correlations in which each listed variable was found.
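The kind of summary described here can be sketched in base R as follows. This is an illustration under stated assumptions, not the function's actual implementation: the data are simulated, the variable names are hypothetical, and the two-column layout (variable name plus count) is only one plausible way to present such a result.

```r
# Simulated environmental values at occurrence locations (hypothetical names)
set.seed(42)
n <- 200
bio1  <- rnorm(n)
bio5  <- bio1 + rnorm(n, sd = 0.1)   # strongly correlated with bio1
bio12 <- rnorm(n)                    # generated independently of the others
envVals <- data.frame(bio1, bio5, bio12)

threshold <- 0.7

# Absolute pairwise correlations; blank the diagonal so that
# each variable's correlation with itself is ignored
corMat <- abs(cor(envVals))
diag(corMat) <- NA

# For each variable, count the correlations that meet or exceed the threshold
# (">=" follows the description of the threshold argument)
exceedCount <- rowSums(corMat >= threshold, na.rm = TRUE)

# Keep only variables involved in at least one such correlation
flagged <- exceedCount[exceedCount > 0]
result <- cbind(variable = names(flagged), count = as.character(flagged))
print(result)
```

A variable that appears in several threshold-exceeding pairs is often a natural first candidate for removal, since dropping it resolves several high correlations at once.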

Examples

## Not run: #

peterbat1/fitMaxnet documentation built on Sept. 17, 2024, 10:50 p.m.