correlationDendrogram: Correlation dendrogram to help reduce multicollinearity in a...

View source: R/correlationDendrogram.R

Description

Computes the correlation between all pairs of variables in a training dataset. If a biserialCorrelation output is provided, it further selects variables automatically based on the R-squared value obtained by each variable in the biserial correlation analysis.
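The general idea can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy data, the single-linkage clustering, the `1 - abs(cor)` distance, and the `scores` vector (standing in for the biserial R-squared values) are all hypothetical choices made for this example.

```r
# Illustrative sketch only; NOT the code used by correlationDendrogram().
set.seed(1)
n <- 200
a <- rnorm(n)
b <- rnorm(n)

# Toy predictors: two pairs of strongly correlated variables (hypothetical names)
toy <- data.frame(
  a1 = a,
  a2 = a + rnorm(n, sd = 0.1),
  b1 = b,
  b2 = b + rnorm(n, sd = 0.1)
)

# Pearson correlation between all pairs of variables
cor.matrix <- cor(toy, method = "pearson")

# Convert correlation to a distance: highly correlated variables are close
cor.dist <- as.dist(1 - abs(cor.matrix))

# Hierarchical clustering; plot(tree) would draw the dendrogram
# (single linkage is an assumption made for this sketch)
tree <- hclust(cor.dist, method = "single")

# Cut the tree so that variables in different groups are correlated
# below the threshold
correlation.threshold <- 0.5
groups <- cutree(tree, h = 1 - correlation.threshold)

# Hypothetical scores standing in for the biserial R-squared of each variable
scores <- c(a1 = 0.40, a2 = 0.35, b1 = 0.10, b2 = 0.20)

# Keep the best-scoring variable of each group
selected <- unname(sapply(
  split(names(groups), groups),
  function(vars) vars[which.max(scores[vars])]
))
```

On this toy data the tree splits into two groups, and `selected` keeps one variable from each (the one with the higher score).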

Usage

correlationDendrogram(
  x,
  variables = NULL,
  exclude.variables = c("x", "y", "presence"),
  correlation.threshold = 0.5,
  automatic.selection = TRUE,
  biserialCorrelation.output = NULL,
  plot = TRUE,
  label.size = 6
  )

Arguments

x

A data frame with a presence column with 1 indicating presence and 0 indicating background, and columns with predictor values.

variables

Character vector, names of the columns representing predictors. If NULL, all numeric columns except the presence column are considered.

exclude.variables

Character vector, variables to exclude from the analysis. Defaults to c("x", "y", "presence").

correlation.threshold

Numeric in the interval [0, 1], maximum Pearson correlation allowed between the selected variables.

automatic.selection

Boolean. If TRUE, the function provides a vector of selected variables along with the dendrogram plot. Otherwise, only the dendrogram plot is returned.

biserialCorrelation.output

List, output of the function biserialCorrelation. Its R-squared scores are used to select variables.

plot

Boolean. If TRUE, prints the dendrogram plot.

label.size

Numeric, size of the dendrogram labels.
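As a rough illustration of how x, variables and exclude.variables relate, the sketch below builds a small data frame in the expected shape and derives the candidate predictors by hand. The column names bio1, bio12 and landcover are made up for this example, and the filtering logic is an assumption based on the argument descriptions above, not the function's actual code.

```r
# Hypothetical input in the shape described for `x`:
# coordinates, a 0/1 presence column, and predictor columns.
x <- data.frame(
  x = c(-3.1, -2.9, -3.4, -2.7),
  y = c(40.2, 40.5, 39.9, 40.1),
  presence = c(1, 0, 1, 0),
  bio1 = c(12.3, 14.1, 11.8, 13.0),
  bio12 = c(450, 520, 610, 480),
  landcover = factor(c("forest", "crop", "forest", "shrub"))
)

exclude.variables <- c("x", "y", "presence")

# Assumed behaviour when variables = NULL: take the numeric columns
# and drop the excluded ones
numeric.columns <- names(x)[vapply(x, is.numeric, logical(1))]
variables <- setdiff(numeric.columns, exclude.variables)
```

Here `variables` ends up holding only the numeric predictors; the factor column and the excluded columns are left out.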

Value

If automatic.selection = TRUE, a list with two slots: "dendrogram", a ggplot2 object with the dendrogram plot, and "selected.variables", a character vector with the names of the selected variables. Otherwise, only the dendrogram is returned.

Author(s)

Blas Benito <blasbenito@gmail.com>.

Examples

## Not run: 
data("virtualSpeciesPB")

bis.cor <- biserialCorrelation(
 x = virtualSpeciesPB,
 exclude.variables = c("x", "y")
)

selected.vars <- correlationDendrogram(
 x = virtualSpeciesPB,
 variables = NULL,
 exclude.variables = c("x", "y", "presence"),
 correlation.threshold = 0.5,
 automatic.selection = TRUE,
 biserialCorrelation.output = bis.cor
)$selected.variables

## End(Not run)

BlasBenito/SDMworkshop documentation built on March 4, 2020, 4:16 a.m.