colindiag: Collinearity Diagnostics


colindiag R Documentation


Description

[Stable]

Perform a (multi)collinearity diagnostic of a correlation matrix of predictor variables using several indicators, as shown by Olivoto et al. (2017).

Usage

colindiag(.data, ..., by = NULL, n = NULL)

Arguments

.data

The data to be analyzed. It must be a symmetric correlation matrix or a data frame, possibly with grouped data passed from dplyr::group_by().

...

Variables to use in the correlation. If ... is NULL, all the numeric variables from .data are used. It must be a single variable name or a comma-separated list of unquoted variable names.

by

One variable (factor) to compute the function by. It is a shortcut to dplyr::group_by(). To compute the statistics by more than one grouping variable, use dplyr::group_by() directly.

n

If a correlation matrix is provided, n is the number of observations (objects) used to compute the correlation coefficients.
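A correlation matrix alone does not carry the sample size, which is why n has to be supplied for the hypothesis tests. The base R sketch below illustrates this with the standard two-sided t test for a Pearson correlation. Whether colindiag() uses exactly this test is an assumption, and cor_pvalue() is a hypothetical helper, not part of metan.

```r
# Correlation matrix and the number of observations behind it
R <- cor(iris[, 1:4])
n <- nrow(iris)

# Hypothetical helper (not a metan function): two-sided t test of
# H0: rho = 0 for a Pearson correlation r computed from n observations
cor_pvalue <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
}

# p-value recovered from the matrix entry plus n
r <- R["Sepal.Length", "Sepal.Width"]
p_from_matrix <- cor_pvalue(r, n)

# Cross-check against cor.test() applied to the raw data
p_from_data <- cor.test(iris$Sepal.Length, iris$Sepal.Width)$p.value
```

The two p-values agree, which shows that the matrix entry and n together are enough to reproduce the test without the raw data.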

Value

If .data is grouped data passed from dplyr::group_by(), the results are returned in a list-column of data frames. The results contain the following elements:

  • cormat A symmetric matrix of Pearson's correlation coefficients between the variables.

  • corlist Hypothesis tests for each of the correlation coefficients.

  • evalevet The eigenvalues with associated eigenvectors of the correlation matrix.

  • VIF The Variance Inflation Factors, being the diagonal elements of the inverse of the correlation matrix.

  • CN The Condition Number of the correlation matrix, given by the ratio between the largest and smallest eigenvalue.

  • det The determinant of the correlation matrix.

  • ncorhigh The number of correlations greater than 0.8 in absolute value.

  • largest_corr The largest correlation (in absolute value) observed.

  • smallest_corr The smallest correlation (in absolute value) observed.

  • weight_var The variables with the largest weights (largest eigenvector elements) in the eigenvector associated with the smallest eigenvalue, sorted in decreasing order.
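The indicators listed above follow directly from the correlation matrix. The sketch below recomputes them with base R only, following the definitions as stated in the list. It is an illustration, not the package's implementation: the object names are hypothetical, and ranking the eigenvector weights by absolute value is an assumption.

```r
# Correlation matrix of the four numeric iris variables
R <- cor(iris[, 1:4])

# Eigenvalues and eigenvectors of the correlation matrix
eig <- eigen(R, symmetric = TRUE)

# VIF: diagonal elements of the inverse of the correlation matrix
vif <- diag(solve(R))

# CN: ratio between the largest and the smallest eigenvalue
cn <- max(eig$values) / min(eig$values)

# Determinant of the correlation matrix
det_R <- det(R)

# Pairwise correlations (off-diagonal elements), in absolute value
r_abs <- abs(R[lower.tri(R)])
n_high <- sum(r_abs > 0.8)   # count of correlations above 0.8
largest <- max(r_abs)        # largest absolute correlation
smallest <- min(r_abs)       # smallest absolute correlation

# Weights of each variable in the eigenvector of the smallest eigenvalue,
# sorted in decreasing order (absolute values: an assumption)
last_vec <- eig$vectors[, which.min(eig$values)]
names(last_vec) <- colnames(R)
weights <- sort(abs(last_vec), decreasing = TRUE)
```

Because a correlation matrix has a unit diagonal, every VIF is at least 1, and the determinant equals the product of the eigenvalues, so a determinant close to zero and a large condition number both point to the same near-singularity.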

Author(s)

Tiago Olivoto tiagoolivoto@gmail.com

References

Olivoto, T., V.Q. Souza, M. Nardino, I.R. Carvalho, M. Ferrari, A.J. Pelegrin, V.J. Szareski, and D. Schmidt. 2017. Multicollinearity in path analysis: a simple method to reduce its effects. Agron. J. 109:131-142. doi: 10.2134/agronj2016.04.0196

Examples


# Using the correlation matrix
library(metan)

cor_iris <- cor(iris[,1:4])
n <- nrow(iris)

col_diag <- colindiag(cor_iris, n = n)


# Using a data frame
# Diagnostic for each level of GEN, using all numeric variables
col_diag_gen <- data_ge2 %>%
                group_by(GEN) %>%
                colindiag()

# Diagnostic for each level of GEN,
# using only the variables with "N" in their names
col_diag_gen <- data_ge2 %>%
                group_by(GEN) %>%
                colindiag(contains("N"))


metan documentation built on March 7, 2023, 5:34 p.m.