calc_collin_diag: Calculate Collinearity Diagnostics
In emilelatour/lamisc: Latour's Little Helpers

calc_collin_diag

R Documentation

Calculate Collinearity Diagnostics

Description

This function computes collinearity diagnostics for a set of variables: variance inflation factors (VIF), tolerance, R-squared values, eigenvalues, and condition indices, plus summary measures such as the mean VIF, condition number, and determinant of the correlation matrix. Its output is similar to that described in the Stata collinearity diagnostics page.
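As a rough illustration of how the per-variable diagnostics relate to one another, the sketch below computes VIF, tolerance, and R-squared in base R from the inverse of a correlation matrix. It uses the built-in `mtcars` data purely as a stand-in, and it is not necessarily how `calc_collin_diag` is implemented internally; the object names are illustrative.

```r
# Base-R sketch (assumption: illustrative only, not the package's own code).
# mtcars is a built-in dataset used here as stand-in data.
vars <- mtcars[, c("disp", "hp", "wt")]

# Correlation matrix of the candidate predictors
cor_mat <- cor(vars, method = "pearson", use = "complete.obs")

# The diagonal of the inverse correlation matrix gives the VIFs
inv_cor_mat <- solve(cor_mat)
vif <- diag(inv_cor_mat)

# Tolerance is the reciprocal of VIF; R-squared is 1 - tolerance,
# i.e. the R-squared from regressing each variable on the others
tolerance <- 1 / vif
r_squared <- 1 - tolerance

diag_sketch <- data.frame(
  variable  = colnames(vars),
  vif       = unname(vif),
  tolerance = unname(tolerance),
  r_squared = unname(r_squared)
)
diag_sketch
```

A VIF near 1 means a variable is nearly uncorrelated with the others, while larger values indicate that more of its variance is explained by the remaining variables.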

Usage

calc_collin_diag(
  data,
  ...,
  method = "pearson",
  use = "complete.obs",
  method_for_eigen = "corr",
  show_inv_cor_mat = FALSE
)

Arguments

`data`	A data frame containing the variables to analyze.
`...`	Variables to include in the analysis, specified without quotes.
`method`	The method for calculating the correlation matrix. Default is `"pearson"`.
`use`	How to handle missing values when calculating correlations. Default is `"complete.obs"`.
`method_for_eigen`	The matrix used to calculate eigenvalues and condition indices. Options are `"corr"` for the correlation matrix or `"sscp"` for the scaled sums-of-squares-and-cross-products (SSCP) matrix. Default is `"corr"`.
`show_inv_cor_mat`	Logical. If `TRUE`, includes the inverse correlation matrix in the output. Default is `FALSE`.
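To make the two `method_for_eigen` options more concrete, here is a minimal base-R sketch of eigenvalues and condition indices computed both ways. The `"sscp"` part assumes one common convention: an intercept column is added and every column is scaled to unit length before forming the cross-product matrix. Whether `calc_collin_diag` uses exactly this scaling is an assumption. `mtcars` is stand-in data.

```r
# Base-R sketch (assumption: illustrative only, not the package's own code)
vars <- as.matrix(mtcars[, c("disp", "hp", "wt")])

# "corr"-style: eigenvalues of the correlation matrix
eig_corr <- eigen(cor(vars), symmetric = TRUE, only.values = TRUE)$values

# Condition index: sqrt(largest eigenvalue / each eigenvalue)
cond_index_corr <- sqrt(max(eig_corr) / eig_corr)

# "sscp"-style (assumed scaling): add an intercept column, then scale
# each column to unit length before taking the cross-product
x <- cbind(intercept = 1, vars)
x_scaled <- sweep(x, 2, sqrt(colSums(x^2)), "/")
eig_sscp <- eigen(crossprod(x_scaled), symmetric = TRUE,
                  only.values = TRUE)$values
cond_index_sscp <- sqrt(max(eig_sscp) / eig_sscp)

cond_index_corr
cond_index_sscp
```

In both cases the largest condition index is the condition number. The two approaches can give noticeably different values because the SSCP version does not center the variables and includes the intercept.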

Value

A list with the following components:

`table`	A tibble with the collinearity diagnostics for each variable. Includes VIF, tolerance, R-squared, eigenvalues, and condition indices.
`summary`	A tibble summarizing the mean VIF, condition number, and determinant of the correlation matrix.
`inv_cor_mat`	The inverse correlation matrix, if `show_inv_cor_mat = TRUE`.
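The quantities reported in `summary` can be approximated in base R as follows. This is a sketch under assumptions: the column names `mean_vif`, `condition_number`, and `determinant` are hypothetical, the eigenvalues come from the correlation matrix (as with `method_for_eigen = "corr"`), and `mtcars` is stand-in data.

```r
# Base-R sketch (assumption: illustrative only; column names are hypothetical)
vars <- mtcars[, c("disp", "hp", "wt")]
cor_mat <- cor(vars, method = "pearson", use = "complete.obs")

# VIFs from the diagonal of the inverse correlation matrix
vif <- diag(solve(cor_mat))

# Eigenvalues of the correlation matrix
eig <- eigen(cor_mat, symmetric = TRUE, only.values = TRUE)$values

summary_sketch <- data.frame(
  mean_vif         = mean(vif),
  condition_number = sqrt(max(eig) / min(eig)),
  determinant      = det(cor_mat)
)
summary_sketch
```

A determinant close to 0 or a large condition number both point to near-linear dependence among the variables.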

Examples

library(lamisc)
library(dplyr)

# Example data from Phil Ender
# http://www.philender.com/courses/categorical/notes2/collin.html

hsbdemo <- read.csv("https://stats.idre.ucla.edu/stat/data/hsbdemo.csv")
dplyr::glimpse(hsbdemo)

calc_collin_diag(data = hsbdemo,
                 female,
                 schtyp,
                 read,
                 write,
                 math,
                 science,
                 socst,
                 method_for_eigen = "corr",
                 method = "pearson")


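# Simulated data with lahigh-style column names. Every column is drawn
# independently, so the diagnostics should show little collinearity.
set.seed(123) # Ensure reproducibility

n <- 100 # Number of rows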

lahigh <- tibble(
  id = 1000 + seq_len(n),
  gender = sample(c("male", "female"), n, replace = TRUE),
  ethnic = sample(c("hispanic", "filipino", "afr-amer", "asian", "white"), n, replace = TRUE),
  school = sample(1:2, n, replace = TRUE),
  algebra = sample(0:4, n, replace = TRUE),
  math = sample(0:4, n, replace = TRUE),
  eng95 = sample(0:4, n, replace = TRUE),
  eng94 = sample(0:4, n, replace = TRUE),
  mathnce = runif(n, 1, 100), # Continuous values between 1 and 100
  langnce = runif(n, 1, 100),
  mathpr = sample(1:100, n, replace = TRUE), # Integer percentiles
  langpr = sample(1:100, n, replace = TRUE),
  biling = sample(0:3, n, replace = TRUE),
  engprof = sample(0:4, n, replace = TRUE),
  daysatt = sample(40:90, n, replace = TRUE),
  daysabs = sample(0:35, n, replace = TRUE)
)

dplyr::glimpse(lahigh)

calc_collin_diag(data = lahigh,
                 mathnce,
                 langnce,
                 mathpr,
                 langpr,
                 method_for_eigen = "corr",
                 method = "pearson")

calc_collin_diag(data = lahigh,
                 mathnce,
                 langnce,
                 method_for_eigen = "corr",
                 method = "pearson")

emilelatour/lamisc documentation built on July 4, 2025, 6:33 p.m.