View source: R/calc_collin_diag.R
calc_collin_diag | R Documentation |
This function computes collinearity diagnostics for a set of variables: variance inflation factors (VIF), tolerance, R-squared values, eigenvalues, and condition indices, along with the mean VIF, condition number, and determinant of the correlation matrix. Its output is similar to the diagnostics described on the Stata collinearity diagnostics page.
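The per-variable quantities named above are linked by standard identities: each VIF is the matching diagonal element of the inverse correlation matrix, tolerance is 1 / VIF, and R-squared is 1 - tolerance. The base R sketch below illustrates those identities on the built-in mtcars data. It is an illustration under textbook definitions, not necessarily how calc_collin_diag() is implemented, and the names x, diag_tbl, and r2_disp are made up for the example.

```r
# Illustrative only: standard identities, not calc_collin_diag() internals.
x <- mtcars[, c("disp", "hp", "wt")]    # built-in example data

r_mat <- cor(x, method = "pearson", use = "complete.obs")
inv_r <- solve(r_mat)                   # inverse correlation matrix

vif       <- diag(inv_r)                # variance inflation factors
tolerance <- 1 / vif
r_squared <- 1 - tolerance              # R^2 of each variable on the others

diag_tbl <- data.frame(variable = names(vif), vif, tolerance, r_squared,
                       row.names = NULL)
print(diag_tbl)

# Cross-check one R^2 against an explicit regression on the other variables
r2_disp <- summary(lm(disp ~ hp + wt, data = x))$r.squared
all.equal(unname(r_squared["disp"]), r2_disp)
```

The final line compares the matrix-based R-squared for disp with the one from an ordinary regression of disp on hp and wt, which is a quick way to sanity-check output of this kind.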
calc_collin_diag(
  data,
  ...,
  method = "pearson",
  use = "complete.obs",
  method_for_eigen = "corr",
  show_inv_cor_mat = FALSE
)
data |
A data frame containing the variables to analyze. |
... |
Variables to include in the analysis, specified without quotes. |
method |
The method for calculating the correlation matrix. Default is "pearson". |
use |
How to handle missing values when calculating correlations. Default is "complete.obs". |
method_for_eigen |
Specifies the method for calculating eigenvalues and condition indices. Options are |
show_inv_cor_mat |
Logical. If TRUE, the inverse correlation matrix is included in the returned list. Default is FALSE. |
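The method and use arguments share their names and defaults with arguments of stats::cor(). Assuming they behave the same way (the documentation above does not confirm that they are passed through), the sketch below shows what the choices mean for the correlation matrix when data are missing. The missing values are introduced artificially for the example.

```r
# Assumption: `method` and `use` act like the same-named stats::cor() arguments.
x <- mtcars[, c("disp", "hp", "wt")]
x$hp[c(3, 7)] <- NA                      # introduce two missing values

cor(x)["disp", "hp"]                     # NA under cor()'s own default, use = "everything"

r_complete <- cor(x, use = "complete.obs")            # listwise deletion
r_pairwise <- cor(x, use = "pairwise.complete.obs")   # pairwise deletion
r_spearman <- cor(x, method = "spearman", use = "complete.obs")

r_complete["disp", "wt"]   # based on the 30 rows with no missing values
r_pairwise["disp", "wt"]   # based on all 32 rows, since neither column has NAs
```

With "complete.obs", every correlation is computed on the same set of rows, which keeps the matrix internally consistent and is usually what is wanted before inverting it.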
A list with the following components:
table |
A tibble with the collinearity diagnostics for each variable. Includes VIF, tolerance, R-squared, eigenvalues, and condition indices. |
summary |
A tibble summarizing the mean VIF, condition number, and determinant of the correlation matrix. |
inv_cor_mat |
The inverse correlation matrix, if show_inv_cor_mat is TRUE. |
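For orientation, the eigenvalue-based and summary quantities listed above can be sketched in base R using their usual definitions: eigenvalues of the correlation matrix, condition indices as sqrt(largest eigenvalue / each eigenvalue), and the condition number as the largest condition index. This is presumably close to what method_for_eigen = "corr" refers to, but that is an assumption. The column names in summary_tbl are hypothetical and may differ from those the function returns.

```r
# Illustrative only: textbook definitions, not calc_collin_diag() internals.
x <- mtcars[, c("disp", "hp", "wt")]
r_mat <- cor(x)

eig        <- eigen(r_mat, symmetric = TRUE)$values   # sorted in decreasing order
cond_index <- sqrt(max(eig) / eig)                    # one index per eigenvalue

# Hypothetical column names for the summary quantities
summary_tbl <- data.frame(
  mean_vif         = mean(diag(solve(r_mat))),
  condition_number = max(cond_index),
  det_cor_mat      = det(r_mat)
)

print(data.frame(eigenvalue = eig, condition_index = cond_index))
print(summary_tbl)
```

Two quick checks on numbers like these: the eigenvalues of a correlation matrix sum to the number of variables, and their product equals the determinant. A determinant near zero signals near-perfect collinearity.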
# Example data
library(dplyr)
# Examples from Phil Ender
# http://www.philender.com/courses/categorical/notes2/collin.html
hsbdemo <- read.csv("https://stats.idre.ucla.edu/stat/data/hsbdemo.csv")
dplyr::glimpse(hsbdemo)
calc_collin_diag(data = hsbdemo,
                 female,
                 schtyp,
                 read,
                 write,
                 math,
                 science,
                 socst,
                 method_for_eigen = "corr",
                 method = "pearson")
set.seed(123) # Ensure reproducibility
n <- 100 # Number of rows
lahigh <- tibble(
  id = 1000 + seq_len(n),
  gender = sample(c("male", "female"), n, replace = TRUE),
  ethnic = sample(c("hispanic", "filipino", "afr-amer", "asian", "white"), n, replace = TRUE),
  school = sample(1:2, n, replace = TRUE),
  algebra = sample(0:4, n, replace = TRUE),
  math = sample(0:4, n, replace = TRUE),
  eng95 = sample(0:4, n, replace = TRUE),
  eng94 = sample(0:4, n, replace = TRUE),
  mathnce = runif(n, 1, 100), # Continuous values between 1 and 100
  langnce = runif(n, 1, 100),
  mathpr = sample(1:100, n, replace = TRUE), # Integer percentiles
  langpr = sample(1:100, n, replace = TRUE),
  biling = sample(0:3, n, replace = TRUE),
  engprof = sample(0:4, n, replace = TRUE),
  daysatt = sample(40:90, n, replace = TRUE),
  daysabs = sample(0:35, n, replace = TRUE)
)
dplyr::glimpse(lahigh)
calc_collin_diag(data = lahigh,
                 mathnce,
                 langnce,
                 mathpr,
                 langpr,
                 method_for_eigen = "corr",
                 method = "pearson")
calc_collin_diag(data = lahigh,
                 mathnce,
                 langnce,
                 method_for_eigen = "corr",
                 method = "pearson")