Identify multicollinearity issues using correlation, VIF, and visualizations. This package is designed for R beginners who want to identify multicollinearity issues by applying a simple function. It automates the process of building a longer form of the correlation matrix, creating a correlation heat map, and identifying pairs of highly correlated variables. A Python version of the package is also under development.
The following four functions are in the collinearityR package:
corr_matrix: A function that returns both the generic form and the
longer form of a correlation matrix for all numerical variables in a
data frame.
corr_heatmap: A function that returns a correlation heatmap given a
data frame.
vif_bar_plot: A function that returns a list containing a
data frame of Variance Inflation Factors (VIF) and a bar chart of the
VIFs for each explanatory variable in a multiple linear regression
model.
col_identify: A function that identifies multicollinearity
based on highly correlated pairs (using Pearson's coefficient) whose
VIF values exceed a threshold.
The R ecosystem contains many tools for conducting linear regression. However, it lacks a tool for analyzing multicollinearity visually using both Pearson’s coefficient and VIF. Doing this analysis by hand also requires intermediate knowledge of R to manipulate the correlation matrix into a more suitable format. Our package will allow users with less experience to conduct this analysis.
cor(): This function ships with base R (in the stats package). It
computes a correlation matrix between variables, using Pearson’s
coefficient by default. Its documentation is available by running ?cor
in R.
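As a rough illustration of what the longer form of a correlation matrix looks like, the sketch below uses only base R and the built-in mtcars data. It is not the package's implementation; the column names variable1, variable2 and correlation are chosen here purely for illustration.

```r
# Wide (generic) correlation matrix for a few numeric mtcars columns.
num_vars <- mtcars[, c("mpg", "disp", "hp", "wt")]
corr_wide <- cor(num_vars, method = "pearson")

# Reshape to long form: one row per ordered pair of variables.
# as.data.frame() on a table yields the columns Var1, Var2 and Freq,
# which are renamed here (illustrative names, not the package's).
corr_long <- as.data.frame(as.table(corr_wide), stringsAsFactors = FALSE)
names(corr_long) <- c("variable1", "variable2", "correlation")

head(corr_long)
```

The long form has one row for every ordered pair, including each variable paired with itself, so four variables give 16 rows. This layout is easier to filter and plot than the square matrix.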
ggplot2: This is one of the most commonly used plotting packages in R.
The collinearityR package relies on ggplot2 to create heatmap plots.
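A correlation heatmap of this kind can be sketched directly with ggplot2, assuming that package is installed. The data set (mtcars), the colour choices and the column names below are assumptions for illustration and do not necessarily match what corr_heatmap produces.

```r
library(ggplot2)

# Long-form correlations for a few numeric mtcars columns (base R reshaping).
corr_wide <- cor(mtcars[, c("mpg", "disp", "hp", "wt")])
heat_data <- as.data.frame(as.table(corr_wide))
names(heat_data) <- c("variable1", "variable2", "correlation")

# One tile per variable pair, coloured on a diverging scale centred at 0,
# with the rounded coefficient printed on each tile.
heatmap_plot <- ggplot(heat_data,
                       aes(x = variable1, y = variable2, fill = correlation)) +
  geom_tile() +
  geom_text(aes(label = round(correlation, 2))) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "orange",
                       midpoint = 0, limits = c(-1, 1)) +
  labs(x = NULL, y = NULL, fill = "Pearson r")
```

Printing heatmap_plot draws the chart. Fixing the scale limits at -1 and 1 keeps colours comparable across data sets.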
car: The car package provides the vif() function used for VIF
calculations. Its documentation is available by running ?car::vif in R
once the package is installed.
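To make the VIF calculation concrete, here is a minimal base R sketch of the textbook definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing explanatory variable j on the remaining explanatory variables. It does not call car and is not the package's code; the choice of mtcars, the predictors and the output column names are assumptions for illustration.

```r
# Explanatory variables of a hypothetical model mpg ~ disp + hp + wt (mtcars).
predictors <- c("disp", "hp", "wt")

# For each predictor, regress it on the others and convert the resulting
# R-squared into a variance inflation factor.
vif_values <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  fit <- lm(reformulate(others, response = p), data = mtcars)
  1 / (1 - summary(fit)$r.squared)
})

# Collect the results in a small data frame (illustrative column names).
vif_table <- data.frame(explanatory_var = names(vif_values),
                        vif_score = unname(vif_values))
vif_table
```

A VIF of 1 means a predictor is uncorrelated with the others, and larger values indicate stronger collinearity. The cut-off above which a VIF counts as problematic is a judgment call that depends on the analysis.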
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("UBC-MDS/collinearityR_tool")
This is a basic example which shows you how to apply this package to a data frame.