View source: R/low_predictor_collinearity.R
low_predictor_collinearity | R Documentation |
Identifies a set of predictors that have low pairwise collinearity among themselves.
Given a data frame of raw predictor values or a correlation matrix among the predictors, the function aims to remove the minimum number of predictors needed so that all remaining pairwise correlations are below a given threshold.
low_predictor_collinearity(df = NULL, cor = NULL, threshold = 0.75)
df |
An optional numeric data frame of predictor variables without NA values. |
cor |
An optional matrix of cross correlations among the predictor variables |
threshold |
A numeric value that sets the correlation cutoff: pairs of predictors whose correlation is above this value are run through the algorithm. The default is 0.75. |
The function was inspired by "Applied Predictive Modeling" (Kuhn and Johnson), page 47.
Note that all predictors in the data frame must be numeric and contain no NA values.
The algorithm proceeds as follows:
1. Create a starting list of all the candidate predictors.
2. Create a second list of pairs of predictors with correlations above a given threshold and order the correlations from high to low.
3. For each pair of predictors (call them A and B) in the ordered list, determine the average correlation between predictor A and the other predictors. Do the same for predictor B.
4. If A has the larger absolute average correlation, remove it from the ordered list and from the starting list created in step 1; otherwise remove predictor B.
5. Repeat steps 3-4 through the entire ordered list of correlations defined in step 2, removing potential predictors from the starting list created in step 1.
6. The predictors left in the starting list are identified as having a low level of collinearity.
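The steps above can be sketched in base R as follows. This is only an illustrative sketch, not the package's actual source: the helper name `sketch_low_collinearity` and the toy correlation matrix are hypothetical, and it assumes that pairs are ranked by absolute correlation, that the "average correlation" is the mean of the absolute correlations with the predictors still in the starting list, and that the reported maximum is taken over absolute values.

```r
# Hypothetical helper illustrating the steps above (not the package's code).
# cor_mat must be a square correlation matrix with matching row/column names.
sketch_low_collinearity <- function(cor_mat, threshold = 0.75) {
  stopifnot(!is.null(rownames(cor_mat)),
            identical(rownames(cor_mat), colnames(cor_mat)))

  # Step 1: the starting list holds every candidate predictor.
  keep <- colnames(cor_mat)

  # Step 2: pairs above the threshold, ordered from high to low
  # (assumption: ranked by absolute correlation).
  idx <- which(upper.tri(cor_mat) & abs(cor_mat) > threshold, arr.ind = TRUE)
  idx <- idx[order(abs(cor_mat[idx]), decreasing = TRUE), , drop = FALSE]

  # Steps 3-5: walk the ordered pairs and drop one member of each pair.
  for (k in seq_len(nrow(idx))) {
    a <- rownames(cor_mat)[idx[k, 1]]
    b <- colnames(cor_mat)[idx[k, 2]]
    # A pair is already resolved if either member was removed earlier.
    if (!(a %in% keep) || !(b %in% keep)) next
    # Assumption: average over the predictors still in the starting list.
    mean_a <- mean(abs(cor_mat[a, setdiff(keep, a)]))
    mean_b <- mean(abs(cor_mat[b, setdiff(keep, b)]))
    removed <- if (mean_a > mean_b) a else b
    keep <- setdiff(keep, removed)
  }

  # Step 6: whatever remains has low pairwise collinearity.
  selected <- cor_mat[keep, keep, drop = FALSE]
  off_diag <- selected[upper.tri(selected)]
  list(
    predictors = keep,
    correlations = selected,
    max_correlation = if (length(off_diag) > 0) max(abs(off_diag)) else NA_real_
  )
}

# Toy correlation matrix (made-up values for illustration only).
cor_mat <- matrix(
  c(1.0, 0.9, 0.2,
    0.9, 1.0, 0.5,
    0.2, 0.5, 1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("x1", "x2", "x3"), c("x1", "x2", "x3"))
)
result <- sketch_low_collinearity(cor_mat, threshold = 0.75)
result$predictors       # "x1" "x3"
result$max_correlation  # 0.2
```

In this toy case only the pair (x1, x2) exceeds the threshold. x2 has the larger average absolute correlation with the other predictors (0.7 versus 0.55), so x2 is removed and x1 and x3 remain.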
Returns a named list with:
"predictors" A character vector with the names of predictors with pairwise low collinearity among themselves.
"correlations" The correlation matrix with just the selected predictors.
"max_correlation" The maximum correlation among all pairs of the selected predictors.
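library(data.table)
library(RregressPkg)

# Drop the "BP" column so that only the predictor columns remain
bloodpress_predictors_dt <- RregressPkg::bloodpress[, !c("BP")]

# Identify predictors with low pairwise collinearity (default threshold = 0.75)
low_collinearity_lst <- RregressPkg::low_predictor_collinearity(
  df = bloodpress_predictors_dt
)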