cor_select | R Documentation |
Implements a recursive forward selection algorithm that keeps only predictors whose maximum pairwise correlation with all other selected predictors is lower than a given threshold. It uses cor_df() underneath and can therefore handle different combinations of predictor types.
Please check the section Pairwise Correlation Filtering at the end of this help file for further details.
cor_select(
df = NULL,
predictors = NULL,
preference_order = NULL,
max_cor = 0.75,
quiet = FALSE
)
df | (required; data frame, tibble, or sf) A data frame with responses and predictors. Default: NULL. |
predictors | (optional; character vector) Names of the predictors to select from |
preference_order | (optional; string, character vector, output of . Default: "auto" |
max_cor | (optional; numeric) Maximum correlation allowed between any pair of variables in |
quiet | (optional; logical) If FALSE, messages generated during the execution of the function are printed to the console. Default: FALSE |
character vector if response is NULL or is a string.
named list if response is a character vector.
The function cor_select() applies a recursive forward selection algorithm to keep predictors with a maximum Pearson correlation with all other selected predictors lower than max_cor.
If the argument preference_order is NULL, the predictors are ranked from lower to higher sum of absolute pairwise correlation with all other predictors.
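The ranking described above can be sketched in base R. This is only an illustration for numeric predictors: rank_by_cor_sum() and the toy data frame are hypothetical names introduced here, and the package's internal implementation may differ.

```r
# Hypothetical helper for illustration; not part of the package.
# Ranks numeric predictors from lower to higher sum of absolute
# pairwise Pearson correlation with all other predictors.
rank_by_cor_sum <- function(df) {
  m <- abs(stats::cor(df, use = "pairwise.complete.obs"))
  diag(m) <- 0 # ignore the correlation of each predictor with itself
  names(sort(rowSums(m)))
}

# toy data (an assumption): b is almost a copy of a, c is independent
set.seed(1)
a <- rnorm(100)
toy <- data.frame(
  a = a,
  b = a + rnorm(100, sd = 0.1),
  c = rnorm(100)
)

rank_by_cor_sum(toy)
```

In this toy case c ranks first, because it is nearly uncorrelated with the other two predictors.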
If preference_order is defined, whenever the correlation between two or more variables is above max_cor, the one higher in preference_order is preserved. For example, given the predictors a and b, with a ranked above b in preference_order, if their correlation is higher than max_cor, then b will be removed and a preserved. If their correlation is lower than max_cor, then both are preserved.
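The selection rule can be sketched as follows for numeric predictors. This is a minimal sketch under stated assumptions, not the package's code: forward_cor_filter() and the toy data are hypothetical, the correlation is computed with stats::cor() rather than cor_df(), and the logic is written as a simple loop rather than recursively.

```r
# Hypothetical helper for illustration; not the package's code.
# Walks through preference_order and keeps a candidate only if its
# absolute correlation with every predictor already selected is
# lower than max_cor.
forward_cor_filter <- function(df, preference_order, max_cor = 0.75) {
  m <- abs(stats::cor(df[, preference_order, drop = FALSE]))
  selected <- preference_order[1]
  for (candidate in preference_order[-1]) {
    if (max(m[candidate, selected]) < max_cor) {
      selected <- c(selected, candidate)
    }
  }
  selected
}

# toy data (an assumption): b is almost a copy of a, c is independent
set.seed(1)
a <- rnorm(100)
toy <- data.frame(
  a = a,
  b = a + rnorm(100, sd = 0.1),
  c = rnorm(100)
)

# a is preferred over b, so b is dropped
forward_cor_filter(toy, preference_order = c("a", "b", "c"))

# b is preferred over a, so a is dropped
forward_cor_filter(toy, preference_order = c("b", "a", "c"))
```

Swapping a and b in the preference order changes which member of the correlated pair survives, while c is kept in both cases.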
Blas M. Benito, PhD
Other pairwise_correlation: cor_clusters(), cor_cramer_v(), cor_df(), cor_matrix()
#subset to limit example run time
df <- vi[1:1000, ]

#use numeric predictors only to speed up the examples
#categorical predictors are supported, but result in a slower analysis
predictors <- vi_predictors_numeric[1:8]

#check the class of each predictor
sapply(
  X = df[, predictors, drop = FALSE],
  FUN = class
)

#parallelization setup
future::plan(
  future::multisession,
  workers = 2 #set to parallelly::availableCores() - 1
)

#progress bar
# progressr::handlers(global = TRUE)

#without preference order
x <- cor_select(
  df = df,
  predictors = predictors,
  max_cor = 0.75
)

#with custom preference order
x <- cor_select(
  df = df,
  predictors = predictors,
  preference_order = c(
    "swi_mean",
    "soil_type"
  ),
  max_cor = 0.75
)

#with automated preference order
df_preference <- preference_order(
  df = df,
  response = "vi_numeric",
  predictors = predictors
)

x <- cor_select(
  df = df,
  predictors = predictors,
  preference_order = df_preference,
  max_cor = 0.75
)

#resetting to sequential processing
future::plan(future::sequential)