| cor_select | R Documentation |
Wraps collinear_select() to automate multicollinearity filtering via pairwise correlation in dataframes with numeric and categorical predictors.
The argument max_cor sets the maximum pairwise correlation allowed between any two predictors in the resulting selection.
The argument preference_order accepts a character vector of predictor names ranked from first to last index, or a dataframe resulting from preference_order(). When two predictors in this vector or dataframe are highly collinear, the one with a lower ranking is removed. This option helps protect predictors of interest. If not provided, predictors are ranked from lower to higher multicollinearity.
Please check the section Pairwise Correlation Filtering at the end of this help file for further details.
cor_select(
  df = NULL,
  response = NULL,
  predictors = NULL,
  preference_order = NULL,
  max_cor = 0.7,
  quiet = FALSE,
  ...
)
df |
(required; dataframe, tibble, or sf) A dataframe with responses
(optional) and predictors. Must have at least 10 rows for pairwise
correlation analysis, and |
response |
(optional; character or NULL) Name of one response variable in |
predictors |
(optional; character vector or NULL) Names of the
predictors in |
preference_order |
(optional; character vector, dataframe from
|
max_cor |
(optional; numeric or NULL) Maximum correlation allowed between pairs of |
quiet |
(optional; logical) If FALSE, messages are printed. Default: FALSE. |
... |
(optional) Internal args (e.g. |
A character vector with the names of the selected predictors.
cor_select computes the global correlation matrix, orders
predictors by preference_order or by lower-to-higher summed
correlations, and sequentially selects predictors with pairwise correlations
below max_cor.
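The selection logic described above can be illustrated with a minimal base R sketch. This is not the package's implementation: greedy_cor_filter() is a hypothetical helper written for this illustration, it handles numeric predictors only, it uses absolute Pearson correlations, and it assumes that a correlation exactly equal to max_cor is still accepted.

```r
# Hypothetical helper for illustration only (not part of the package).
# Greedy pairwise-correlation filter for numeric predictors.
greedy_cor_filter <- function(df, predictors, preference_order = NULL, max_cor = 0.7) {
  # absolute Pearson correlation between all pairs of predictors
  cor_abs <- abs(stats::cor(df[, predictors, drop = FALSE], use = "pairwise.complete.obs"))
  diag(cor_abs) <- 0

  # default ranking: from lower to higher summed correlation
  auto_order <- names(sort(rowSums(cor_abs)))

  # user-ranked predictors go first, the rest follow in the default order
  if (is.null(preference_order)) {
    preference_order <- character(0)
  }
  preference_order <- intersect(preference_order, predictors)
  ranking <- c(preference_order, setdiff(auto_order, preference_order))

  # keep the top-ranked predictor, then add each candidate only if its
  # correlation with every predictor already kept does not exceed max_cor
  selected <- ranking[1]
  for (candidate in ranking[-1]) {
    if (max(cor_abs[candidate, selected]) <= max_cor) {
      selected <- c(selected, candidate)
    }
  }
  selected
}

# toy data (made up for this sketch): b is a noisy copy of a, c is independent
set.seed(1)
a <- rnorm(100)
toy <- data.frame(a = a, b = a + rnorm(100, sd = 0.1), c = rnorm(100))

res_default <- greedy_cor_filter(toy, c("a", "b", "c"))
res_custom <- greedy_cor_filter(toy, c("a", "b", "c"), preference_order = "b")
```

In this toy example a and b are nearly identical, so only one of them can survive the filter. Placing "b" in preference_order protects it, and "a" is the one removed.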
Blas M. Benito, PhD
Other multicollinearity_filtering:
collinear(),
collinear_select(),
step_collinear(),
vif_select()
data(vi_smol)
## OPTIONAL: parallelization setup
## irrelevant when all predictors are numeric
## only worth it for large data with many categoricals
# future::plan(
#   future::multisession,
#   workers = future::availableCores() - 1
# )
## OPTIONAL: progress bar
# progressr::handlers(global = TRUE)
# predictors
predictors <- c(
  "koppen_zone",           # character
  "soil_type",             # factor
  "topo_elevation",        # numeric
  "soil_temperature_mean"  # numeric
)
# predictors ordered from lower to higher multicollinearity
x <- cor_select(
  df = vi_smol,
  predictors = predictors,
  max_cor = 0.7
)
x
# with custom preference order
x <- cor_select(
  df = vi_smol,
  predictors = predictors,
  preference_order = c(
    "koppen_zone",
    "soil_type"
  ),
  max_cor = 0.7
)
x
# with automated preference order
df_preference <- preference_order(
  df = vi_smol,
  response = "vi_numeric",
  predictors = predictors
)
df_preference
x <- cor_select(
  df = vi_smol,
  predictors = predictors,
  preference_order = df_preference,
  max_cor = 0.7
)
x
## OPTIONAL: disable parallelization
# future::plan(future::sequential)