| cor_clusters | R Documentation |
Hierarchical clustering of predictors from their correlation matrix. Computes the correlation matrix with cor_df() and cor_matrix(), transforms it to a distance matrix using stats::dist(), computes a clustering solution with stats::hclust(), and applies stats::cutree() to separate groups based on the value of the argument max_cor.
Returns a list containing a dataframe with predictor names and their cluster IDs, and the clustering object, which can be plotted as a dendrogram (see examples).
Accepts a parallelization setup via future::plan() and a progress bar via progressr::handlers() (see examples).
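The steps described above can be sketched with base R alone. This is a minimal illustration, not the package's implementation: it uses stats::cor() on numeric columns of datasets::mtcars in place of cor_df() and cor_matrix(), and it assumes the distance is 1 - |correlation| built with stats::as.dist(), which is consistent with the h = 1 - 0.75 line in the examples but is not confirmed by this page. The output column names predictor and cluster are also assumptions.

```r
# Sketch only: base R stand-in for the clustering steps described above.
# Assumption: distance = 1 - |correlation|; cor_clusters() may differ.
df <- datasets::mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
max_cor <- 0.7

# correlation matrix (numeric predictors only in this sketch)
m <- stats::cor(df)

# convert absolute correlations to distances
d <- stats::as.dist(1 - abs(m))

# hierarchical clustering of the predictors
hc <- stats::hclust(d, method = "complete")

# cut the tree at the height that corresponds to max_cor
cluster <- stats::cutree(hc, h = 1 - max_cor)

# dataframe of predictor names and cluster IDs (hypothetical column names)
clusters_df <- data.frame(
  predictor = names(cluster),
  cluster = unname(cluster)
)
clusters_df
```

With method = "complete", the height of a merge is the largest pairwise distance between the merged groups, so in this sketch every pair of predictors within a cluster has an absolute correlation of at least max_cor. Other linkage methods do not give that guarantee.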
cor_clusters(
df = NULL,
predictors = NULL,
max_cor = 0.7,
method = "complete",
quiet = FALSE,
...
)
df |
(required; dataframe, tibble, or sf) A dataframe with predictors or the output of |
predictors |
(optional; character vector or NULL) Names of the predictors in |
max_cor |
(optional; numeric or NULL) Correlation value used to separate clustering groups. Valid values are between 0.01 and 0.99. Default: 0.7. |
method |
(optional; character string) Argument of |
quiet |
(optional; logical) If FALSE, messages are printed. Default: FALSE. |
... |
(optional) Internal args (e.g. |
list:
df: dataframe with predictor names and their cluster IDs.
hclust: clustering object returned by stats::hclust().
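A common next step is to group predictor names by cluster ID, for example to keep one representative per cluster. The sketch below uses a hand-made stand-in for the df element: the column names predictor and cluster are assumptions, and the cluster IDs are made up for illustration rather than actual output of cor_clusters().

```r
# Hypothetical stand-in for clusters$df.
# Column names and cluster IDs are assumptions, not real output.
clusters_df <- data.frame(
  predictor = c(
    "topo_elevation",
    "soil_temperature_mean",
    "koppen_zone",
    "soil_type"
  ),
  cluster = c(1L, 1L, 2L, 3L)
)

# list of predictor names, one element per cluster ID
groups <- split(clusters_df$predictor, clusters_df$cluster)

# keep the first predictor of each cluster as its representative
representatives <- vapply(groups, function(x) x[1], character(1))
representatives
```

Picking the first predictor is arbitrary here; any preference order for predictors can be applied by sorting the rows before the split.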
Other multicollinearity_assessment:
collinear_stats(),
cor_cramer(),
cor_df(),
cor_matrix(),
cor_stats(),
vif(),
vif_df(),
vif_stats()
data(vi_smol)
## OPTIONAL: parallelization setup
## irrelevant when all predictors are numeric
## only worth it for large data with many categoricals
# future::plan(
# future::multisession,
# workers = future::availableCores() - 1
# )
## OPTIONAL: progress bar
# progressr::handlers(global = TRUE)
## group predictors using max_cor as clustering threshold
clusters <- cor_clusters(
df = vi_smol,
predictors = c(
"koppen_zone", #character
"soil_type", #factor
"topo_elevation", #numeric
"soil_temperature_mean" #numeric
),
max_cor = 0.75
)
## clusters dataframe
clusters$df
## plot hclust object
# graphics::plot(clusters$hclust)
## plot max_cor threshold
# graphics::abline(
# h = 1 - 0.75,
# col = "red4",
# lty = 3,
# lwd = 2
# )
## OPTIONAL: disable parallelization
# future::plan(future::sequential)