| cor_df | R Documentation |
Computes pairwise correlations between predictors using appropriate methods for different variable types:
Numeric vs. Numeric: Pearson correlation via stats::cor().
Numeric vs. Categorical: The categorical variable is target-encoded via target_encoding_lab() with the leave-one-out method, using the numeric variable as reference. Pearson correlation is then computed between the encoded variable and the numeric variable.
Categorical vs. Categorical: Cramer's V via cor_cramer() as a measure of association. See cor_cramer() for important notes on mixing Pearson correlation and Cramer's V in multicollinearity analysis.
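The three cases above can be sketched in base R as follows. This is only an illustration on simulated data, not the package's internal code: the helpers loo_encode() and cramers_v() are hypothetical stand-ins for target_encoding_lab() and cor_cramer(), and the Cramer's V shown here is the plain uncorrected statistic, which may differ from what cor_cramer() returns.

```r
# Illustrative sketch only; helper names and data are hypothetical.
set.seed(1)
n <- 200
g <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
h <- factor(ifelse(g == "a", "u", sample(c("u", "v"), n, replace = TRUE)))
x <- rnorm(n) + as.integer(g)
y <- 0.5 * x + rnorm(n)

# 1. Numeric vs. numeric: Pearson correlation
r_num_num <- stats::cor(x, y, method = "pearson")

# 2. Numeric vs. categorical: leave-one-out target encoding, then Pearson.
# Each row receives the mean of the reference variable over all OTHER rows
# in the same group (groups of size 1 would yield NaN).
loo_encode <- function(group, target) {
  group_sum <- stats::ave(target, group, FUN = sum)
  group_n <- stats::ave(target, group, FUN = length)
  (group_sum - target) / (group_n - 1)
}
g_encoded <- loo_encode(group = g, target = x)
r_num_cat <- stats::cor(x, g_encoded, method = "pearson")

# 3. Categorical vs. categorical: uncorrected Cramer's V from chi-squared
cramers_v <- function(a, b) {
  tab <- table(a, b)
  chi2 <- suppressWarnings(
    stats::chisq.test(tab, correct = FALSE)$statistic
  )
  k <- min(nrow(tab), ncol(tab)) - 1
  as.numeric(sqrt(chi2 / (sum(tab) * k)))
}
v_cat_cat <- cramers_v(g, h)

c(num_num = r_num_num, num_cat = r_num_cat, cat_cat = v_cat_cat)
```

Note that Pearson correlation ranges from -1 to 1 while Cramer's V ranges from 0 to 1, which is one reason the two are not directly comparable.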
Parallelization via future::plan() and progress bars via progressr::handlers() are supported, but they are only beneficial for large datasets with categorical predictors. Numeric-only correlations use neither parallelization nor progress bars. Example: with 16 workers, 30k rows (dataframe vi), and 49 numeric plus 12 categorical predictors (see vi_predictors), parallelization achieves a 5.4x speedup (147s → 27s).
cor_df(df = NULL, predictors = NULL, quiet = FALSE, ...)
df |
(required; dataframe, tibble, or sf) A dataframe with responses
(optional) and predictors. Must have at least 10 rows for pairwise
correlation analysis, and |
predictors |
(optional; character vector or NULL) Names of the
predictors in |
quiet |
(optional; logical) If FALSE, messages are printed. Default: FALSE. |
... |
(optional) Internal args (e.g. |
A dataframe with columns:
x: character, first predictor name.
y: character, second predictor name.
correlation: numeric, Pearson correlation (numeric vs. numeric and numeric vs. categorical) or Cramer's V (categorical vs. categorical).
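Because the result is a long-format dataframe, strongly associated pairs can be pulled out with ordinary subsetting. The sketch below uses a hand-made dataframe with the same three columns in place of a real cor_df() result; the values and the 0.75 threshold are made up for illustration.

```r
# Mock result with the documented columns; values are invented.
pairs <- data.frame(
  x = c("soil_type", "soil_type", "koppen_zone"),
  y = c("koppen_zone", "topo_elevation", "topo_elevation"),
  correlation = c(0.91, -0.12, -0.80),
  stringsAsFactors = FALSE
)

# Arbitrary illustrative threshold on the absolute value,
# since Pearson correlations can be negative.
threshold <- 0.75

strong <- pairs[abs(pairs$correlation) >= threshold, ]

# Sort from strongest to weakest association
strong <- strong[order(-abs(strong$correlation)), ]
strong
```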
Other multicollinearity_assessment:
collinear_stats(),
cor_clusters(),
cor_cramer(),
cor_matrix(),
cor_stats(),
vif(),
vif_df(),
vif_stats()
data(vi_smol)
## OPTIONAL: parallelization setup
## irrelevant when all predictors are numeric
## only worth it for large data with many categoricals
# future::plan(
# future::multisession,
# workers = future::availableCores() - 1
# )
## OPTIONAL: progress bar
# progressr::handlers(global = TRUE)
## predictors of mixed types
predictors <- c(
"koppen_zone", #character
"soil_type", #factor
"topo_elevation", #numeric
"soil_temperature_mean" #numeric
)
x <- cor_df(
df = vi_smol,
predictors = predictors
)
x
## OPTIONAL: disable parallelization
#future::plan(future::sequential)