cor_df: Compute signed pairwise correlations dataframe

View source: R/cor_df.R

cor_df R Documentation

Compute signed pairwise correlations dataframe

Description

Computes pairwise correlations between predictors using appropriate methods for different variable types:

  • Numeric vs. Numeric: Pearson correlation via stats::cor().

  • Numeric vs. Categorical: The categorical variable is target-encoded against the numeric variable via target_encoding_lab() using the leave-one-out method; Pearson correlation is then computed between the numeric variable and the encoded one.

  • Categorical vs. Categorical: Cramer's V via cor_cramer() as a measure of association. See cor_cramer() for important notes on mixing Pearson correlation and Cramer's V in multicollinearity analysis.
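The three cases above can be illustrated with a minimal base-R sketch. It does not call the package's functions: loo_encode() and cramers_v() are hypothetical helpers written for this illustration, the toy data is made up, and the actual target_encoding_lab() and cor_cramer() implementations may differ (for example, in bias correction or in how single-row groups are handled).

```r
# Toy data, made up for illustration: two numeric and two categorical columns
set.seed(1)
df <- data.frame(
  a = rnorm(100),
  g = sample(c("x", "y", "z"), 100, replace = TRUE),
  h = sample(c("p", "q"), 100, replace = TRUE)
)
df$b <- df$a + rnorm(100, sd = 0.5)

# 1) Numeric vs. numeric: Pearson correlation
cor_num_num <- stats::cor(df$a, df$b, method = "pearson")

# 2) Numeric vs. categorical: leave-one-out target encoding, then Pearson.
# Hypothetical helper: each row receives the mean of `num` computed over
# the OTHER rows of its group (the row's own value is left out).
loo_encode <- function(cat, num) {
  group_sum <- ave(num, cat, FUN = sum)
  group_n <- ave(num, cat, FUN = length)
  (group_sum - num) / (group_n - 1) # NaN for groups with a single row
}
g_encoded <- loo_encode(df$g, df$a)
cor_num_cat <- stats::cor(df$a, g_encoded, use = "complete.obs")

# 3) Categorical vs. categorical: uncorrected Cramer's V from a chi-squared test
cramers_v <- function(x, y) {
  tab <- table(x, y)
  chi2 <- suppressWarnings(stats::chisq.test(tab, correct = FALSE)$statistic)
  sqrt(as.numeric(chi2) / (sum(tab) * (min(dim(tab)) - 1)))
}
cor_cat_cat <- cramers_v(df$g, df$h)
```

In this sketch the first two values are signed Pearson correlations in [-1, 1], while Cramer's V lies in [0, 1], which is one reason the two measures are not directly comparable.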

Parallelization via future::plan() and progress bars via progressr::handlers() are supported, but they are only beneficial for large datasets with categorical predictors. Numeric-only correlations use neither parallelization nor progress bars. For example, with 16 workers on the 30k-row dataframe vi, using 49 numeric and 12 categorical predictors (see vi_predictors), parallelization achieves a 5.4x speedup (147 s → 27 s).

Usage

cor_df(df = NULL, predictors = NULL, quiet = FALSE, ...)

Arguments

df

(required; dataframe, tibble, or sf) A dataframe with responses (optional) and predictors. Must have at least 10 rows for pairwise correlation analysis, and at least 10 * (length(predictors) - 1) rows for VIF. Default: NULL.

predictors

(optional; character vector or NULL) Names of the predictors in df. If NULL, all columns except responses and constant/near-zero-variance columns are used. Default: NULL.

quiet

(optional; logical) If FALSE, messages are printed. Default: FALSE.

...

(optional) Internal args (e.g. function_name for validate_arg_function_name, a precomputed correlation matrix m, or cross-validation args for preference_order).
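The row minimums stated for df can be checked by hand before calling the function. The sketch below is only an illustration of that arithmetic: has_enough_rows() is a hypothetical helper, not part of the package, and how the package itself reacts to too few rows is not described here.

```r
# Hypothetical helper, not part of the package: applies the documented minimums
has_enough_rows <- function(df, predictors) {
  c(
    correlation = nrow(df) >= 10,
    vif = nrow(df) >= 10 * (length(predictors) - 1)
  )
}

# Toy dataframe with 25 rows and 4 predictors (made up for illustration)
toy <- data.frame(a = 1:25, b = 25:1, c = rep(1:5, 5), d = rep(5:1, 5))
check <- has_enough_rows(toy, predictors = c("a", "b", "c", "d"))

# 25 rows meet the 10-row minimum for pairwise correlations,
# but not the VIF minimum of 10 * (4 - 1) = 30 rows
check
```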

Value

A dataframe with columns:

  • x: character, first predictor name.

  • y: character, second predictor name.

  • correlation: numeric, Pearson correlation (numeric vs. numeric and numeric vs. categorical) or Cramer's V (categorical vs. categorical).
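A typical way to use a result with these columns is to keep only the strongly associated pairs. The sketch below works on a mock dataframe that mimics the documented structure; the values and the 0.6 threshold are made up for illustration and are not output of cor_df().

```r
# Mock result with the documented columns; values are made up for illustration
x <- data.frame(
  x = c("topo_elevation", "soil_type", "koppen_zone"),
  y = c("soil_temperature_mean", "koppen_zone", "topo_elevation"),
  correlation = c(-0.82, 0.41, 0.67)
)

# Keep pairs whose absolute correlation exceeds an arbitrary threshold,
# so that strong negative Pearson correlations are retained as well
threshold <- 0.6
high <- x[abs(x$correlation) > threshold, ]

# Sort by strength of association, strongest first
high <- high[order(abs(high$correlation), decreasing = TRUE), ]
high
```

Filtering on the absolute value matters because the Pearson entries are signed, whereas Cramer's V entries are never negative.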

See Also

Other multicollinearity_assessment: collinear_stats(), cor_clusters(), cor_cramer(), cor_matrix(), cor_stats(), vif(), vif_df(), vif_stats()

Examples

library(collinear)

data(vi_smol)

## OPTIONAL: parallelization setup
## irrelevant when all predictors are numeric
## only worth it for large data with many categoricals
# future::plan(
#   future::multisession,
#   workers = future::availableCores() - 1
# )

## OPTIONAL: progress bar
# progressr::handlers(global = TRUE)

## predictors of mixed types
predictors <- c(
  "koppen_zone", # character
  "soil_type", # factor
  "topo_elevation", # numeric
  "soil_temperature_mean" # numeric
)

x <- cor_df(
  df = vi_smol,
  predictors = predictors
)

x

## OPTIONAL: disable parallelization
# future::plan(future::sequential)

collinear documentation built on Dec. 8, 2025, 5:06 p.m.