cor_df: Pairwise Correlation Data Frame


cor_df R Documentation

Pairwise Correlation Data Frame

Description

Computes a pairwise correlation data frame. Implements methods to compare different types of predictors:

  • numeric vs. numeric: as computed with stats::cor() using the methods "pearson" or "spearman", via cor_numeric_vs_numeric().

  • numeric vs. categorical: the function cor_numeric_vs_categorical() target-encodes the categorical variable using the numeric variable as reference with target_encoding_lab() and the method "loo" (leave-one-out), and then their correlation is computed with stats::cor().

  • categorical vs. categorical: the function cor_categorical_vs_categorical() computes Cramer's V (see cor_cramer_v()) as an indicator of the association between character or factor variables. However, keep in mind that Cramer's V is not directly comparable with R-squared, even though both range from zero to one. It is always recommended to target-encode categorical variables with target_encoding_lab() before the pairwise correlation analysis.
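The three comparisons above can be illustrated with base R alone. The block below is a minimal sketch, not the package's implementation: the toy data and the helpers loo_encode() and cramer_v() are hypothetical names introduced here, and cramer_v() is the uncorrected textbook formula, which may differ from what cor_cramer_v() returns.

```r
# Toy data (hypothetical; not shipped with the package)
set.seed(1)
toy <- data.frame(
  a = rnorm(100),
  g = sample(c("x", "y", "z"), size = 100, replace = TRUE),
  h = sample(c("p", "q"), size = 100, replace = TRUE)
)
toy$b <- toy$a + rnorm(100, sd = 0.5)

# numeric vs. numeric: Pearson or Spearman via stats::cor()
r_pearson <- stats::cor(toy$a, toy$b, method = "pearson")
r_spearman <- stats::cor(toy$a, toy$b, method = "spearman")

# numeric vs. categorical: leave-one-out target encoding.
# Each row receives the mean of x over the OTHER rows of its group.
# Assumption: every group has at least two rows (otherwise NaN).
loo_encode <- function(x, group) {
  group_sum <- stats::ave(x, group, FUN = sum)
  group_n <- stats::ave(x, group, FUN = length)
  (group_sum - x) / (group_n - 1)
}

g_encoded <- loo_encode(toy$a, toy$g)
r_num_cat <- stats::cor(toy$a, g_encoded)

# categorical vs. categorical: uncorrected Cramer's V
# computed from the chi-squared statistic of the contingency table
cramer_v <- function(x, y) {
  tab <- table(x, y)
  chi2 <- suppressWarnings(
    stats::chisq.test(tab, correct = FALSE)$statistic
  )
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1
  sqrt(as.numeric(chi2) / (n * k))
}

v <- cramer_v(toy$g, toy$h)
```

The first two results are correlation coefficients and the third is a chi-squared-based association measure, which is why the text advises against comparing them directly.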

Accepts a parallelization setup via future::plan() and a progress bar via progressr::handlers() (see examples).

Usage

cor_df(df = NULL, predictors = NULL, quiet = FALSE)

cor_numeric_vs_numeric(df = NULL, predictors = NULL, quiet = FALSE)

cor_numeric_vs_categorical(df = NULL, predictors = NULL, quiet = FALSE)

cor_categorical_vs_categorical(df = NULL, predictors = NULL, quiet = FALSE)

Arguments

df

(required; data frame, tibble, or sf) A data frame with responses and predictors. Default: NULL.

predictors

(optional; character vector) Names of the predictors to select from df. If omitted, all numeric columns in df are used instead. If argument response is not provided, non-numeric variables are ignored. Default: NULL.

quiet

(optional; logical) If FALSE, messages generated during the execution of the function are printed to the console. Default: FALSE.

Value

Data frame with the pairwise correlations between predictors.
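A long-format pairwise data frame is typically inspected by filtering and sorting on the absolute correlation. The sketch below uses a hand-made stand-in: the column names x, y, and correlation and the 0.75 threshold are assumptions made for illustration, so check names() on the actual output before adapting it.

```r
# Hypothetical stand-in for a pairwise correlation data frame.
# Column names are an assumption; verify them with names() on real output.
pairs_df <- data.frame(
  x = c("a", "a", "b"),
  y = c("b", "c", "c"),
  correlation = c(0.91, -0.20, 0.78)
)

# keep pairs whose absolute correlation reaches an arbitrary threshold
strong <- pairs_df[abs(pairs_df$correlation) >= 0.75, ]

# sort from strongest to weakest association
strong <- strong[order(abs(strong$correlation), decreasing = TRUE), ]

strong
```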

See Also

Other pairwise_correlation: cor_clusters(), cor_cramer_v(), cor_matrix(), cor_select()

Examples

data(
  vi,
  vi_predictors
)

#reduce size of vi to speed-up example execution
vi <- vi[1:1000, ]

#mixed predictors
vi_predictors <- vi_predictors[1:10]

#parallelization setup
future::plan(
  future::multisession,
  workers = 2 #set to parallelly::availableCores() - 1
)

#progress bar
# progressr::handlers(global = TRUE)

#correlation data frame
df <- cor_df(
  df = vi,
  predictors = vi_predictors
)

df

#disable parallelization
future::plan(future::sequential)


collinear documentation built on April 12, 2025, 1:36 a.m.