corr_var: Correlation between variable and dataframe
In laresbernardo/lares: Lean Analytics and Robust Exploration Sidekick

corr_var

R Documentation

Description

This function correlates every column of a dataframe with a single feature. It automatically runs `ohse` (one-hot smart encoding), so the input does not need to be limited to numerical columns.
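To make the idea concrete, here is a minimal base R sketch of "encode, then correlate against one column". It is not how `corr_var` or `ohse` is implemented: it swaps in `model.matrix()` for the encoding step, and the toy data and the names `encoded`, `target`, and `result` are assumptions made for illustration.

```r
# Toy data (hypothetical values): a numeric target, one numeric feature
# and one categorical feature.
df <- data.frame(
  survived = c(1, 0, 1, 1, 0, 0, 1, 0),
  fare     = c(71.3, 7.9, 53.1, 26.0, 8.1, 8.5, 30.0, 7.3),
  sex      = c("female", "male", "female", "male", "male", "male", "female", "female")
)

# One-hot encode with base R. Dropping the intercept ("- 1") gives the
# first factor one 0/1 column per level instead of n - 1 columns.
encoded <- as.data.frame(model.matrix(~ . - 1, data = df))

# Correlate every remaining column with the target column.
target <- "survived"
others <- setdiff(names(encoded), target)
cors <- sapply(others, function(v) cor(encoded[[v]], encoded[[target]]))

# Arrange by descending absolute correlation.
result <- data.frame(variable = others, corr = unname(cors))
result <- result[order(-abs(result$corr)), ]
print(result)
```

The two `sex` columns carry mirrored correlations, which is why encoders often keep only `n - 1` categories per column.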

Usage

corr_var(
  df,
  var,
  ignore = NULL,
  trim = 0,
  clean = FALSE,
  plot = TRUE,
  top = NA,
  ceiling = 1,
  max_pvalue = 1,
  limit = 10,
  ranks = FALSE,
  zeroes = FALSE,
  save = FALSE,
  quiet = FALSE,
  ...
)

## S3 method for class 'corr_var'
plot(x, var, max_pvalue = 1, top = NA, limit = NULL, ...)

Arguments

`df`	Dataframe. It may contain non-numerical columns: they will be filtered.
`var`	Variable. Name of the variable to correlate. If `var` is not numerical: 1. you may define which category to select using 'var_category'; 2. you may have to add `redundant = TRUE` to enable all categories (instead of `n-1`).
`ignore`	Character vector. Which columns do you wish to exclude?
`trim`	Integer. Trim words until the nth character for categorical values (applies to both target and values)
`clean`	Boolean. Use lares::cleanText for categorical values (applies to both target and values)
`plot`	Boolean. Do you wish to plot the result? If set to TRUE, the function will return only the plot and not the result's data
`top`	Integer. If you want to plot the top correlations, define how many
`ceiling`	Numeric. Remove all correlations above... Range: (0, 1]
`max_pvalue`	Numeric. Filter non-significant variables. Range: (0, 1]
`limit`	Integer. Limit one hot encoding to the n most frequent values of each column. Set to `NA` to ignore argument.
`ranks`	Boolean. Add ranking numbers?
`zeroes`	Boolean. Do you wish to keep zeroes in correlations too?
`save`	Boolean. Save output plot into working directory
`quiet`	Boolean. Keep quiet? If not, informative messages will be shown.
`...`	Additional parameters passed to `corr` and `cor.test`
`x`	corr_var object
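As a rough illustration of how filters in the spirit of `max_pvalue`, `ceiling`, and `top` can combine, the following base R sketch computes correlations and p-values with `stats::cor.test()` and then filters them. It is a sketch under assumptions, not the package's code: the simulated data, the use of absolute values for the ceiling, and the order in which the filters are applied are all choices made here for illustration.

```r
set.seed(42)
n <- 100
x <- rnorm(n)

# Simulated data (hypothetical): three features with different
# strengths of association with the target.
df <- data.frame(
  target = x,
  strong = x + rnorm(n, sd = 0.1), # nearly collinear with the target
  medium = x + rnorm(n, sd = 1),   # moderately correlated
  noise  = rnorm(n)                # unrelated
)

# Correlation estimate and p-value for each feature versus the target.
others <- setdiff(names(df), "target")
tests <- lapply(others, function(v) cor.test(df[[v]], df$target))
res <- data.frame(
  variable = others,
  corr     = sapply(tests, function(t) unname(t$estimate)),
  pvalue   = sapply(tests, function(t) t$p.value)
)

# Illustrative thresholds (assumed values, named after the arguments above).
max_pvalue  <- 0.05
ceiling_val <- 0.95
top         <- 1

# Keep significant results at or below the ceiling, then the top n by |corr|.
res <- res[res$pvalue <= max_pvalue & abs(res$corr) <= ceiling_val, ]
res <- res[order(-abs(res$corr)), ]
res <- head(res, top)
print(res)
```

Here the near-duplicate feature is dropped by the ceiling, which is the typical reason for setting one: it removes trivially correlated columns so the remaining ranking is more informative.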

Value

data.frame. With variables, correlation and p-value results for each feature, arranged by descending absolute correlation value.

Examples

library(lares) # Attach the package so corr_var and the dft dataset are available
Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset

corr_var(dft, Survived, method = "spearman", plot = FALSE, top = 10)

# With plots, results are easier to compare:

# Correlate Survived with everything else and show only significant results
dft %>% corr_var(Survived_TRUE, max_pvalue = 0.05)

# Top 15 with at most 60% correlation and show ranks
dft %>% corr_var(Survived_TRUE, ceiling = .6, top = 15, ranks = TRUE)

laresbernardo/lares documentation built on July 4, 2025, 12:23 p.m.