corr_cross: Ranked cross-correlation across all variables
In laresbernardo/lares: Analytics & Machine Learning Sidekick

corr_cross

R Documentation

Description

This function runs a full correlation study and returns a ranking of the most highly correlated variable pairs found in the cross-table.

Usage

corr_cross(
  df,
  plot = TRUE,
  pvalue = TRUE,
  max_pvalue = 1,
  type = 1,
  max = 1,
  top = 20,
  local = 1,
  ignore = NULL,
  contains = NA,
  grid = TRUE,
  rm.na = FALSE,
  quiet = FALSE,
  ...
)

Arguments

`df`	Dataframe. Non-numerical columns are allowed: they will be filtered.
`plot`	Boolean. Show and return a plot?
`pvalue`	Boolean. Returns a list, with correlations and statistical significance (p-value) for each value.
`max_pvalue`	Numeric. Filter non-significant variables. Range (0, 1]
`type`	Integer. Plot type. 1 is for overall rank. 2 is for local rank.
`max`	Numeric. Maximum correlation permitted (from 0 to 1)
`top`	Integer. Return top n results only. Only valid when type = 1. Set value to NA to use all cross-correlations
`local`	Integer. Label top n local correlations. Only valid when type = 2
`ignore`	Vector or character. Which columns should be ignored?
`contains`	Character vector. Filter cross-correlations to variables that contain certain strings (matching any value if a vector is used).
`grid`	Boolean. Separate into grids?
`rm.na`	Boolean. Remove NAs?
`quiet`	Boolean. Keep quiet? If not, show messages
`...`	Additional parameters passed to `corr`

Details

DataScience+ Post: Find Insights with Ranked Cross-Correlations

Value

Depending on the `plot` input, returns the correlation and p-value results for every combination of features, arranged by descending absolute correlation value: a data.frame when plot = FALSE, or a plot when plot = TRUE.
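
To make the shape of that result concrete, here is a minimal base-R sketch of a ranked cross-correlation table. It is not the code used by `corr_cross` or `corr`: the helper `rank_cross_corr` and the column names `var1`, `var2`, `corr` and `pvalue` are hypothetical, it uses Pearson correlation via `stats::cor.test`, and it only handles numeric columns. Its `top` and `max_pvalue` arguments are meant to mirror the arguments of the same name described above, under the assumption that they filter on the p-value and cap the number of rows.

```r
# Base-R sketch of a ranked cross-correlation table (NOT lares' implementation).
# Uses the built-in mtcars dataset so it runs without extra packages.
rank_cross_corr <- function(df, top = 20, max_pvalue = 1) {
  # Keep numeric columns only
  num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  # Every unordered pair of column names, one pair per matrix column
  pairs <- utils::combn(names(num), 2)
  out <- data.frame(
    var1 = pairs[1, ],
    var2 = pairs[2, ],
    corr = NA_real_,
    pvalue = NA_real_,
    stringsAsFactors = FALSE
  )
  # Pearson correlation and its p-value for each pair
  for (i in seq_len(nrow(out))) {
    test <- stats::cor.test(num[[out$var1[i]]], num[[out$var2[i]]])
    out$corr[i] <- unname(test$estimate)
    out$pvalue[i] <- test$p.value
  }
  # Drop pairs above the p-value threshold, then sort by |corr| descending
  out <- out[out$pvalue <= max_pvalue, , drop = FALSE]
  out <- out[order(-abs(out$corr)), , drop = FALSE]
  rownames(out) <- NULL
  # top = NA keeps every remaining pair
  if (!is.na(top)) out <- utils::head(out, top)
  out
}

ranked <- rank_cross_corr(mtcars, top = 5, max_pvalue = 0.05)
print(ranked)
```

Sorting on the absolute value keeps strong negative correlations next to strong positive ones, which matches the "descending absolute correlation value" ordering described above.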

Examples

Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset

# Only data with no plot
corr_cross(dft, plot = FALSE, top = 10)

# Show only most relevant results filtered by pvalue
corr_cross(dft, rm.na = TRUE, max_pvalue = 0.05, top = 15)

# Cross-Correlation for certain variables
corr_cross(dft, contains = c("Survived", "Fare"))

# Cross-Correlation max values per category
corr_cross(dft, type = 2, top = NA)

laresbernardo/lares documentation built on Feb. 21, 2025, 9:58 a.m.