corr_cross: Ranked cross-correlation across all variables

corr_cross R Documentation

Ranked cross-correlation across all variables

Description

This function runs a full correlation study and returns a ranking of the most highly correlated variable pairs, obtained from a cross-table.

Usage

corr_cross(
  df,
  plot = TRUE,
  pvalue = TRUE,
  max_pvalue = 1,
  type = 1,
  max = 1,
  top = 20,
  local = 1,
  ignore = NULL,
  contains = NA,
  grid = TRUE,
  rm.na = FALSE,
  quiet = FALSE,
  ...
)

Arguments

df

Data frame. It may contain non-numerical columns: they will be filtered.

plot

Boolean. Show and return a plot?

pvalue

Boolean. Returns a list, with correlations and statistical significance (p-value) for each value.

max_pvalue

Numeric. Maximum p-value allowed; correlations with a larger p-value are filtered out as non-significant. Range (0, 1].

type

Integer. Plot type. 1 is for overall rank. 2 is for local rank.

max

Numeric. Maximum correlation permitted (from 0 to 1)

top

Integer. Return top n results only. Only valid when type = 1. Set value to NA to use all cross-correlations

local

Integer. Label top n local correlations. Only valid when type = 2

ignore

Character vector. Which columns should be ignored?

contains

Character vector. Keep only cross-correlations involving variables whose names contain certain strings (if a vector is passed, a match on any of its values is enough).

grid

Boolean. Separate into grids?

rm.na

Boolean. Remove NAs?

quiet

Boolean. Keep quiet? If not, show messages

...

Additional parameters passed to corr
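
The "any value" matching described for contains can be pictured with a short base R sketch. This is not the lares implementation: the toy table, its column names (key, mix) and the use of grepl() are assumptions made purely for illustration.

```r
# Toy table of variable pairs (hypothetical; names only, no real results)
pairs <- data.frame(
  key = c("Survived", "Fare", "Age", "Pclass"),
  mix = c("Sex_male", "Pclass", "SibSp", "Parch"),
  stringsAsFactors = FALSE
)

contains <- c("Survived", "Fare")

# Build one pattern so that a match on ANY of the strings is enough
pattern <- paste(contains, collapse = "|")

# Keep a pair when either side of it matches the pattern
keep <- grepl(pattern, pairs$key) | grepl(pattern, pairs$mix)
filtered <- pairs[keep, ]
print(filtered)
```

Only the pairs that mention "Survived" or "Fare" on either side survive the filter; the other rows are dropped.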

Details

DataScience+ Post: Find Insights with Ranked Cross-Correlations

Value

Depending on the plot argument, the correlation and p-value results for every combination of features, arranged by descending absolute correlation value, are returned as a data.frame (plot = FALSE) or as a plot (plot = TRUE).
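
To make the shape of such a ranked table concrete, here is a minimal base R sketch. It is not how lares computes its results: the helper rank_cross_cor() and its output column names (key, mix, corr, pvalue) are hypothetical, it only handles numeric columns, and it uses Pearson correlation via stats::cor.test() on the built-in mtcars dataset.

```r
# Hypothetical helper (not part of lares): rank every pair of numeric
# columns by absolute Pearson correlation, with a p-value per pair.
rank_cross_cor <- function(df, top = 20, max_pvalue = 1) {
  num <- df[vapply(df, is.numeric, logical(1))]  # keep numeric columns only
  pairs <- utils::combn(names(num), 2)           # each unordered pair once
  res <- data.frame(
    key = pairs[1, ],
    mix = pairs[2, ],
    corr = NA_real_,
    pvalue = NA_real_,
    stringsAsFactors = FALSE
  )
  for (i in seq_len(nrow(res))) {
    test <- stats::cor.test(num[[res$key[i]]], num[[res$mix[i]]])
    res$corr[i] <- unname(test$estimate)
    res$pvalue[i] <- test$p.value
  }
  res <- res[res$pvalue <= max_pvalue, ]                 # significance filter
  res <- res[order(abs(res$corr), decreasing = TRUE), ]  # strongest first
  rownames(res) <- NULL
  if (!is.na(top)) res <- utils::head(res, top)          # NA keeps all rows
  res
}

ranked <- rank_cross_cor(mtcars, top = 5, max_pvalue = 0.05)
print(ranked)
```

Sorting on abs(corr) rather than corr keeps strong negative relationships near the top, which matches the "descending absolute correlation value" ordering described above.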

See Also

Other Correlations: corr(), corr_var()

Other Exploratory: corr_var(), crosstab(), df_str(), distr(), freqs(), freqs_df(), freqs_list(), freqs_plot(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), tree_var()

Examples

Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset

# Only data with no plot
corr_cross(dft, plot = FALSE, top = 10)

# Show only most relevant results filtered by pvalue
corr_cross(dft, rm.na = TRUE, max_pvalue = 0.05, top = 15)

# Cross-Correlation for certain variables
corr_cross(dft, contains = c("Survived", "Fare"))

# Cross-Correlation max values per category
corr_cross(dft, type = 2, top = NA)

laresbernardo/lares documentation built on Jan. 14, 2025, 2:22 a.m.