corr_cross | R Documentation
This function runs a full correlation study and returns a ranking of the most highly correlated variable pairs, obtained from a cross-table.
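As a rough illustration of the ranking idea, the base-R sketch below computes Pearson correlations between the numeric columns, keeps each pair once, and sorts the pairs by descending absolute correlation. The helper rank_cross_corr() and its output columns (key, mix, corr) are hypothetical names chosen for this sketch. It is not the lares implementation: it simply drops non-numeric columns and ignores p-values and the other filters.

```r
# Hypothetical sketch of a ranked cross-correlation table (not lares' code).
rank_cross_corr <- function(df, top = 20) {
  # Keep numeric columns only; non-numeric columns are dropped
  nums <- df[vapply(df, is.numeric, logical(1))]
  m <- cor(nums, use = "pairwise.complete.obs")
  # Take the upper triangle so each unordered pair appears once
  idx <- which(upper.tri(m), arr.ind = TRUE)
  out <- data.frame(
    key = colnames(m)[idx[, "row"]],
    mix = colnames(m)[idx[, "col"]],
    corr = m[idx],
    stringsAsFactors = FALSE
  )
  # Sort by descending absolute correlation
  out <- out[order(-abs(out$corr)), ]
  rownames(out) <- NULL
  # top = NA keeps every pair, mirroring the documented top argument
  if (!is.na(top)) out <- head(out, top)
  out
}

# Toy data: b is almost a linear function of a; c is unrelated noise
set.seed(1)
a <- rnorm(50)
toy <- data.frame(a = a, b = 2 * a + rnorm(50, sd = 0.1),
                  c = rnorm(50), label = "z")
rank_cross_corr(toy, top = 2)
```

Working from the correlation matrix keeps the sketch short; taking only the upper triangle avoids listing both (a, b) and (b, a).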
corr_cross(
  df,
  plot = TRUE,
  pvalue = TRUE,
  max_pvalue = 1,
  type = 1,
  max = 1,
  top = 20,
  local = 1,
  ignore = NULL,
  contains = NA,
  grid = TRUE,
  rm.na = FALSE,
  quiet = FALSE,
  ...
)
df | Dataframe. Non-numeric columns are allowed; they will be filtered out.
plot | Boolean. Show and return a plot?
pvalue | Boolean. Return a list with the correlations and the statistical significance (p-value) for each value?
max_pvalue | Numeric. Maximum p-value allowed; results with a higher p-value are filtered out as non-significant. Range (0, 1].
type | Integer. Plot type: 1 for the overall rank, 2 for the local rank.
max | Numeric. Maximum correlation permitted (from 0 to 1).
top | Integer. Return only the top n results. Only valid when type = 1. Set to NA to use all cross-correlations.
local | Integer. Label the top n local correlations. Only valid when type = 2.
ignore | Vector or character. Which column(s) should be ignored?
contains | Character vector. Keep only cross-correlations involving variables that contain certain strings (when a vector is given, matching any of its values is enough).
grid | Boolean. Separate into grids?
rm.na | Boolean. Remove NAs?
quiet | Boolean. Keep quiet? If not, show messages.
... | Additional parameters passed to
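To make the filtering arguments more concrete, here is a hedged base-R sketch of how max, max_pvalue and contains could be applied to an already ranked table. The helper filter_cross(), the column names (key, mix, corr, pvalue), the toy values, and the choice of inclusive comparisons (<=) are all assumptions for illustration; lares may implement these filters differently.

```r
# Hypothetical helper; column names and inclusive thresholds are assumptions.
filter_cross <- function(ranked, max_pvalue = 1, max = 1, contains = NA) {
  # max: drop pairs whose absolute correlation exceeds the threshold
  out <- ranked[abs(ranked$corr) <= max, ]
  # max_pvalue: drop pairs that are not significant enough
  out <- out[out$pvalue <= max_pvalue, ]
  # contains: keep a pair if either variable includes ANY of the strings
  if (!all(is.na(contains))) {
    hit <- Reduce(`|`, lapply(contains, function(s)
      grepl(s, out$key, fixed = TRUE) | grepl(s, out$mix, fixed = TRUE)))
    out <- out[hit, ]
  }
  out
}

# Toy ranked table (made-up values, for illustration only)
ranked <- data.frame(
  key = c("price", "age", "age", "income"),
  mix = c("price_usd", "income", "score", "score"),
  corr = c(0.99, 0.61, -0.40, 0.12),
  pvalue = c(0.001, 0.002, 0.030, 0.400),
  stringsAsFactors = FALSE
)
filter_cross(ranked, max_pvalue = 0.05, max = 0.95)
filter_cross(ranked, contains = c("price", "score"))
```

A max below 1 is handy for hiding near-duplicate columns (here price vs price_usd), while fixed = TRUE treats each contains value as a literal substring rather than a regular expression.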
DataScience+ Post: Find Insights with Ranked Cross-Correlations
Depending on the plot input, returns correlation and p-value results for every combination of features, arranged by descending absolute correlation value, either as a data.frame (plot = FALSE) or as a plot (plot = TRUE).
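The p-value attached to each pair can be illustrated with stats::cor.test(). The sketch below assumes Pearson correlation on numeric columns: it tests every pair of columns (enumerated with combn()) and returns a data.frame sorted by descending absolute correlation, similar in shape to the output described above. pair_pvalues() and its column names are hypothetical, and lares may compute significance differently.

```r
# Hypothetical sketch: correlation and p-value for every pair of numeric columns.
pair_pvalues <- function(df) {
  nums <- df[vapply(df, is.numeric, logical(1))]
  # Each column of `pairs` holds one unordered pair of column names
  pairs <- combn(names(nums), 2)
  tests <- lapply(seq_len(ncol(pairs)), function(i)
    cor.test(nums[[pairs[1, i]]], nums[[pairs[2, i]]]))  # Pearson by default
  out <- data.frame(
    key = pairs[1, ],
    mix = pairs[2, ],
    corr = vapply(tests, function(t) unname(t$estimate), numeric(1)),
    pvalue = vapply(tests, function(t) t$p.value, numeric(1)),
    stringsAsFactors = FALSE
  )
  # Arrange by descending absolute correlation
  out <- out[order(-abs(out$corr)), ]
  rownames(out) <- NULL
  out
}

# Toy data: y tracks x closely, z is independent noise
set.seed(42)
x <- rnorm(40)
toy <- data.frame(x = x, y = x + rnorm(40, sd = 0.2), z = rnorm(40))
pair_pvalues(toy)
```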
Other Correlations: corr(), corr_var()
Other Exploratory: corr_var(), crosstab(), df_str(), distr(), freqs(), freqs_df(), freqs_list(), freqs_plot(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), tree_var()
library(lares)
Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset
# Data only, no plot
corr_cross(dft, plot = FALSE, top = 10)
# Show only the most relevant results, filtered by p-value
corr_cross(dft, rm.na = TRUE, max_pvalue = 0.05, top = 15)
# Cross-correlations for certain variables
corr_cross(dft, contains = c("Survived", "Fare"))
# Cross-correlation max values per category
corr_cross(dft, type = 2, top = NA)