corr_var | R Documentation |
This function correlates a whole dataframe with a single feature. It
automatically runs ohse (one-hot-smart-encoding), so there is no need to
input only numerical values.
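To make the idea concrete, here is a minimal base-R sketch of what one-hot encoding a categorical column and then correlating it with a target can look like. It does not call lares and is not how ohse is implemented; the data frame `toy` and its column names are made up for illustration.

```r
# Toy data (hypothetical): a logical target and one categorical column
toy <- data.frame(
  survived = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  class    = factor(c("first", "third", "first", "second", "third", "third"))
)

# One-hot encode the categorical column: one 0/1 indicator per level
# ("- 1" drops the intercept so every level gets its own column)
dummies <- model.matrix(~ class - 1, data = toy)

# Correlate each indicator column with the numeric version of the target
cors <- cor(dummies, as.numeric(toy$survived))
cors
```

Each row of `cors` corresponds to one level of `class`, which is why a non-numerical column can still take part in a correlation once it has been encoded.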
corr_var(
  df,
  var,
  ignore = NULL,
  trim = 0,
  clean = FALSE,
  plot = TRUE,
  top = NA,
  ceiling = 1,
  max_pvalue = 1,
  limit = 10,
  ranks = FALSE,
  zeroes = FALSE,
  save = FALSE,
  quiet = FALSE,
  ...
)
## S3 method for class 'corr_var'
plot(x, var, max_pvalue = 1, top = NA, limit = NULL, ...)
df | Dataframe. It doesn't matter if it has non-numerical columns: they will be filtered. |
var | Variable. Name of the variable to correlate. Note that if the variable |
ignore | Character vector. Which columns do you wish to exclude? |
trim | Integer. Trim words to the nth character for categorical values (applies to both target and values). |
clean | Boolean. Use lares::cleanText for categorical values (applies to both target and values). |
plot | Boolean. Do you wish to plot the result? If set to TRUE, the function will return only the plot and not the result's data. |
top | Integer. If you want to plot the top correlations, define how many. |
ceiling | Numeric. Remove all correlations above... Range: (0, 1] |
max_pvalue | Numeric. Filter non-significant variables. Range: (0, 1] |
limit | Integer. Limit one hot encoding to the n most frequent values of each column. Set to |
ranks | Boolean. Add ranking numbers? |
zeroes | Boolean. Do you wish to keep zeroes in correlations too? |
save | Boolean. Save output plot into working directory. |
quiet | Boolean. Keep quiet? If not, show messages. |
... | Additional parameters passed to |
x | corr_var object |
data.frame. With variables, correlation and p-value results for each feature, arranged by descending absolute correlation value.
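As a rough illustration of the shape described above, the following base-R sketch builds a comparable table for the built-in `mtcars` dataset, calling `cor.test()` once per column. The column names `variables`, `corr` and `pvalue` are assumptions chosen for this example and are not necessarily those returned by `corr_var()`.

```r
# Correlate every other column of mtcars with a single target column
target <- "mpg"
others <- setdiff(names(mtcars), target)

rows <- lapply(others, function(v) {
  ct <- cor.test(mtcars[[v]], mtcars[[target]])  # Pearson by default
  data.frame(variables = v, corr = unname(ct$estimate), pvalue = ct$p.value)
})
res <- do.call(rbind, rows)

# Arrange by descending absolute correlation value
res <- res[order(-abs(res$corr)), ]
rownames(res) <- NULL

# Keep only significant results, similar in spirit to max_pvalue = 0.05
res[res$pvalue <= 0.05, ]
```

Sorting on the absolute value keeps strong negative correlations next to strong positive ones, so the most informative features appear first regardless of sign.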
Other Exploratory: corr_cross(), crosstab(), df_str(), distr(), freqs(), freqs_df(), freqs_list(), freqs_plot(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), tree_var()
Other Correlations: corr(), corr_cross()
library(lares)
Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset
corr_var(dft, Survived, method = "spearman", plot = FALSE, top = 10)
# With plots, results are easier to compare:
# Correlate Survived with everything else and show only significant results
dft %>% corr_var(Survived_TRUE, max_pvalue = 0.05)
# Top 15 with less than 60% correlation and show ranks
dft %>% corr_var(Survived_TRUE, ceiling = .6, top = 15, ranks = TRUE)