distr | R Documentation |
Compare the distribution of a target variable against another variable. Numerical variables are automatically split into quantiles. Customizable and tidyverse friendly.
distr(
data,
...,
type = 1,
ref = TRUE,
note = NA,
top = 10,
breaks = 10,
na.rm = FALSE,
force = "none",
trim = 0,
clean = FALSE,
abc = FALSE,
custom_colours = FALSE,
plot = TRUE,
chords = FALSE,
save = FALSE,
subdir = NA
)
data |
Dataframe |
... |
Variables. Main (target variable) and secondary (values variable) to group by (if needed). |
type |
Integer. 1 for both plots, 2 for counter plot only, 3 for percentages plot only. |
ref |
Boolean. Show a reference line when the target has two levels? Useful when the data is unbalanced (not 50/50), since the line gives a baseline for comparison. |
note |
Character. Caption for the plot. |
top |
Integer. Filter and plot the n most frequent categorical values. |
breaks |
Integer. Number of splits for numerical values. |
na.rm |
Boolean. Ignore NA values? |
force |
Character. Force a class on the values data. Choose between 'none', 'character', 'numeric', 'date'. |
trim |
Integer. Trim labels to the nth character for categorical values (applies to both target and values). |
clean |
Boolean. Use |
abc |
Boolean. Do you wish to sort by alphabetical order? |
custom_colours |
Boolean. Use custom colours function? |
plot |
Boolean. Return a plot? Otherwise, return a table with results. |
chords |
Boolean. Use a chords plot? |
save |
Boolean. Save the output plot in the working directory? |
subdir |
Character. Subdirectory in which to save the plot. |
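The top and trim arguments both act on categorical values. As a rough illustration of the idea (not the package's own code), the base R sketch below keeps the n most frequent labels of a made-up vector and then shortens them. The vector x and the variables keep, x_top, and x_trim are hypothetical, and whether distr() drops or regroups the remaining labels is not stated on this page.

```r
# Made-up categorical vector, for illustration only
set.seed(1)
x <- sample(
  c("alpha_long_label", "beta_long_label", "gamma", "delta", "epsilon"),
  size = 100, replace = TRUE, prob = c(0.4, 0.3, 0.15, 0.1, 0.05)
)

top <- 3   # plays the role of the `top` argument
trim <- 5  # plays the role of the `trim` argument

# Keep only the `top` most frequent labels (assumption: the rest are dropped)
keep <- names(sort(table(x), decreasing = TRUE))[seq_len(top)]
x_top <- x[x %in% keep]

# Shorten every remaining label to at most `trim` characters
x_trim <- substr(x_top, 1, trim)

table(x_trim)
```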
A plot when plot=TRUE, combining two plots in one: a counts distribution grouped by cuts and a proportions distribution grouped by the same cuts. A data.frame when plot=FALSE, with counts, percentages, and cumulative percentages. When type is set to 2 or 3, a single plot is returned.
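To make the shape of the tabular result more concrete, here is a minimal base R sketch of quantile binning followed by counts, percentages, and cumulative percentages. It is an approximation under stated assumptions, not the function's implementation: the toy data frame, the column names n, p, and pcum, and the choice to compute percentages within each bin are all illustrative.

```r
# Illustrative data only (not the dft dataset): a binary target and a numeric value
set.seed(42)
toy <- data.frame(
  target = sample(c("yes", "no"), 200, replace = TRUE),
  value  = round(rexp(200, rate = 0.05), 2)
)

# Split the numeric variable into quantile-based bins (comparable to breaks = 5)
n_breaks <- 5
cuts <- unique(quantile(toy$value, probs = seq(0, 1, length.out = n_breaks + 1)))
toy$bin <- cut(toy$value, breaks = cuts, include.lowest = TRUE)

# Count rows for every target/bin pair
res <- as.data.frame(table(target = toy$target, bin = toy$bin))
names(res)[names(res) == "Freq"] <- "n"

# Percentage of each target level within its bin (assumption: grouped by bin)
res$p <- round(100 * res$n / ave(res$n, res$bin, FUN = sum), 2)

# Cumulative percentage within each bin, accumulated over target levels
res <- res[order(res$bin, res$target), ]
res$pcum <- ave(res$p, res$bin, FUN = cumsum)

head(res, 10)
```

Quantile-based cuts put roughly the same number of rows in every bin, which keeps the per-bin percentages comparable even when the numeric variable is heavily skewed.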
Other Exploratory: corr_cross(), corr_var(), crosstab(), df_str(), freqs(), freqs_df(), freqs_list(), freqs_plot(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), tree_var()
Other Visualization: freqs(), freqs_df(), freqs_list(), freqs_plot(), noPlot(), plot_chord(), plot_survey(), plot_timeline(), tree_var()
library(lares)
Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset
# Relation for categorical/categorical values
distr(dft, Survived, Sex)
# Relation for categorical/numeric values
dft %>%
distr(Survived, Fare, plot = FALSE) %>%
head(10)
# Sort values
dft %>% distr(Survived, Fare, abc = TRUE)
# Fewer splits/breaks
dft %>% distr(Survived, Fare, abc = TRUE, breaks = 5)
# Distribution of numerical only
dft[dft$Fare < 20, ] %>% distr(Fare)
# Distribution of numerical/numerical
dft %>% distr(Fare, Age)
# Select only one of the two default plots of distr()
dft %>% distr(Survived, Age, type = 2)
dft %>% distr(Survived, Age, type = 3)