inspect_imb (R Documentation)
For a single dataframe, summarise the most common level in each categorical column. If two dataframes are supplied, compare the most common levels of categorical features appearing in both dataframes. For grouped dataframes, summarise the most common level of each categorical column within each group.
inspect_imb(df1, df2 = NULL, include_na = FALSE)
Arguments:
df1: A dataframe.
df2: An optional second dataframe for comparing columnwise imbalance. Defaults to NULL.
include_na: Logical flag, whether to include missing values as a unique level. Default is FALSE.
For a single dataframe, the tibble returned contains the columns:
col_name, a character vector containing the column names of df1.
value, a character vector containing the most common categorical level in each column of df1.
pcnt, the relative frequency of each column's most common categorical level, expressed as a percentage.
cnt, the number of occurrences of the most common categorical level in each column of df1.
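The following minimal base-R sketch shows how the four columns fit together. The helper imb_single() and the toy dataframe are hypothetical and are not part of inspectdf. The sketch assumes that character and factor columns count as categorical and that pcnt is computed over all rows, missing values included. The package may do either of these differently.

```r
# imb_single() is a hypothetical helper, not part of inspectdf.
# Assumptions: character and factor columns count as categorical, and pcnt
# uses the total number of rows (missing values included) as denominator.
imb_single <- function(df, include_na = FALSE) {
  is_cat <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
  rows <- lapply(names(df)[is_cat], function(nm) {
    x <- as.character(df[[nm]])
    # table() drops NA unless useNA = "ifany" is requested
    tab <- table(x, useNA = if (include_na) "ifany" else "no")
    if (length(tab) == 0) return(NULL)  # column is entirely NA
    top <- which.max(tab)               # first level wins ties
    data.frame(
      col_name = nm,
      value = names(tab)[top],
      pcnt = 100 * as.numeric(tab[top]) / length(x),
      cnt = as.integer(tab[top]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy data: a and d are character, b is a factor, n is numeric (skipped)
df <- data.frame(
  a = c("x", "x", "y", NA),
  b = factor(c("p", "q", "q", "q")),
  d = c(NA, NA, NA, "z"),
  n = 1:4,
  stringsAsFactors = FALSE
)
imb_single(df)                     # column d: value "z", cnt 1
imb_single(df, include_na = TRUE)  # column d: value NA, cnt 3
```

The second call illustrates include_na. Column d is mostly missing, so NA becomes its most common level once missing values are counted as a level.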
For a pair of dataframes, the tibble returned contains the columns:
col_name, a character vector containing the names of the unique columns in df1 and df2.
value, a character vector containing the most common categorical level in each column of df1.
pcnt_1, pcnt_2, the percentage occurrence of value in the column col_name for each of df1 and df2, respectively.
cnt_1, cnt_2, the number of occurrences of value in the column col_name for each of df1 and df2, respectively.
p_value, the p-value associated with the null hypothesis that the true rate of occurrence of value in col_name is the same for both dataframes. Small values indicate stronger evidence of a difference in the rate of occurrence.
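The p_value column can be illustrated with a two-sample test of equal proportions. The following sketch is hypothetical, and imb_pvalue() is not an inspectdf function. It assumes that the comparison treats cnt_1 out of nrow(df1) and cnt_2 out of nrow(df2) as two binomial proportions and tests them with base R's prop.test(). The package's exact method may differ.

```r
# imb_pvalue() is a hypothetical helper, not part of inspectdf. It assumes
# a two-sample test of equal proportions via base R's prop.test(); the
# package itself may compute p_value differently.
imb_pvalue <- function(cnt_1, n_1, cnt_2, n_2) {
  # prop.test() warns when expected counts are small, so silence that here
  suppressWarnings(
    prop.test(x = c(cnt_1, cnt_2), n = c(n_1, n_2))$p.value
  )
}

# Same rate (10%) in both dataframes: no evidence of a difference
imb_pvalue(cnt_1 = 10, n_1 = 100, cnt_2 = 20, n_2 = 200)
# 50% versus 10%: strong evidence of a difference
imb_pvalue(cnt_1 = 50, n_1 = 100, cnt_2 = 10, n_2 = 100)
```

The first call returns a p-value near 1 because the observed rates are identical. The second call returns a very small p-value, which matches the reading that small values indicate a difference in the rate of occurrence.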
For a grouped dataframe, the tibble returned is the same as for a single dataframe, except that the first k columns are the k grouping columns. There will be as many rows in the result as there are unique combinations of the grouping variables.
In each case, the result is a tibble summarising and comparing the imbalance of each categorical column in one dataframe or a pair of dataframes.
Author: Alastair Rushworth
See also: inspect_cat, show_plot
# Load inspectdf, and dplyr for the starwars data & pipe
library(inspectdf)
library(dplyr)

# Single dataframe summary
inspect_imb(starwars)

# Paired dataframe comparison
inspect_imb(starwars, starwars[1:20, ])

# Grouped dataframe summary
starwars %>% group_by(gender) %>% inspect_imb()