inspect_num | R Documentation |
For a single dataframe, summarise the numeric columns. If two dataframes are supplied, compare numeric columns appearing in both dataframes. For grouped dataframes, summarise numeric columns separately for each group.
inspect_num(df1, df2 = NULL, breaks = 20, include_int = TRUE)
df1 |
A dataframe. |
df2 |
An optional second dataframe for comparing numeric columns.
Defaults to NULL. |
breaks |
Integer number of breaks used for histogram bins, passed to
|
include_int |
Logical flag, whether to include integer columns in numeric summaries.
Defaults to TRUE. |
For a single dataframe, the tibble returned contains the columns:
col_name, a character vector containing the column names in df1.
min, q1, median, mean, q3, max and sd, the minimum, lower quartile, median, mean, upper quartile, maximum and standard deviation for each numeric column.
pcnt_na, the percentage of each numeric feature that is missing.
hist, a named list of tibbles containing the relative frequency of values falling in bins determined by breaks.
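To make "relative frequency of values falling in bins" concrete, here is a minimal sketch using base R's hist() on a made-up vector. The data and variable names are illustrative assumptions, and the sketch is not taken from the package source; inspect_num may choose its bins differently.

```r
# Illustrative data only; not from inspectdf
x <- c(1, 2, 2, 3, 5, 8, 13)

# Ask for roughly 4 bins; hist() treats a single number as a suggestion
h <- hist(x, breaks = 4, plot = FALSE)

# Relative frequency: share of observations falling in each bin
rel_freq <- h$counts / sum(h$counts)

# One row per bin, loosely analogous to a single entry of the hist column
bins <- data.frame(
  lower = head(h$breaks, -1),
  upper = tail(h$breaks, -1),
  prop  = rel_freq
)
print(bins)
```

The proportions always sum to 1, so histograms built from dataframes of different sizes stay comparable.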
For a pair of dataframes, the tibble returned contains the columns:
col_name, a character vector containing the column names in df1 and df2.
hist_1, hist_2, list columns containing the histograms for df1 and df2, respectively. Where a column appears in both dataframes, the bins used for df1 are reused to calculate histograms for df2.
jsd, a numeric column containing the Jensen-Shannon divergence. This measures the difference in distribution of a pair of binned numeric features. Values near to 0 indicate agreement of the distributions, while 1 indicates disagreement.
pval, the p-value corresponding to a null hypothesis test that the true frequencies of histogram bins are equal. A small p-value indicates evidence that the two sets of relative frequencies are actually different. The test is based on a modified Chi-squared statistic.
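As a rough illustration of the comparison described above, the sketch below bins one vector, reuses those bins for a second vector, and computes a Jensen-Shannon divergence with base-2 logarithms, so the result lies between 0 and 1. The helper js_divergence and the data are hypothetical and not part of inspectdf; that the package uses exactly this formula is an assumption.

```r
# Hypothetical helper, not part of inspectdf: Jensen-Shannon divergence
# between two vectors of bin counts (or relative frequencies) over the same bins.
js_divergence <- function(p, q) {
  stopifnot(length(p) == length(q))
  p <- p / sum(p)
  q <- q / sum(q)
  m <- (p + q) / 2
  # Kullback-Leibler divergence in bits; empty bins contribute 0
  kl <- function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  (kl(p, m) + kl(q, m)) / 2
}

# Illustrative data only
x <- c(1, 2, 2, 3, 5, 8, 13)
y <- c(2, 3, 3, 4, 9, 12)

hx <- hist(x, breaks = 4, plot = FALSE)
# Reuse the bins from x; every value of y must fall inside them
hy <- hist(y, breaks = hx$breaks, plot = FALSE)

jsd_xy <- js_divergence(hx$counts, hy$counts)
print(jsd_xy)
```

With this definition, identical histograms give 0 and histograms with no shared non-empty bins give 1, which matches the interpretation of jsd given above.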
For a grouped dataframe, the tibble returned is as for a single dataframe, but where the first k columns are the grouping columns. There will be as many rows in the result as there are unique combinations of the grouping variables.
A tibble containing statistical summaries of the numeric columns of df1, or comparing the histograms of df1 and df2.
Alastair Rushworth
show_plot
# Load inspectdf for inspect_num
library(inspectdf)
# Load dplyr for starwars data & pipe
library(dplyr)

# Single dataframe summary
inspect_num(starwars)

# Paired dataframe comparison
inspect_num(starwars, starwars[1:20, ])

# Grouped dataframe summary
starwars %>% group_by(gender) %>% inspect_num()