| inspect_num | R Documentation |
For a single dataframe, summarise the numeric columns. If two dataframes are supplied, compare numeric columns appearing in both dataframes. For grouped dataframes, summarise numeric columns separately for each group.
inspect_num(df1, df2 = NULL, breaks = 20, include_int = TRUE)
df1 |
A dataframe. |
df2 |
An optional second dataframe for comparing numeric columns.
Defaults to NULL. |
breaks |
Integer number of breaks used for histogram bins, passed to
|
include_int |
Logical flag, whether to include integer columns in numeric summaries.
Defaults to TRUE. |
For a single dataframe, the tibble returned contains the columns:
col_name, a character vector containing the column names in df1
min, q1, median, mean, q3, max and
sd, the minimum, lower quartile, median, mean, upper quartile, maximum and
standard deviation for each numeric column.
pcnt_na, the percentage of each numeric feature that is missing
hist, a named list of tibbles containing the relative frequency of values
falling in bins determined by breaks.
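To make the meaning of these columns concrete, the following base R sketch computes the same kinds of quantities by hand for a single numeric vector. It is an illustration only, not the package's implementation: the names x, summary_sketch and hist_sketch are hypothetical, and the quantile method and bin layout used by inspect_num may differ from the base R defaults assumed here.

```r
# Illustrative values only; NA marks a missing observation
x <- c(1.5, 2, NA, 4, 10)

# Hypothetical one-row summary mirroring the column names listed above
# (assumes base R's default quantile method)
summary_sketch <- data.frame(
  col_name = "x",
  min      = min(x, na.rm = TRUE),
  q1       = unname(quantile(x, 0.25, na.rm = TRUE)),
  median   = median(x, na.rm = TRUE),
  mean     = mean(x, na.rm = TRUE),
  q3       = unname(quantile(x, 0.75, na.rm = TRUE)),
  max      = max(x, na.rm = TRUE),
  sd       = sd(x, na.rm = TRUE),
  pcnt_na  = 100 * mean(is.na(x))
)

# Relative frequency of the non-missing values in each bin,
# similar in spirit to one entry of the hist list column
h <- hist(x, breaks = 4, plot = FALSE)
hist_sketch <- data.frame(
  bin_lower = head(h$breaks, -1),
  bin_upper = tail(h$breaks, -1),
  prop      = h$counts / sum(h$counts)
)

summary_sketch
hist_sketch
```

Here one of the five values is missing, so pcnt_na is 20, and the prop column sums to 1 because it holds relative rather than absolute frequencies.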
For a pair of dataframes, the tibble returned contains the columns:
col_name, a character vector containing the column names in df1
and df2
hist_1, hist_2, a list column for histograms of each of df1 and df2.
Where a column appears in both dataframes, the bins used for df1 are reused to
calculate histograms for df2.
jsd, a numeric column containing the Jensen-Shannon divergence. This measures the difference in distribution of a pair of binned numeric features. Values near 0 indicate agreement of the distributions, while values near 1 indicate disagreement.
pval, the p-value corresponding to a null hypothesis test that the true frequencies of histogram bins are equal.
A small p indicates evidence that the two sets of relative frequencies are actually different. The test
is based on a modified Chi-squared statistic.
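The bin reuse and the Jensen-Shannon divergence described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used by inspect_num: jsd_sketch is a hypothetical helper, base-2 logarithms are assumed so that the result lies between 0 and 1, and the second sample is drawn from the first so that all of its values fall inside the bins built from the first.

```r
# Hypothetical helper, not part of inspectdf: Jensen-Shannon divergence
# between two vectors of bin frequencies defined over the same bins.
jsd_sketch <- function(p, q) {
  stopifnot(length(p) == length(q))
  p <- p / sum(p)
  q <- q / sum(q)
  m <- (p + q) / 2
  # Kullback-Leibler divergence in bits; empty bins contribute zero
  kl <- function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  (kl(p, m) + kl(q, m)) / 2
}

set.seed(1)
x1 <- rnorm(500)
x2 <- x1[1:100]  # a subset of x1, so it stays within the range of x1's bins

# Build bins from the first sample, then reuse them for the second
h1 <- hist(x1, breaks = 20, plot = FALSE)
h2 <- hist(x2, breaks = h1$breaks, plot = FALSE)

# Divergence between the two sets of relative frequencies
jsd_value <- jsd_sketch(h1$counts, h2$counts)
jsd_value

# Two limiting cases
jsd_same     <- jsd_sketch(c(0.2, 0.5, 0.3), c(0.2, 0.5, 0.3))  # identical
jsd_disjoint <- jsd_sketch(c(1, 0, 0), c(0, 0, 1))              # no overlap
```

Identical histograms give a divergence of 0, and histograms with no bins in common give 1, which matches the interpretation of the jsd column. Reusing the breaks from the first histogram is what makes the two frequency vectors directly comparable bin by bin.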
For a grouped dataframe, the tibble returned is as for a single dataframe, but where
the first k columns are the grouping columns. There will be as many rows in the result
as there are unique combinations of the grouping variables.
A tibble containing statistical summaries of the numeric
columns of df1, or comparing the histograms of df1 and df2.
Alastair Rushworth
show_plot
# Load inspectdf for inspect_num
library(inspectdf)
# Load dplyr for starwars data & pipe
library(dplyr)

# Single dataframe summary
inspect_num(starwars)

# Paired dataframe comparison
inspect_num(starwars, starwars[1:20, ])

# Grouped dataframe summary
starwars %>% group_by(gender) %>% inspect_num()