Illustrative data: starwars

The examples below use the starwars dataset from the dplyr package.

library(dplyr)
data(starwars, package = "dplyr")

# print the first few rows
head(starwars)

inspect_cat() for a single data frame

inspect_cat() returns a tibble summarising the categorical features in a data frame, combining the functionality of the inspect_imb() and table() functions. The tibble generated contains one row per categorical column, and its levels column holds a frequency tibble for each of those columns:

library(inspectdf)

# explore the categorical features 
x <- inspect_cat(starwars)
x
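To see roughly what goes into such a summary, the sketch below tabulates every character column of starwars with base R's table(), counting NA as a level. The helper summarise_cat() and its output column names (n_levels, most_common, most_common_pct) are hypothetical and chosen for illustration only; this is not how inspect_cat() is implemented, and its column names differ.

```r
library(dplyr)
data(starwars, package = "dplyr")

# Hypothetical helper (not part of inspectdf): summarise one categorical
# vector, counting NA as its own level
summarise_cat <- function(v) {
  tab <- table(v, useNA = "ifany")
  top <- which.max(tab)                              # first most frequent level
  data.frame(
    n_levels        = length(tab),                   # distinct levels, NA included
    most_common     = names(tab)[top],               # may be NA if NA dominates
    most_common_pct = 100 * as.numeric(tab[top]) / length(v)
  )
}

# list columns such as films are skipped; only character columns are used
char_cols <- names(starwars)[vapply(starwars, is.character, logical(1))]

cat_summary <- bind_rows(lapply(char_cols, function(col) {
  data.frame(col_name = col, summarise_cat(starwars[[col]]))
}))
cat_summary
```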

For example, the levels of the hair_color column are:

# show frequency tibble for `hair_color` column:
x$levels$hair_color
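As a cross-check, a similar frequency table can be built with dplyr's count(). This is only a sketch: the columns n and prop come from count() and the mutate() call below, and need not match the layout of the tibbles stored in inspect_cat()'s levels column.

```r
library(dplyr)
data(starwars, package = "dplyr")

# count() keeps NA as its own group, so missing hair colours get a row
hair_freq <- starwars %>%
  count(hair_color, sort = TRUE) %>%   # most frequent level first
  mutate(prop = n / sum(n))            # relative frequency of each level
hair_freq
```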

By default, missing (NA) values are counted as a distinct categorical level. A barplot showing the composition of each categorical column can be created with the show_plot() function; missing values appear as grey bars:

x %>% show_plot()
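Treating NA as its own level is analogous to calling base R's table() with useNA = "ifany". The short sketch below shows the extra level that appears for hair_color; it illustrates the idea only and says nothing about how inspectdf counts levels internally.

```r
data(starwars, package = "dplyr")

# table() silently drops NA by default...
without_na <- table(starwars$hair_color)
# ...but useNA = "ifany" adds an <NA> level whenever missing values exist
with_na <- table(starwars$hair_color, useNA = "ifany")

length(with_na) - length(without_na)   # 1: the extra NA level
```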

The high_cardinality argument of show_plot() bundles together categories that occur only a small number of times. For example, to combine categories that occur only once, use:

x %>% 
  show_plot(high_cardinality = 1)

The resulting bundles are shown in purple.
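The bundling idea can be sketched by hand: replace every level seen at most a given number of times with a single label. The helper bundle_rare() and the "(bundled)" label are hypothetical; this is a rough analogue of what high_cardinality = 1 displays, not the code show_plot() runs.

```r
data(starwars, package = "dplyr")

# Hypothetical helper: relabel levels occurring at most `max_count` times
bundle_rare <- function(v, max_count = 1, label = "(bundled)") {
  tab <- table(v)                        # NA is excluded from this tally
  rare <- names(tab)[tab <= max_count]
  v[v %in% rare] <- label                # NA %in% rare is FALSE, so NA stays NA
  v
}

species_bundled <- bundle_rare(starwars$species, max_count = 1)
sort(table(species_bundled, useNA = "ifany"), decreasing = TRUE)
```

For factors, forcats::fct_lump_min() offers comparable behaviour, lumping levels that occur fewer than min times into an "Other" level.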

inspect_cat() for two data frames

To illustrate the comparison of two data frames, we first create two new data frames by randomly sampling the rows of starwars and also dropping some of the columns. The results are assigned to the objects star_1 and star_2:

# sample 50 rows from `starwars`
star_1 <- starwars %>% sample_n(50)
# sample 50 rows from `starwars` and drop the first two columns
star_2 <- starwars %>% sample_n(50) %>% select(-1, -2)
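Because the rows are drawn at random, the comparison below changes on every run. One way to make it reproducible is to fix the random seed first; the sketch below also uses slice_sample(), which supersedes sample_n() in dplyr 1.0.0 and later. The seed value is arbitrary.

```r
library(dplyr)
data(starwars, package = "dplyr")

set.seed(2022)  # arbitrary seed so the random samples are reproducible

# sample 50 rows from `starwars`
star_1 <- starwars %>% slice_sample(n = 50)
# sample 50 rows from `starwars` and drop the first two columns
star_2 <- starwars %>% slice_sample(n = 50) %>% select(-1, -2)
```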

To compare the character columns in a pair of data frames, pass both of them to inspect_cat():

inspect_cat(star_1, star_2)
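inspect_cat() performs this comparison for every column at once. To see the kind of difference it summarises, the sketch below lines up the relative frequencies of the hair_color levels in two samples by hand. The helper level_props() and the seed are hypothetical, and the statistics inspect_cat() reports are not reproduced here.

```r
library(dplyr)
data(starwars, package = "dplyr")

set.seed(1)  # arbitrary seed for reproducible samples
star_1 <- starwars %>% slice_sample(n = 50)
star_2 <- starwars %>% slice_sample(n = 50) %>% select(-1, -2)

# Hypothetical helper: relative frequency of each level of `col`, NA included
level_props <- function(df, col) {
  df %>%
    count(.data[[col]], name = "n") %>%
    mutate(prop = n / sum(n)) %>%
    select(value = 1, prop)             # rename the level column to `value`
}

# one row per level seen in either sample; NA keys match each other
hair_compare <- full_join(
  level_props(star_1, "hair_color"),
  level_props(star_2, "hair_color"),
  by = "value", suffix = c("_1", "_2")
) %>%
  mutate(across(c(prop_1, prop_2), ~ coalesce(.x, 0)))  # absent level = 0
hair_compare
```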

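The tibble returned has one row per categorical column and summarises how the distribution of levels in each column differs between the two data frames.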



alastairrushworth/inspectdf documentation built on Aug. 15, 2022, 1:23 a.m.