Illustrative data: starwars

The examples below make use of the starwars from the dplyr package.

library(dplyr)
data(starwars, package = "dplyr")

# print the first few rows
head(starwars)

inspect_types() for a single dataframe

To explore the column types in a data frame, use the function inspect_types(). The command returns a tibble summarising the counts and percentages of columns with particular types.

library(inspectdf)

# return tibble showing columns types
x <- inspect_types(starwars)
x

The names of columns with specific type can be accessed in the list columns col_name, for example the list columns are found using

x$col_name$list

A radial visualisation of all columns and types is returned by the show_plot() command:

# radial visualisation of column types
x %>% show_plot()

inspect_types() for comparing two data frames

To illustrate the comparison of two data frames, we create two new data frames by randomly sampling the rows of starwars, dropping some of the columns and recoding the colums mass to character. The results are assigned to the objects star_1 and star_2:

# sample 50 rows from `starwars`
star_1 <- starwars %>% sample_n(50)
# recode the mass column to character
star_1$mass <- as.character(star_1$mass)
# sample 50 rows from `starwars` and drop the first two columns
star_2 <- starwars %>% sample_n(50) %>% select(-1, -2)

When a second dataframe is provided, inspect_types() returns a tibble that compares the column names and types found in each of the input data frames. The columns cnt_1 and cnt_2 contain the total number of columns with each type found in the first and second inputs, respectively:

# compare the column types for star_1 and star_2
x <- inspect_types(star_1, star_2)
x

columns is a named list column containing a list of tibbles. Each tibble records the names of columns with each type. As an example, all numeric column names is accessed using:

# tibble of numeric columns in star_1 or star_2
x$columns$numeric

The issues column contains a list of character vectors describing specific points of type or columns mismatch between the two data frame inputs. The simplest way to view all of the issues is to use the unnest() function from the tidyr package:

library(tidyr)
# unnest the issue columns so we can see where the differences are between star_1 and star_2
x %>% select(type, issues) %>% unnest(issues)

Finally, we can produce a simple visualisation showing the differences between star_1 and star_2 using show_plot():

# print visualisation of column type comparison
inspect_types(star_1, star_2) %>% show_plot()


alastairrushworth/inspectdf documentation built on Aug. 15, 2022, 1:23 a.m.