starwars
The examples below make use of the starwars data from the dplyr package.
library(dplyr)
data(starwars, package = "dplyr")
# print the first few rows
head(starwars)
inspect_types() for a single dataframe
To explore the column types in a data frame, use the function inspect_types(). The command returns a tibble summarising the counts and percentages of columns with particular types.
library(inspectdf)
# return a tibble showing column types
x <- inspect_types(starwars)
x
The names of the columns with a specific type can be accessed through the list column col_name. For example, the names of the list columns are found using
x$col_name$list
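To make the shape of this summary concrete, the following base R sketch reproduces the same kind of information (type, count, percentage and column names per type) by hand. It is an approximation for illustration only, not the package's implementation; the toy data frame and the object names toy, col_types and type_summary are hypothetical.

```r
# toy data frame standing in for starwars (hypothetical values)
toy <- data.frame(
  name   = c("Luke", "Leia", "Han"),
  height = c(172L, 150L, 180L),
  mass   = c(77, 49, 80),
  stringsAsFactors = FALSE
)
# add a list column, similar in spirit to the list columns in starwars
toy$films <- list(c("A", "B"), "A", c("A", "C"))

# first class of every column
col_types <- vapply(toy, function(col) class(col)[1], character(1))

# count and percentage of columns per type
type_cnt <- table(col_types)
type_summary <- data.frame(
  type = names(type_cnt),
  cnt  = as.integer(type_cnt),
  pcnt = 100 * as.integer(type_cnt) / length(col_types)
)
type_summary

# column names grouped by type, analogous to the col_name list column
col_name <- split(names(col_types), col_types)
col_name$list
```

Taking only the first class of each column is a simplification chosen for this sketch; columns with several classes would need more care.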
A radial visualisation of all columns and types is returned by the show_plot() command:
# radial visualisation of column types
x %>% show_plot()
inspect_types() for comparing two data frames
To illustrate the comparison of two data frames, we create two new data frames by randomly sampling the rows of starwars, dropping some of the columns and recoding the column mass to character. The results are assigned to the objects star_1 and star_2:
# sample 50 rows from `starwars`
star_1 <- starwars %>% sample_n(50)
# recode the mass column to character
star_1$mass <- as.character(star_1$mass)
# sample 50 rows from `starwars` and drop the first two columns
star_2 <- starwars %>% sample_n(50) %>% select(-1, -2)
When a second dataframe is provided, inspect_types() returns a tibble that compares the column names and types found in each of the input data frames. The columns cnt_1 and cnt_2 contain the total number of columns with each type found in the first and second inputs, respectively:
# compare the column types for star_1 and star_2
x <- inspect_types(star_1, star_2)
x
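As a rough illustration of what cnt_1 and cnt_2 represent, the per-type column counts for two data frames can be tallied by hand in base R. This is a sketch under simplifying assumptions, not the package's code; df_1 and df_2 are hypothetical stand-ins for star_1 and star_2.

```r
# hypothetical stand-ins for star_1 and star_2
df_1 <- data.frame(
  name   = c("Luke", "Leia"),
  height = c(172L, 150L),
  mass   = c("77", "49"),      # mass recoded to character
  stringsAsFactors = FALSE
)
df_2 <- data.frame(
  height = c(172L, 150L),
  mass   = c(77, 49)           # name column dropped
)

# first class of every column in each data frame
types_1 <- vapply(df_1, function(col) class(col)[1], character(1))
types_2 <- vapply(df_2, function(col) class(col)[1], character(1))

# count columns per type, using a shared set of types so rows line up
all_types <- sort(union(types_1, types_2))
comparison <- data.frame(
  type  = all_types,
  cnt_1 = as.integer(table(factor(types_1, levels = all_types))),
  cnt_2 = as.integer(table(factor(types_2, levels = all_types)))
)
comparison
```

Using a common set of factor levels ensures a type that is absent from one input still appears with a count of zero.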
columns is a named list column containing a list of tibbles. Each tibble records the names of the columns with a given type. As an example, the names of all numeric columns are accessed using:
# tibble of numeric columns in star_1 or star_2
x$columns$numeric
The issues column contains a list of character vectors describing specific points of type or column mismatch between the two data frame inputs. The simplest way to view all of the issues is to use the unnest() function from the tidyr package:
library(tidyr)
# unnest the issues column so we can see where the differences are between star_1 and star_2
x %>% select(type, issues) %>% unnest(issues)
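The kinds of mismatch described above (columns present in only one input, and shared columns whose types differ) can also be checked directly with base R set operations. The sketch below is illustrative only and does not reproduce the wording or structure of the issues column; df_1 and df_2 are hypothetical stand-ins for star_1 and star_2.

```r
# hypothetical stand-ins for star_1 and star_2
df_1 <- data.frame(
  name   = c("Luke", "Leia"),
  height = c(172L, 150L),
  mass   = c("77", "49"),
  stringsAsFactors = FALSE
)
df_2 <- data.frame(
  height = c(172L, 150L),
  mass   = c(77, 49)
)

# first class of every column in each data frame
types_1 <- vapply(df_1, function(col) class(col)[1], character(1))
types_2 <- vapply(df_2, function(col) class(col)[1], character(1))

# columns that appear in only one of the inputs
only_in_1 <- setdiff(names(types_1), names(types_2))
only_in_2 <- setdiff(names(types_2), names(types_1))

# columns present in both inputs but with different types
shared     <- intersect(names(types_1), names(types_2))
mismatched <- shared[types_1[shared] != types_2[shared]]

only_in_1
only_in_2
mismatched
```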
Finally, we can produce a simple visualisation showing the differences between star_1 and star_2 using show_plot():
# print visualisation of column type comparison
inspect_types(star_1, star_2) %>% show_plot()