Illustrative data: starwars

The examples below make use of the starwars and storms data from the dplyr package

# some example data
data(starwars, package = "dplyr")
data(storms, package = "dplyr")

For illustrating comparisons of dataframes, use the starwars data and produce two new dataframes star_1 and star_2 that randomly sample the rows of the original and drop a couple of columns.

library(dplyr)
star_1 <- starwars %>% sample_n(50)
star_2 <- starwars %>% sample_n(50) %>% select(-1, -2)

inspect_na() for a single dataframe

inspect_na() summarises the prevalence of missing values by each column in a data frame. A tibble containing the count (cnt) and the overall percentage (pcnt) of missing values is returned.

library(inspectdf)
inspect_na(starwars)

A barplot can be produced by passing the result to show_plot():

inspect_na(starwars) %>% show_plot()

inspect_na() for two dataframes

When a second dataframe is provided, inspect_na() returns a tibble containing counts and percentage missingness by column, with summaries for the first and second data frames are show in columns with names appended with _1 and _2, respectively. In addition, a $p$-value is calculated which provides a measure of evidence of whether the difference in missing values is significantly different.

inspect_na(star_1, star_2)
inspect_na(star_1, star_2) %>% show_plot()

Notes:



alastairrushworth/inspectdf documentation built on Aug. 15, 2022, 1:23 a.m.