starwars
The examples below make use of the starwars and storms data from the dplyr package.
# some example data
data(starwars, package = "dplyr")
data(storms, package = "dplyr")
To illustrate comparisons of dataframes, we use the starwars data to produce two new dataframes, star_1 and star_2. Each contains 50 rows randomly sampled from the original, and star_2 also drops the first two columns.
library(dplyr)
# two random samples of 50 rows; star_2 also drops columns 1 and 2 (name, height)
star_1 <- starwars %>% sample_n(50)
star_2 <- starwars %>% sample_n(50) %>% select(-1, -2)
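Because sample_n() draws rows at random, star_1 and star_2 will differ on every run. The sketch below makes the comparison reproducible. It assumes dplyr >= 1.0.0, where slice_sample() supersedes sample_n(). The seed value 2024 is arbitrary and not part of the original example.

```r
library(dplyr)

set.seed(2024)  # arbitrary seed so the random samples are repeatable
star_1 <- starwars %>% slice_sample(n = 50)
star_2 <- starwars %>% slice_sample(n = 50) %>% select(-1, -2)

# Columns present in star_1 but dropped from star_2
setdiff(names(star_1), names(star_2))  # returns c("name", "height")
```

Checking the dropped columns this way shows which variables can only be summarised for star_1 when the two dataframes are compared.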
inspect_na() for a single dataframe
inspect_na() summarises the prevalence of missing values in each column of a dataframe. It returns a tibble containing the count (cnt) and the overall percentage (pcnt) of missing values.
library(inspectdf)
inspect_na(starwars)
A barplot can be produced by passing the result to show_plot():
inspect_na(starwars) %>% show_plot()
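To show what the cnt and pcnt columns measure, here is a minimal base-R sketch of the same per-column summary. The helper na_summary() and the toy data are hypothetical. This is not inspectdf's implementation, and sorting by descending pcnt is an assumption made for readability.

```r
# Hypothetical helper (not part of inspectdf): per-column missingness summary
na_summary <- function(df) {
  miss <- is.na(df)                       # logical matrix, one column per variable
  out <- data.frame(
    col_name = colnames(miss),
    cnt      = unname(colSums(miss)),      # number of missing values
    pcnt     = unname(100 * colMeans(miss)), # percentage of rows that are missing
    stringsAsFactors = FALSE
  )
  out <- out[order(-out$pcnt), ]           # most-missing columns first
  rownames(out) <- NULL
  out
}

toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = c(TRUE, FALSE, TRUE, FALSE)
)
na_summary(toy)
# rows: a (cnt 2, pcnt 50), b (cnt 1, pcnt 25), c (cnt 0, pcnt 0)
```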
inspect_na() for two dataframes
When a second dataframe is provided, inspect_na() returns a tibble containing counts and percentage missingness by column. The summaries for the first and second dataframes appear in columns whose names end in _1 and _2, respectively. In addition, a p-value is calculated for each column, giving a measure of evidence of whether the proportion of missing values differs between the two dataframes.
inspect_na(star_1, star_2)
inspect_na(star_1, star_2) %>% show_plot()
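The section above does not say which test inspect_na() uses. To illustrate the idea only, the following sketch compares the missingness of a single column across two dataframes. It applies Fisher's exact test (base R's stats::fisher.test()) to a 2x2 table of missing versus present counts. The helper na_pvalue() is hypothetical and the choice of test is an assumption, so its results may not match the p_value column returned by inspect_na(). Returning NA when nothing is missing in either column is also a choice made for this sketch.

```r
# Hypothetical helper (not inspectdf's implementation): compare the missingness
# of one column across two data frames with Fisher's exact test.
na_pvalue <- function(x1, x2) {
  miss1 <- sum(is.na(x1))
  miss2 <- sum(is.na(x2))
  if (miss1 + miss2 == 0) return(NA_real_)  # nothing missing: no test performed
  tab <- matrix(
    c(miss1, length(x1) - miss1,   # data frame 1: missing, present
      miss2, length(x2) - miss2),  # data frame 2: missing, present
    nrow = 2
  )
  stats::fisher.test(tab)$p.value
}

# 10 of 50 missing versus 2 of 50 missing
x1 <- c(rep(NA, 10), 1:40)
x2 <- c(rep(NA, 2), 1:48)
na_pvalue(x1, x2)  # small p-value: evidence that missingness differs
```

A p-value below the chosen significance level (0.05 by default) would be read as evidence that the column's missingness differs between the two dataframes.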
Notes:
- If a column appears in one dataframe but not the other (for example, height appears in star_1 but not in star_2), the corresponding pcnt_, cnt_ and p_value columns will contain NA.
- … the p_value is NA.
- Where a p_value cannot be calculated, no coloured bar is shown in the plot.
- The significance level can be changed using the alpha argument to inspect_na(). The default is alpha = 0.05.