View source: R/quality_assurance.R
compare_dataset_versions | R Documentation |
Compares two versions of a dataset using the daff package and returns the dataset with the added, removed, or changed rows identified. The compared dataset can then be exported to an Excel spreadsheet, where conditional formatting on text containing # makes changed values quick to find.
compare_dataset_versions(old_version, new_version)
old_version |
The earlier version of the dataset as a data frame. |
new_version |
The later version of the dataset as a data frame. |
Before comparing versions, check that the column names are identical and that no columns have been added or removed between dataset versions. If the schemas differ, make them the same before running the comparison. This check can be done using the compare function in the waldo package.
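As a lightweight complement to waldo, the column check described above can be sketched in base R. The helper name check_same_columns and the two small data frames are hypothetical and are not part of the package; the sketch compares column names as sets, so it does not detect a difference in column order.

```r
# Hypothetical helper (not part of the package): report which column
# names appear in only one of the two dataset versions.
check_same_columns <- function(old_version, new_version) {
  only_in_old <- setdiff(names(old_version), names(new_version))
  only_in_new <- setdiff(names(new_version), names(old_version))
  list(
    identical = length(only_in_old) == 0 && length(only_in_new) == 0,
    only_in_old = only_in_old,
    only_in_new = only_in_new
  )
}

# Made-up example data: the mass column was renamed between versions
old_df <- data.frame(id = 1:2, body_mass_g = c(4000, 4500))
new_df <- data.frame(id = 1:2, body_mass_kg = c(4.0, 4.5))

result <- check_same_columns(old_df, new_df)
result$identical    # FALSE
result$only_in_old  # "body_mass_g"
result$only_in_new  # "body_mass_kg"
```

If identical is FALSE, rename or drop columns until both versions share the same schema before calling compare_dataset_versions.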
The data frame with an additional difference column indicating new, removed or updated rows highlighted with #.
suppressPackageStartupMessages({
suppressWarnings({
library(palmerpenguins)
library(dplyr)
})
})
# select top 5 heaviest penguins from each species on each island
heaviest_penguins <- penguins %>%
  select(species, island, body_mass_g) %>%
  group_by(species, island) %>%
  arrange(desc(body_mass_g)) %>%
  slice_head(n = 5) %>%
  ungroup()
heaviest_penguins
suppressPackageStartupMessages({
suppressWarnings({
library(dplyr)
})
})
## each version will require a unique identifier
heaviest_penguins <- heaviest_penguins %>%
  mutate(id = row_number()) %>%
  relocate(id)
## old_version: exclude Chinstrap penguins
heaviest_penguins_old <- heaviest_penguins %>%
  filter(species != "Chinstrap")
## new_version: exclude Gentoo penguins and convert body mass to kilograms
heaviest_penguins_new <- heaviest_penguins %>%
  filter(species != "Gentoo") %>%
  mutate(body_mass_g = body_mass_g / 1000) %>%
  rename(body_mass_kg = body_mass_g)
# check columns and column names are identical between versions
waldo::compare(heaviest_penguins_old, heaviest_penguins_new)
# make columns same between versions
heaviest_penguins_old <- heaviest_penguins_old %>%
  rename(body_mass = body_mass_g)
heaviest_penguins_new <- heaviest_penguins_new %>%
  rename(body_mass = body_mass_kg)
# compare versions of dataset
suppressWarnings(compare_dataset_versions(heaviest_penguins_old, heaviest_penguins_new))
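The Excel export mentioned in the description could look roughly like the sketch below. It assumes the openxlsx package, which this documentation does not name. The data frame compared is a made-up stand-in for the function's output, and the real layout of the result may differ. The only detail taken from the description is the conditional formatting rule on text containing #.

```r
library(openxlsx)

# Made-up stand-in for a compared dataset (assumption: the real output
# of compare_dataset_versions may be laid out differently).
compared <- data.frame(
  difference = c("", "#", ""),
  id = c("1", "2", "3"),
  body_mass = c("4.0", "#4.5", "5.0"),
  stringsAsFactors = FALSE
)

wb <- createWorkbook()
addWorksheet(wb, "comparison")
writeData(wb, "comparison", compared)

# Highlight every data cell whose text contains "#"
highlight <- createStyle(bgFill = "#FFC7CE")
conditionalFormatting(
  wb, "comparison",
  cols = seq_len(ncol(compared)),
  rows = seq_len(nrow(compared)) + 1,  # +1 skips the header row
  type = "contains",
  rule = "#",
  style = highlight
)

# Write to a temporary location; change the path for real use
out_file <- file.path(tempdir(), "dataset_comparison.xlsx")
saveWorkbook(wb, out_file, overwrite = TRUE)
```

Opening the saved workbook should show the cells containing # shaded, which makes changed values easy to scan for.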