knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-" )
An R package that keeps track of all performed sanity checks.
When preparing data sets, one usually performs some sanity checks. Irrespective of where the checks are performed, this package collects them centrally, so that all of them can be listed at once, together with example rows for each check that failed.
Assume you process a data set with different functions, each handling a certain aspect. Within those functions you can perform checks, document what you checked, record the outcome of each check, and store some examples where the check failed. At the end you can summarize all performed checks (which might be scattered all over your source code) and their outcomes.
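The idea of collecting checks centrally can be sketched in a few lines of base R. This is only an illustration of the concept and not how sanityTracker is implemented; the names `registry`, `record_check`, `list_checks`, `check_ids` and `check_positive` are hypothetical and exist only in this sketch.

```r
# Hypothetical mini-tracker in base R; NOT the sanityTracker implementation.
registry <- new.env()
registry$checks <- list()

# Record one check: a description plus a logical vector marking failures.
record_check <- function(fail_vec, description) {
  entry <- data.frame(
    description = description,
    n = length(fail_vec),
    n_fail = sum(fail_vec, na.rm = TRUE),
    stringsAsFactors = FALSE
  )
  registry$checks[[length(registry$checks) + 1]] <- entry
  invisible(entry)
}

# Collect everything that was recorded, no matter where it was recorded.
list_checks <- function() {
  do.call(rbind, registry$checks)
}

# Two unrelated functions, each performing its own check
check_ids <- function(ids) record_check(duplicated(ids), "No duplicated ids")
check_positive <- function(x) record_check(x <= 0, "Values are positive")

check_ids(c(1, 2, 2))
check_positive(c(3, -1, 5))

# One summary for checks that were recorded in different places
all_checks <- list_checks()
all_checks
```

Because the registry lives in an environment, every function can append to it without passing it around, which is what makes a single summary at the end possible.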
raw_data <- data.frame(
  id = 1:4,
  start = c("2020-04-12", "2010-01-20", "2020-02-20", "2020-01-23"),
  end = c("2020-03-13", "2020-01-26", "2020-03-01", "2020-01-26"),
  height_m = c(1.77, 144, 1.89, 1.74),
  stringsAsFactors = FALSE
)
For illustration we consider a very simple data set:
raw_data
We have two simple data-preparation functions for our raw data set:
correct_height <- function(raw_data) {
  ret <- raw_data
  # functions starting with sc_ are convenience functions the package
  # offers for ease of use
  sc <- sanityTracker::sc_cols_bounded_above(
    object = ret,
    cols = "height_m",
    upper_bound = 100,
    description = "Persons are smaller than 100m",
    counter_meas = "Divide by 100. Assume height is given in cm"
  )
  # if the check failed, apply the counter measure to the failing rows
  if (sc[["height_m"]][["fail"]]) {
    fail_vec <- sc[["height_m"]][["fail_vec"]]
    ret$height_m[fail_vec] <- ret$height_m[fail_vec] / 100
  }
  sanityTracker::sc_cols_bounded(
    object = ret,
    cols = "height_m",
    rule = "[0.8, 2.5]",
    description = "Persons are between 0.8m and 2.5m"
  )
  return(ret)
}

prep <- function(raw_data) {
  sanityTracker::sc_cols_unique(
    object = raw_data,
    cols = "id",
    description = "No duplicated ids"
  )
  raw_data$start <- as.Date(raw_data$start)
  raw_data$end <- as.Date(raw_data$end)
  # sanity checks can be recorded with add_sanity_check()
  # as long as a logical vector exists
  sanityTracker::add_sanity_check(
    fail_vec = raw_data$end < raw_data$start,
    description = "start-date <= end-date",
    data = raw_data
  )
  ret <- correct_height(raw_data = raw_data)
  return(ret)
}
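Both functions rely on a logical vector in which `TRUE` marks a row that fails the check. The following base R sketch computes such vectors by hand for the example data. It does not use sanityTracker; the names `date_fail` and `height_fail` are made up for this illustration, and the comparison `> 100` is only an approximation of the upper-bound check.

```r
# The example data from above, repeated so that this snippet runs on its own.
raw_data <- data.frame(
  id = 1:4,
  start = c("2020-04-12", "2010-01-20", "2020-02-20", "2020-01-23"),
  end = c("2020-03-13", "2020-01-26", "2020-03-01", "2020-01-26"),
  height_m = c(1.77, 144, 1.89, 1.74),
  stringsAsFactors = FALSE
)

# TRUE marks a row that violates "start-date <= end-date"
date_fail <- as.Date(raw_data$end) < as.Date(raw_data$start)
date_fail
#> [1]  TRUE FALSE FALSE FALSE

# TRUE marks a row that violates "Persons are smaller than 100m"
height_fail <- raw_data$height_m > 100
height_fail
#> [1] FALSE  TRUE FALSE FALSE
```

Row 1 ends before it starts and row 2 has an implausible height, so each check fails for exactly one row.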
After applying the prep-function we can summarize the sanity checks:
wrangled_data <- prep(raw_data = raw_data)
sanity_checks <- sanityTracker::get_sanity_checks()
sanity_checks
This gives a direct overview of which checks were performed, how often each check failed, and which counter measure was applied. For a failed check it also contains random rows of the data set where the check failed (by default at most 3).
sanity_checks[2, ]
sanity_checks[2, ]$example
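How a handful of random failing rows could be drawn is sketched below in base R. This is an assumption made for illustration, not the package's own code; the helper `pick_examples`, its parameter `example_size` and the data frame `df` are hypothetical.

```r
# Hypothetical helper; an assumption about how example rows could be drawn,
# not the package's own code.
pick_examples <- function(data, fail_vec, example_size = 3) {
  idx <- which(fail_vec)               # row numbers where the check failed
  k <- min(example_size, length(idx))  # never ask for more rows than failed
  # indexing via sample.int() avoids the sample() pitfall that a single
  # number n is interpreted as 1:n
  data[idx[sample.int(length(idx), k)], , drop = FALSE]
}

df <- data.frame(id = 1:6, value = c(5, -1, -2, 7, -3, -4))

# Four rows fail the check "value >= 0", but at most three are kept
examples <- pick_examples(df, fail_vec = df$value < 0)
examples

# A single failing row is returned as it is
single <- pick_examples(df, fail_vec = df$value > 6)
single
```

Capping the number of stored rows keeps the summary small even when a check fails for a large part of the data.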
You can install the package from CRAN:
install.packages("sanityTracker")
or from GitHub:
remotes::install_github("MarselScheer/sanityTracker")
sessionInfo()