knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
Expectdata is an R package that makes it easy to check assumptions about a data frame before conducting analyses. Below is a concise tour of some of the things expectdata can do for you.
library(expectdata)
expect_no_duplicates(mtcars, "cyl")
The default return_df = TRUE option lets you use these functions inside a dplyr pipeline, which stops as soon as a data assumption is violated.
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
mtcars %>%
  filter(cyl == 4) %>%
  expect_no_duplicates("wt", return_df = TRUE) %>%
  ggplot(aes(x = wt, y = hp, color = mpg, size = mpg)) +
  geom_point()
If no expectations are violated, an "OK" message is printed.
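To make the pass-through idea concrete, here is a minimal base R sketch of a check that either stops or hands the data frame back unchanged. The helper check_unique() and the readings data are hypothetical names made up for illustration; this is not how expectdata is implemented.

```r
# check_unique() is a hypothetical helper, not part of expectdata.
check_unique <- function(df, column) {
  n_dup <- sum(duplicated(df[[column]]))
  if (n_dup > 0) {
    # Fail loudly so downstream steps never see bad data.
    stop(sprintf("%d duplicated value(s) in column '%s'", n_dup, column))
  }
  message("OK: no duplicates in column '", column, "'")
  invisible(df)  # return the input unchanged so later steps can use it
}

# Hypothetical example data with unique sensor names.
readings <- data.frame(sensor = c("a", "b", "c"), value = c(1.5, 2.5, 3.5))
passed <- check_unique(readings, "sensor")

# A duplicated sensor name raises an error; capture its message here.
failed <- tryCatch(
  check_unique(data.frame(sensor = c("a", "a")), "sensor"),
  error = function(e) conditionMessage(e)
)
```

Because the function returns its input on success, it can sit in the middle of a chain of transformations without changing the result.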
After joining two data sets, you may want to verify that no unintended duplication occurred. Expectdata lets you compare the data frames from before and after processing to ensure they have the same number of rows before continuing.
expect_same_number_of_rows(mtcars, mtcars, return_df = FALSE)
expect_same_number_of_rows(mtcars, iris, show_fails = FALSE, stop_if_fail = FALSE, return_df = FALSE)
# with no second data frame, checks that the first has zero rows
expect_same_number_of_rows(mtcars, show_fails = FALSE, stop_if_fail = FALSE, return_df = FALSE)
Note how the stop_if_fail = FALSE option turns failed expectations into warnings instead of errors.
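The kind of join problem this guards against can be reproduced in base R. The sketch below is only an illustration under assumed data: left, right, and the helper same_rows() are hypothetical and do not come from expectdata.

```r
left <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
# Hypothetical lookup table listing id 2 twice, which duplicates a row on join.
right <- data.frame(id = c(1, 2, 2, 3), y = c(10, 20, 21, 30))

joined <- merge(left, right, by = "id")

# same_rows() is a hypothetical helper, not expectdata's implementation.
same_rows <- function(before, after, stop_if_fail = TRUE) {
  ok <- nrow(before) == nrow(after)
  if (!ok) {
    msg <- sprintf("row count changed from %d to %d", nrow(before), nrow(after))
    # Error by default; downgrade to a warning when asked.
    if (stop_if_fail) stop(msg) else warning(msg)
  }
  invisible(ok)
}

# Capture the warning message instead of letting it print.
caught <- tryCatch(
  same_rows(left, joined, stop_if_fail = FALSE),
  warning = function(w) conditionMessage(w)
)
```

Choosing a warning over an error is useful during exploration, when you want to see every failed check in one run rather than stopping at the first.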
You can also check explicitly that a data frame has zero rows. If the expectation fails, the offending cases can be shown, which is a starting point for exploring why they appeared.
expect_zero_rows(mtcars[mtcars$cyl == 0, ], return_df = TRUE)
expect_zero_rows(mtcars$cyl[mtcars$cyl == 0])
expect_zero_rows(mtcars, show_fails = TRUE)
This works well at the end of a pipeline that starts with a data frame, runs some logic to filter to cases that should not exist, then runs expect_zero_rows() to check that no such cases exist.
# verify no cars have zero cylinders
mtcars %>%
  filter(cyl == 0) %>%
  expect_zero_rows(return_df = FALSE)
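The same "filter to impossible cases, then require zero rows" pattern can be sketched in base R. The people data frame and the rule that ages must be non-negative are assumptions made up for this example.

```r
# Hypothetical data containing one impossible value (a negative age).
people <- data.frame(name = c("Ann", "Bob", "Cy"), age = c(34, -2, 51))

# Filter to the cases that should not exist.
bad <- people[people$age < 0, ]

if (nrow(bad) > 0) {
  # Show the offending rows so the investigation can start right away.
  warning(sprintf("%d row(s) should not exist", nrow(bad)))
  print(bad)
}
```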
You can also check for NAs in a vector, in specific columns of a data frame, or in a whole data frame.
expect_no_nas(mtcars, "cyl", return_df = FALSE)
expect_no_nas(mtcars, return_df = FALSE)
expect_no_nas(c(0, 3, 4, 5))
expect_no_nas(c(0, 3, NA, 5))
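When an NA check fails, base R can help locate the missing values. The following sketch uses a hypothetical survey data frame and only base functions (anyNA(), is.na(), colSums(), complete.cases()); it is a companion to the checks above, not part of expectdata.

```r
# Hypothetical data with one NA in age and one in city.
survey <- data.frame(
  age   = c(21, NA, 35),
  city  = c("Oslo", "Lima", NA),
  score = c(1, 2, 3)
)

# Vector level: is there any NA at all?
clean_vec <- anyNA(c(0, 3, 4, 5))
dirty_vec <- anyNA(c(0, 3, NA, 5))

# Column level: count the NAs in each column.
na_per_column <- colSums(is.na(survey))

# Row level: keep only the rows that contain at least one NA.
rows_with_na <- survey[!complete.cases(survey), ]
```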