knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Expectdata is an R package that makes it easy to check assumptions about a data frame before conducting analyses. Below is a concise tour of some of the things expectdata can do for you.

Check for unexpected duplication

library(expectdata)
expect_no_duplicates(mtcars, "cyl")

The default return_df == TRUE option allows for using these function as part of a dplyr piped expression that is stopped when data assumptions are not kept.

library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
mtcars %>% 
  filter(cyl == 4) %>% 
  expect_no_duplicates("wt", return_df = TRUE) %>% 
  ggplot(aes(x = wt, y = hp, color = mpg, size = mpg)) +
  geom_point()

If there are no expectations violated, an "OK" message is printed.

After joining two data sets you may want to verify that no unintended duplication occurred. Expectdata allows comparing pre- and post- processing to ensure they have the same number of rows before continuing.

expect_same_number_of_rows(mtcars, mtcars, return_df = FALSE)
expect_same_number_of_rows(mtcars, iris, show_fails = FALSE, stop_if_fail = FALSE, return_df = FALSE)

# can also compare to no df2 to check is zero rows
expect_same_number_of_rows(mtcars, show_fails = FALSE, stop_if_fail = FALSE, return_df = FALSE) 

Can see how the stop_if_fail = FALSE option will turn failed expectations into warnings instead of errors.

Check for existance of problematic rows

Comparing a data frame to an empty, zero-length data frame can also be done more explicitly. If the expectations fail, cases can be shown to begin the next step of exploring why these showed up.

expect_zero_rows(mtcars[mtcars$cyl == 0, ], return_df = TRUE)
expect_zero_rows(mtcars$cyl[mtcars$cyl == 0])
expect_zero_rows(mtcars, show_fails = TRUE)

This works well at the end of a pipeline that starts with a data frame, runs some logic to filter to cases that should not exist, then runs expect_zero_rows() to check no cases exist.

# verify no cars have zero cylindars
mtcars %>% 
  filter(cyl == 0) %>% 
  expect_zero_rows(return_df = FALSE)

Can also check for NAs in a vector, specific columns of a data frame, or a whole data frame.

expect_no_nas(mtcars, "cyl", return_df = FALSE)
expect_no_nas(mtcars, return_df = FALSE)
expect_no_nas(c(0, 3, 4, 5))
expect_no_nas(c(0, 3, NA, 5))


dgarmat/expectdata documentation built on Oct. 22, 2019, 5:19 a.m.