knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Using msafer to locate errors when applying map() to a vector.

library(msafer)

Locating Errors with map_safe_merge() & map_safe()

For demonstration purposes, let's create a sample list of dataframes from the starwars and mtcars datasets.

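# 34 random starwars rows, with height and hair_color dropped
sample_a <- dplyr::sample_n(dplyr::starwars, 34)
sample_a <- subset(sample_a, select = -c(height, hair_color))
# 35 random starwars rows, all columns kept
sample_b <- dplyr::sample_n(dplyr::starwars, 35)
# 20 random mtcars rows, with hp dropped
sample_c <- dplyr::sample_n(mtcars, 20)
sample_c <- subset(sample_c, select = -c(hp))
# Only elements 1 and 3 contain a height column
sample_list <- list(dplyr::starwars, sample_a, sample_b, sample_c, mtcars)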

Let's attempt to use map() to apply a function to every dataframe in this list.

purrr::map(sample_list, dplyr::select, height)

Uh oh! Something didn't work, but what exactly? And where exactly?

map_safe_merge() to the rescue! Pass map_safe_merge() the same arguments as map(): a vector, a function, and parameters that the function needs (if any). map_safe_merge() will return a tibble with the file numbers and any errors that may have occurred while trying to apply map().

map_safe_merge(sample_list, dplyr::select, height)
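The idea of recording an error per element, instead of stopping at the first failure, can be sketched in base R. The helper locate_errors() below is hypothetical and not part of msafer; msafer's actual implementation and output columns may differ.

```r
# Hypothetical helper (not msafer's code): apply f to each element of x
# and record the error message, if any, next to the element's index.
locate_errors <- function(x, f, ...) {
  messages <- vapply(
    seq_along(x),
    function(i) {
      tryCatch(
        {
          f(x[[i]], ...)
          NA_character_  # NA marks an element that raised no error
        },
        error = function(e) conditionMessage(e)
      )
    },
    character(1)
  )
  data.frame(id = seq_along(x), error = messages, stringsAsFactors = FALSE)
}

# log() fails on the character element only
inputs <- list(1, "a", 3)
located <- locate_errors(inputs, log)
located
```

Every element gets one row, so the failing positions can be read off the id column.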

map_safe() is an even better option. Because map_safe_merge() returns one row per element, in the order the elements were processed, it is hard to spot the errors quickly in a huge vector. map_safe() nests the results instead: it returns a tibble with one row per unique error message, together with the indices of the elements where that error occurs.

df <- map_safe(sample_list, dplyr::select, height)
df
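Grouping element indices by unique error message can be illustrated with base R alone. This is only a sketch of the idea on a toy list, not msafer's implementation; the "no error" label is an assumption made for the example.

```r
# Toy input: elements 1 and 3 work with log(), elements 2, 4 and 5 do not
inputs <- list(10, "a", 100, "b", "c")

# Record either a success label or the error message for each element
outcome <- vapply(
  inputs,
  function(el) {
    tryCatch(
      {
        log(el)
        "no error"
      },
      error = function(e) conditionMessage(e)
    )
  },
  character(1)
)

# One group of indices per unique outcome
grouped <- split(seq_along(inputs), outcome)
grouped
```

With five elements the benefit is small, but for a long vector the grouped form shows each distinct error once.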

The which_id column in the tibble generated by map_safe() is a list of tibbles, each containing the indices of the elements associated with that row's result. To display it or compute with it, extract it by position:

which_id is always the third column in the tibble generated by map_safe().

df[[3]][[1]]
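Extracting an element of a list-column works the same way on any data frame. The toy table below is made up for illustration (its column order and contents are assumptions, not map_safe() output); it shows that positional and by-name extraction give the same result.

```r
# Toy data frame with a list-column of index tables (illustrative only)
toy <- data.frame(error = c("no error", "some error"), stringsAsFactors = FALSE)
toy$which_id <- list(
  data.frame(id = c(1L, 3L)),
  data.frame(id = c(2L, 4L, 5L))
)

# By position: second column, second row
by_position <- toy[[2]][[2]]

# By name: same element, but robust to column reordering
by_name <- toy$which_id[[2]]

identical(by_position, by_name)
```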

The result shows that, in a list of five datasets, the first and third datasets contain the column height, whereas the second, fourth and fifth do not, which causes the error "Error in .f(.x[[i]], ...): object 'height' not found".

You can also pass in just one dataframe. map_safe_merge() will return whether or not the specified function can be applied to each column in the dataframe.

map_safe_merge(iris, log)

And map_safe() will combine the errors together.

map_safe(iris, log)
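A single dataframe is mapped column by column because a data frame is a list of columns. The base R check below, which does not use msafer, shows why log() can be expected to fail on exactly one column of iris.

```r
# A data frame is a list whose elements are its columns
n_columns <- length(iris)

# Which columns are numeric? Species is a factor, so log() cannot handle it.
numeric_columns <- vapply(iris, is.numeric, logical(1))
numeric_columns

# Applying log() to the factor column raises an error
species_result <- tryCatch(log(iris$Species), error = function(e) e)
inherits(species_result, "error")
```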

If you pass in a single vector, map_safe() will report whether or not the specified function can be applied to each element.

map_safe(iris$Sepal.Length, log)

Choose between map_safe() and map_safe_merge() depending on how you want to use the output: map_safe_merge() returns one row per element, while map_safe() groups the elements by unique error message.

Another function in the msafer package is check_match(), which checks whether a user-specified condition is met anywhere in the dataset. If it is, the function returns TRUE; otherwise, it returns FALSE.

# when working with one dataframe
check_match(dplyr::starwars, hair_color == "brown")
check_match(dplyr::starwars, height == 0)

We can see that starwars does contain a character with brown hair color, but there's no character with a height of 0.
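The kind of check described here can be approximated in base R. The function has_match() below is a hypothetical stand-in written for this example, not check_match() itself, and it uses mtcars so that it runs without dplyr.

```r
# Hypothetical stand-in (not msafer's check_match): TRUE if any row of
# `data` satisfies `condition`, FALSE otherwise.
has_match <- function(data, condition) {
  # Evaluate the unevaluated condition using the columns of `data`
  hits <- eval(substitute(condition), data, parent.frame())
  any(hits, na.rm = TRUE)
}

four_cyl <- has_match(mtcars, cyl == 4)  # some cars have 4 cylinders
zero_hp <- has_match(mtcars, hp == 0)    # no car has 0 horsepower
c(four_cyl, zero_hp)
```

If the condition refers to a column that the dataset lacks, evaluation raises an "object not found" error, which is the kind of failure map_safe() is meant to locate.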

map_safe() x check_match()

You can also use map_safe() in conjunction with check_match().

map_safe(sample_list, check_match, height == 0)

Conclusion

The flagship function of msafer, map_safe(), can identify on which files errors occur when applying map() to a vector.



kpien/msafer documentation built on Dec. 25, 2019, 5:12 a.m.