
knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "#>",
  fig.path = "README-"
)
options(tibble.print_min = 4L, tibble.print_max = 4L)
library(dplyr)
library(hablar)
mtcars <- as_tibble(mtcars)

hablar

The mission of hablar is for you to get non-astonishing results! That means that functions return what you expect. R has some unintuitive quirks that both beginners and experienced programmers fail to spot. Among the first of these that hablar addresses are type conversion of factors, summary functions that are dominated by missing values, and the search for duplicated or missing rows.

hablar follows tidyverse syntax conventions and works seamlessly with dplyr and tidyselect.

Installation

You can install hablar from CRAN:

install.packages("hablar")

Or preferably:

if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, hablar)

convert

Perhaps the most useful function in hablar is convert. It lets you quickly and dynamically change the data types of columns in a data frame. convert always converts factors to character before any further conversion, and it works with tidyselect.

mtcars %>% 
  convert(int(cyl, am),
          fct(disp:drat),
          chr(contains("w")))

For more information type vignette("convert") in the console.
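
The reason for going through character first is easiest to see in base R. The sketch below uses only base functions and a made-up factor named years; it illustrates the quirk and is not hablar's internal code.

```r
# A factor stores integer codes plus labels; `years` is a made-up example.
years <- factor(c("2019", "2021", "2020"))

# Direct conversion returns the internal level codes, not the labels.
codes <- as.integer(years)

# Converting to character first recovers the numbers the labels spell out.
values <- as.integer(as.character(years))

print(codes)
print(values)
```

The second conversion is the one most users expect, which is why a character step before the target type is a sensible default.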

Non-Astonishing summary functions

Summary functions like min, max and mean often return surprising results. Appending _ to the name of a summary function ensures that you get a result whenever there is one in your data. It ignores NA as well as non-finite values such as Inf and NaN. If all elements are NA, Inf or NaN, it returns NA.

starwars %>% 
  summarise(min_height_baseR = min(height),
            min_height_hablar = min_(height))

The function min_ ignored the NA values in the variable height, whereas base R's min returned NA. For more information type vignette("s") in the console.
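
To make the rule concrete, here is a minimal base R sketch of the behaviour described above. The helper min_sketch and the vector heights are hypothetical names used for illustration only; this is not how hablar implements min_.

```r
# Made-up data with one missing value.
heights <- c(172, NA, 96)

# Base R: a single NA makes the whole result NA.
base_min <- min(heights)

# With na.rm = TRUE an all-NA input yields Inf (plus a warning), not NA.
empty_min <- suppressWarnings(min(c(NA_real_, NA_real_), na.rm = TRUE))

# Hypothetical helper: drop NA, NaN, Inf and -Inf, then take the minimum.
# Returns NA when nothing usable is left.
min_sketch <- function(x) {
  kept <- x[is.finite(x)]  # is.finite() is FALSE for NA, NaN and +/-Inf
  if (length(kept) == 0) return(NA_real_)
  min(kept)
}

print(base_min)
print(empty_min)
print(min_sketch(heights))
print(min_sketch(c(NA, Inf, NaN)))
```

The all-NA case is where base R with na.rm = TRUE and the rule described here differ: the former yields Inf, the latter NA.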

Find the problem

When cleaning data you spend a lot of time understanding it. Sometimes you get more rows than you expected from a left_join(). Or you did not know that a certain column contained missing values (NA) or non-finite values such as Inf or NaN.

In hablar the find_* functions speed up your search for the problem. To find duplicated rows, simply call df %>% find_duplicates(). You can also find duplicates in specific columns, which can be useful before joins.

# Create df with duplicates
df <- mtcars %>% 
  bind_rows(mtcars %>% slice(1, 5, 9))

# Return rows with duplicates in cyl and am
df %>% 
  find_duplicates(cyl, am)

There are also find functions for other cases. For example, find_na() returns rows with missing values.

starwars %>% 
  find_na(height)

If you would rather have a Boolean value, check_duplicates(), for example, returns TRUE if the data frame contains duplicates and FALSE otherwise.
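
For readers who want to see what such checks amount to, the following base R sketch performs the same kind of search by hand. The data frame orders and its columns are made up, and the code is an approximation of the idea rather than hablar's own implementation.

```r
# Made-up data: id 2 and id 3 each occur twice, and one qty is missing.
orders <- data.frame(
  id  = c(1, 2, 2, 3, 3),
  qty = c(1, NA, 2, 1, 1)
)

# Flag every row whose id occurs more than once, counting both the
# first and the later occurrences.
is_dup <- duplicated(orders$id) | duplicated(orders$id, fromLast = TRUE)
dup_rows <- orders[is_dup, ]

# Boolean counterpart: is there any duplicated id at all?
has_dups <- any(duplicated(orders$id))

# Rows with a missing value in qty.
na_rows <- orders[is.na(orders$qty), ]

print(dup_rows)
print(has_dups)
print(na_rows)
```

Running a check like this on the join key before a left_join() is a quick way to explain unexpected extra rows.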

...apply the solution

Let's say that we have found that a problem is caused by missing values in the column height and we want to replace all missing values with the integer 100. hablar comes with additional ways of doing if-or-else.

starwars %>% 
  find_na(height) %>% 
  mutate(height = if_na(height, 100L))

In the chunk above we successfully replaced all missing heights with the integer 100. hablar also contains further self-explanatory helpers of this kind, which work in the same way as the example above.
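
The emphasis on the integer 100 matters because the type of the replacement decides the type of the result. The base R sketch below shows this with a made-up vector height; it uses replace() from base R and is only an illustration of the idea, not of hablar's internals.

```r
# Made-up integer column with one missing value.
height <- c(172L, NA, 96L)

# Replacing NA with an integer keeps the column an integer vector.
filled_int <- replace(height, is.na(height), 100L)

# Replacing NA with a double silently turns the whole column into double.
filled_dbl <- replace(height, is.na(height), 100)

print(filled_int)
print(typeof(filled_int))
print(typeof(filled_dbl))
```

Writing 100L rather than 100 is therefore a deliberate choice when the column should stay integer.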

retype

A function for quick and dirty data type conversion. All columns are evaluated and converted to the simplest possible type without losing any information.

mtcars %>% retype()

All variables containing only integer values were converted to type integer. For more information type vignette("retype") in the console.
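
The idea of a lossless simplification can be sketched in base R for the double-to-integer case. The helper to_simplest and the data frame cars_small are hypothetical; retype handles more types than this simplified sketch, and its actual rules may differ.

```r
# Hypothetical helper: turn a double column into integer only when every
# value is a whole number that fits in an integer, so nothing is lost.
to_simplest <- function(x) {
  lossless <- is.double(x) &&
    !any(is.nan(x)) &&
    all(x == round(x), na.rm = TRUE) &&
    all(abs(x) <= .Machine$integer.max, na.rm = TRUE)
  if (lossless) as.integer(x) else x
}

# Made-up data: cyl holds whole numbers, mpg does not.
cars_small <- data.frame(cyl = c(6, 4, 8), mpg = c(21.0, 22.8, 18.7))

# Apply the helper to every column while keeping the data frame structure.
cars_small[] <- lapply(cars_small, to_simplest)

print(sapply(cars_small, typeof))
```

Only cyl changes type here, because a single fractional value in mpg is enough to keep that column as double.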

Note

Hablar means 'speak R' in Spanish.



davidsjoberg/hablar documentation built on March 13, 2023, 1:26 a.m.