In 2DegreesInvesting/r2dii.utils: General Utility Tools

tl;dr

Too long; didn't read?

Use this example as a template for functions that must accept data with clean column names (like_this) or any unclean variant (such as likeThis or like.this).

suppressPackageStartupMessages(library(dplyr))
library(rlang)
library(r2dii.utils)

f <- function(data, ...) {
  # 1. First, clean data and groups (if any).
  clean <- clean_column_names(data)
  dots <- clean_quo(enquos(...))

  # 2. Ensure the clean data has all the clean names that your code uses.
  check_crucial_names(clean, c("x_x", quo_chr(dots)))

  # 3. Then, write your code.
  result <- clean %>%
    group_by(!!!dots) %>%
    # If you have to hard-wire column names, always use snake_style.
    select(x_x)

  # 4. Last, restore the unclean column names of the original data.
  result %>%
    unclean_column_names(data)
}

# EXAMPLES 

# snake_style
f(tibble(x_x = 1, y_y = 1), x_x)

# dot.style
f(tibble(x.x = 1, y.y = 1), x.x)

# camelStyle
f(tibble(xX = 1, yY = 1), xX)

# Fails: Clean version of the given name isn't the hard-wired clean name
bad <- tibble(x.x.x = 1)
names(clean_column_names(bad))
try(f(bad, x.x.x))
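The failing call above is presumably stopped by the name check in step 2: the cleaned data has no column called x_x. As a rough illustration of that idea, here is a minimal base-R sketch. The helper `check_names_sketch()` is hypothetical and is not how check_crucial_names() is implemented.

```r
# Hypothetical stand-in for a crucial-names check (base R only).
check_names_sketch <- function(data, expected) {
  absent <- setdiff(expected, names(data))
  if (length(absent) > 0) {
    stop("Missing expected names: ", paste(absent, collapse = ", "), call. = FALSE)
  }
  # Return the data invisibly so the check can sit in a pipeline.
  invisible(data)
}

# The cleaned version of `x.x.x` is assumed here to be `x_x_x`, not `x_x`.
bad_clean <- data.frame(x_x_x = 1)
outcome <- try(check_names_sketch(bad_clean, c("x_x", "x_x_x")), silent = TRUE)
inherits(outcome, "try-error")
```

Failing early with a message that lists the missing names is usually easier to debug than an error raised later from inside the data-manipulation code.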

If your function takes a single group via a named argument (not ...), capture it with enquo() (not enquos()) and unquote it with !! or {{ (not !!!).

Details

library(dplyr)
library(purrr)
library(rlang)
library(r2dii.utils)

Data with unclean column names is difficult to work with. Sooner or later you'll forget whether a column you need is typed LikeThis, likeThis, Like.This or like.This. Luckily, all of those forms map to the same clean name with janitor::make_clean_names():

unclean <- c(
  "LikeThis",
  "likeThis",
  "Like.This",
  "like.This"
)

unclean %>% map_chr(janitor::make_clean_names)

And now you can work with a clean column name like_this.
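To see why different spellings can collapse to one name, here is a rough base-R sketch of snake-case cleaning. The helper `to_snake()` is hypothetical and much simpler than janitor::make_clean_names(); it only illustrates the idea.

```r
# Hypothetical helper, not janitor's implementation: a rough snake_case cleaner.
to_snake <- function(x) {
  # Split camelCase boundaries: "likeThis" becomes "like_This".
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x, perl = TRUE)
  # Replace runs of non-alphanumeric characters (dots, spaces) with "_".
  x <- gsub("[^A-Za-z0-9]+", "_", x, perl = TRUE)
  tolower(x)
}

unclean <- c("LikeThis", "likeThis", "Like.This", "like.This")
to_snake(unclean)
```

Under these simplified rules, all four inputs become like_this.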

Your code will be more maintainable if you always write it against clean names. But janitor::clean_names() seems to lack both support for groups and a way to restore the unclean column names and groups, so you can't clean up after yourself and leave the data as you found it. r2dii.utils has a family of functions dedicated to solving that problem:

clean_column_names() is similar to janitor::clean_names().

unclean <- tibble(x.x = c(1, 1, 2, 2), y.y = 1:4)

clean_column_names(unclean)

janitor::clean_names(unclean)

But clean_column_names() also cleans dplyr groups; janitor::clean_names() doesn't.

unclean %>% 
  group_by(x.x) %>% 
  janitor::clean_names()

clean <- unclean %>% 
  group_by(x.x) %>% 
  clean_column_names()

clean

Now you can work comfortably with clean column names such as x_x and y_y instead of unclean column names such as x.x or y.y.

result <- clean %>% 
  mutate(total = sum(y_y))

result

unclean_column_names() helps you restore the original, unclean column names, so you leave the data as close as possible to how you found it.

unclean_column_names(result, unclean)

This becomes important when you write functions. If you can't force your users to input clean data, you can clean it yourself, do what you mean to do, then restore the unclean names and return.

sum_y <- function(data) {
  clean <- clean_column_names(data)

  result <- clean %>% 
    mutate(total = sum(y_y))

  unclean_column_names(result, data)
}

sum_y(unclean)
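The clean, work, restore pattern can be sketched in base R as a look-up from clean names back to the original ones. The helpers `clean_sketch()` and `restore_sketch()` below are hypothetical; r2dii.utils may implement this differently, and the cleaner here is a deliberate simplification.

```r
# Hypothetical helpers for illustration; r2dii.utils may work differently.
clean_sketch <- function(x) tolower(gsub("[^A-Za-z0-9]+", "_", x))

restore_sketch <- function(result, original) {
  # Look-up table: clean name -> original (unclean) name.
  lookup <- setNames(names(original), clean_sketch(names(original)))
  known <- names(result) %in% names(lookup)
  # Rename only columns that came from `original`; new columns keep their names.
  names(result)[known] <- unname(lookup[names(result)[known]])
  result
}

original <- data.frame(x.x = c(1, 1, 2, 2), y.y = 1:4)
clean <- setNames(original, clean_sketch(names(original)))
clean$total <- sum(clean$y_y)

names(restore_sketch(clean, original))
```

Here the restored names are x.x, y.y and total: the columns that came from the input get their original spelling back, while the new column keeps the name the function gave it.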

Handling groups

If your functions take groups, you need to clean them too. Here is how:

For one group passed as a string, clean it with janitor::make_clean_names():

sum_y <- function(data, by) {
  clean <- clean_column_names(data)
  by <- janitor::make_clean_names(by)

  result <- clean %>% 
    group_by(.data[[by]]) %>% 
    mutate(total = sum(y_y))

  unclean_column_names(result, data)
}

sum_y(unclean, by = "x.x")

For one group passed as a symbol, you'll need to capture it with rlang::enquo(), then clean it with clean_quo() and unquote it with either bang bang !! or embrace {{:

sum_y <- function(data, by) {
  clean <- clean_column_names(data)
  by <- clean_quo(enquo(by))

  result <- clean %>% 
    # You can also use `group_by(!!by)`
    group_by({{ by }}) %>% 
    mutate(total = sum(y_y))

  unclean_column_names(result, data)
}

sum_y(unclean, by = x.x)

And for multiple groups passed via ..., you'll need to capture them with rlang::enquos(), then clean them with clean_quo() and unquote-splice them with big bang !!!:

sum_y <- function(data, ...) {
  clean <- clean_column_names(data)
  by <- clean_quo(enquos(...))

  result <- clean %>% 
    group_by(!!!by) %>%
    mutate(total = sum(y_y))

  unclean_column_names(result, data)
}

sum_y(unclean, x.x, y.y)