knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Too long; didn't read?
Use this example to write functions for data with clean column names (like_this
) or any unclean version (likeThis
and like.this
).
suppressPackageStartupMessages(library(dplyr)) library(rlang) library(r2dii.utils) f <- function(data, ...) { # 1. First, clean data and groups (if any). clean <- clean_column_names(data) dots <- clean_quo(enquos(...)) # 2. Ensure `data` has all the clean names that your data uses check_crucial_names(clean, c("x_x", quo_chr(dots))) # 2. Then, write your code. result <- clean %>% group_by(!!! dots) %>% # If you have to hard-wire column names, always use snake_style. select(x_x) # 3. Last, restore the unclean column names of the original data. result %>% unclean_column_names(data) } # EXAMPLES # snake_stye f(tibble(x_x = 1, y_y = 1), x_x) # dot.style f(tibble(x.x = 1, y.y = 1), x.x) # cammelStyle f(tibble(xX = 1, yY = 1), xX) # Fails: Clean version of the given name isn't the hard-wired clean name bad <- tibble(x.x.x = 1) names(clean_column_names(bad)) try(f(bad, x.x.x))
If your function takes a single group via a named argument (not ...
), capture it with enquo()
(not enquos()
) and unquote it with !!
or {{
(not !!!
).
library(dplyr) library(purrr) library(rlang) library(r2dii.utils)
Data with unclean column names is difficult to work with. Sooner or later you'll forget if a column you need is typed LikeThis
, likeThis
, Like.This
or like.This
. Luckily, all of those forms can be cleaned in a unique way with janitor::make_clean_names()
:
unclean <- c( "LikeThis", "likeThis", "Like.This", "like.This" ) unclean %>% map_chr(janitor::make_clean_names)
And now you can work with a clean column name like_this
.
Your code will be more maintainable if you always write code for clean names. But janitor::clean_names()
seems to lack support for groups, or a way to restore unclean column names and groups so that you clean up after yourself and leave the data as you found it. r2dii.utils has a family of functions dedicated to solving that problem:
clean_column_names()
is similar to janitor::clean_names()
.unclean <- tibble(x.x = c(1, 1, 2, 2), y.y = 1:4) clean_column_names(unclean) janitor::clean_names(unclean)
But clean_column_names()
also cleans dplyr groups; janitor::clean_names()
doesn't.
unclean %>% group_by(x.x) %>% janitor::clean_names() clean <- unclean %>% group_by(x.x) %>% clean_column_names() clean
Now you can work comfortably with clean column names such as x_x
and y_y
instead of unclean column names such as x.x
or y.y
.
result <- clean %>% mutate(total = sum(y_y)) result
unclean_column_names()
helps you restore the original, unclean column names -- so you leave the data as close as possible to how you found it.
unclean_column_names(result, unclean)
This becomes important when you write functions. If you can't force your users to input clean data, you can clean it yourself, do what you mean to do, then restore the unclean names and return.
sum_y <- function(data) { clean <- clean_column_names(data) result <- clean %>% mutate(total = sum(y_y)) unclean_column_names(result, data) } sum_y(unclean)
If you functions take groups, you need to clean them too. Here is how:
janitor::make_clean_names()
:sum_y <- function(data, by) { clean <- clean_column_names(data) by <- janitor::make_clean_names(by) result <- clean %>% group_by(.data[[by]]) %>% mutate(total = sum(y_y)) unclean_column_names(result, data) } sum_y(unclean, by = "x.x")
rlang::enquo()
, then clean it with clean_quo()
and unquote it with either bang bang !!
or embrace {{
:sum_y <- function(data, by) { clean <- clean_column_names(data) by <- clean_quo(enquo(by)) result <- clean %>% # You can also use `group_by(!!by)` group_by({{ by }}) %>% mutate(total = sum(y_y)) unclean_column_names(result, data) } sum_y(unclean, by = x.x)
...
, you'll need to capture them with rlang::enquos()
, then clean them with clean_quo()
and unquote-splice it with big bang !!!
:sum_y <- function(data, ...) { clean <- clean_column_names(data) by <- clean_quo(enquos(...)) result <- clean %>% group_by(!!!by) %>% mutate(total = sum(y_y)) unclean_column_names(result, data) } sum_y(unclean, x.x, y.y)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.