Examples taken from http://dplyr.tidyverse.org/articles/programming.html
require(dplyr)
set.seed(123)
df1 <- tibble::tibble(x = 1:3, y = 3:1)
df2 <- tibble::tibble(x = runif(3), y = rnorm(3))
Working with different datasets
a <- 12
mutate(df1, y = a + x)
mutate(df2, y = a + x)
Avoiding duplication using a function
mutate_y <- function(df) {
  mutate(df, y = a + x)
}
mutate_y(df1)
Using .data inside the function
mutate_y <- function(df) {
  mutate(df, y = .data$a + .data$x)
}
# mutate_y(df1) # produces an error: df1 has no column named a
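The error arises because the .data pronoun only looks at columns of the data frame, while a is a variable in the calling environment. The following is a minimal sketch of one way around this, assuming an rlang version that provides the .env pronoun. The name mutate_y_env and its argument a are hypothetical and not part of the original examples.

```r
library(dplyr)

df1 <- tibble(x = 1:3, y = 3:1)

# Hypothetical variant of mutate_y(): the constant is passed as an argument.
# .data$x refers to the column x, .env$a refers to the function argument a
# (assumes an rlang version that provides the .env pronoun).
mutate_y_env <- function(df, a) {
  mutate(df, y = .env$a + .data$x)
}

res <- mutate_y_env(df1, a = 12)
res
```

Being explicit about where each name comes from avoids surprises when a data frame happens to contain a column with the same name as a variable.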
Writing functions is difficult when their arguments are meant to take variable names. Passing column indices instead is not recommended, as this can cause confusion and produce bugs that are very difficult to track down and fix.
(df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5),
  b = sample(5)
))
Computing summary statistics for each of the two grouping variables in turn.
df %>% group_by(g1) %>% summarise(a = mean(a))
df %>% group_by(g2) %>% summarise(a = mean(a))
This leads to a lot of code duplication: the only thing that changes is the name of the grouping variable. The code should be refactored into a function that takes the grouping variable as an argument.
my_summarise <- function(df, group_var) {
  df %>%
    group_by(!!group_var) %>%
    summarise(a = mean(a))
}
my_summarise(df, quo(g1))
my_summarise(df, quo(g2))
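Passing a bare column name to this version does not work, because !! evaluates group_var, which makes R look for an object called g1. The following small sketch illustrates the difference, using a made-up data frame df_demo and assuming that no object named g1 exists in the calling environment.

```r
library(dplyr)

# Small made-up data frame for illustration
df_demo <- tibble(g1 = c(1, 1, 2), a = c(1, 2, 3))

my_summarise <- function(df, group_var) {
  df %>%
    group_by(!!group_var) %>%
    summarise(a = mean(a))
}

# Bare name: R tries to evaluate g1 as a regular object and fails
bare <- tryCatch(
  my_summarise(df_demo, g1),
  error = function(e) "error"
)

# Quoted name: the quosure is unquoted inside group_by()
quoted <- my_summarise(df_demo, quo(g1))
```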
Running the conversion to a quosure inside the function makes the calls look nicer
my_summarise <- function(df, group_var) {
  # argument named group_var so that it does not shadow dplyr::group_by()
  group_var <- enquo(group_var)
  print(group_var)
  df %>%
    group_by(!!group_var) %>%
    summarise(a = mean(a))
}
my_summarise(df, g1)
my_summarise(df, g2)
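The difference between quo() and enquo() inside a function can be seen in isolation. In the sketch below, the helper names capture_with_quo and capture_with_enquo are hypothetical and serve only as illustration.

```r
library(rlang)

# quo() captures the expression exactly as typed, here the argument name x
capture_with_quo <- function(x) quo(x)

# enquo() captures the expression the caller supplied for x
capture_with_enquo <- function(x) enquo(x)

name_quo <- quo_name(capture_with_quo(g1))
name_enquo <- quo_name(capture_with_enquo(g1))

name_quo    # "x"
name_enquo  # "g1"
```

This is why quo() has to be replaced by enquo() once the code moves into a function: only enquo() sees what the user actually typed.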
Computing three summaries with varying input variables
summarise(df, mean = mean(a), sum = sum(a), n = n())
Same thing but with different input
summarise(df, mean = mean(a*b), sum = sum(a*b), n = n())
The following statements give a first idea of how the two chunks above might be refactored into a function.
my_var <- quo(a)
summarise(df, mean = mean(!!my_var), sum = sum(!!my_var), n = n())
quo() can also be wrapped around the whole summarise() call
quo(summarise(df, mean = mean(!!my_var), sum = sum(!!my_var), n = n()))
Putting this result into a function, remembering to replace quo() by enquo()
my_summarise2 <- function(df, expr) {
  expr <- enquo(expr)
  summarise(df, mean = mean(!!expr), sum = sum(!!expr), n = n())
}
my_summarise2(df, a)
my_summarise2(df, a*b)
Changing both the input variable and the names of the output variables, as in the following
mutate(df, mean_a = mean(a), sum_a = sum(a))
mutate(df, mean_b = mean(b), sum_b = sum(b))
To turn this into a function, we need quo_name() to convert the input expression into a string and paste0() to build the new column names from it. Because the names on the left-hand side are unquoted, assignment is done with the := helper from rlang
my_mutate <- function(df, expr) {
  expr <- enquo(expr)
  cat("expr\n"); print(expr)
  cat("quo_name(expr)\n"); print(quo_name(expr))
  mean_name <- paste0("mean_", quo_name(expr))
  sum_name <- paste0("sum_", quo_name(expr))
  cat("mean_name\n"); print(mean_name)
  cat("sum_name\n"); print(sum_name)
  mutate(df, !!mean_name := mean(!!expr), !!sum_name := sum(!!expr))
}
my_mutate(df, a)
my_mutate(df, b)
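The := operator is needed because R's syntax does not allow an expression such as !!mean_name on the left-hand side of =. The following is a minimal sketch with a made-up data frame df_demo, showing that the unquoted string becomes the column name.

```r
library(dplyr)

# Small made-up data frame for illustration
df_demo <- tibble(a = c(1, 2, 3))
new_name <- paste0("mean_", "a")

# With := the left-hand side may be an unquoted string
res <- mutate(df_demo, !!new_name := mean(a))
names(res)
```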
Trying to get some help for debugging
my_var <- quo(a)
mutate(df, mean_a = mean(!!my_var))
For checking what happens from the perspective of dplyr, we can do the following
quo(mutate(df, mean_a = mean(!!my_var)))
Finally, if my_summarise() should accept a varying number of grouping variables, we have to make the following changes: use quos() to capture the three dots as a list of quosures, and use !!! instead of !! to splice the arguments into group_by()
my_summarise3 <- function(df, ...) {
  group_var <- quos(...)
  df %>%
    group_by(!!!group_var) %>%
    summarise(a = mean(a))
}
my_summarise3(df, g1, g2)
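Because the dots are spliced, the same function works for one grouping variable as well as for several. The following short sketch uses a made-up data frame df_demo with fixed values in a (instead of sample(5)) so that the group means are easy to check by hand.

```r
library(dplyr)

# Made-up data with fixed values so the means can be verified manually
df_demo <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = c(1, 2, 3, 4, 5)
)

my_summarise3 <- function(df, ...) {
  group_var <- quos(...)
  df %>%
    group_by(!!!group_var) %>%
    summarise(a = mean(a))
}

by_one <- my_summarise3(df_demo, g1)       # one row per value of g1
by_both <- my_summarise3(df_demo, g1, g2)  # one row per g1/g2 combination
```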