dplyr_data_masking: Argument type: data-masking

This page describes the <data-masking> argument modifier, which indicates that the argument uses tidy evaluation with data masking. If you've never heard of tidy evaluation before, start with vignette("programming").

Key terms

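The primary motivation for tidy evaluation in dplyr is that it provides data masking, which blurs the distinction between two types of variables:

env-variables are "programming" variables that live in an environment. They are usually created with <-.

data-variables are "statistical" variables that live in a data frame. They usually come from data files (e.g. .csv, .xls) or are created by manipulating existing variables.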
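A minimal sketch of the two kinds of variables colliding, assuming dplyr 1.0 or later is installed. The names x, df, masked and explicit are hypothetical. Inside a data-masking verb, a data-variable takes precedence over an env-variable with the same name; the .env pronoun, provided by the data mask, reaches past the mask to the environment.

```r
library(dplyr)

# Hypothetical env-variable: created with <- in the calling environment
x <- 100

# Hypothetical data frame with a data-variable that is also called x
df <- data.frame(x = c(1, 2, 3))

# Inside mutate(), the data-variable x masks the env-variable x
masked <- mutate(df, y = x + 1)

# The .env pronoun explicitly selects the env-variable x
explicit <- mutate(df, y = .env$x + x)

print(masked$y)
print(explicit$y)
```

Because the mask silently prefers the data frame's column, being explicit with .env (or .data) is worthwhile whenever a name could exist in both places.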

General usage

Data masking allows you to refer to variables in the "current" data frame (usually supplied in the .data argument), without any other prefix. It's what allows you to type (e.g.) filter(diamonds, x == 0 & y == 0 & z == 0) instead of diamonds[diamonds$x == 0 & diamonds$y == 0 & diamonds$z == 0, ].
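A small sketch of that comparison without depending on ggplot2's diamonds data: the data frame pts is a hypothetical stand-in with columns x, y and z, and dplyr 1.0 or later is assumed. Both expressions select the same rows; only the data-masked version lets you drop the pts$ prefix.

```r
library(dplyr)

# Hypothetical stand-in for the diamonds example
pts <- data.frame(x = c(0, 1, 0), y = c(0, 0, 0), z = c(0, 2, 0))

# Data masking: columns are referred to by bare name
masked <- filter(pts, x == 0 & y == 0 & z == 0)

# Base R equivalent: every column needs the pts$ prefix
base <- pts[pts$x == 0 & pts$y == 0 & pts$z == 0, ]

print(masked)
```

One difference worth knowing: filter() resets row names, while base subsetting keeps the original ones, so the column values match but the row names may not.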

Indirection

The main challenge of data masking arises when you introduce some indirection, i.e. instead of directly typing the name of a variable, you want to supply it in a function argument or a character vector.
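A minimal sketch of the problem, assuming dplyr 1.0 or later; df, var and naive are hypothetical names. When a column name is stored in a character vector and used directly, data masking finds no column called var, so it falls back to the env-variable and compares the string "x" with 0 instead of comparing the column x with 0.

```r
library(dplyr)

df <- data.frame(x = c(0, 1, 2))

# Hypothetical env-variable holding a column name as a string
var <- "x"

# Naive use: var is the string "x", so the condition is "x" == 0,
# a single FALSE that is recycled to every row
naive <- filter(df, var == 0)

print(nrow(naive))
```

The call does not error; it quietly returns zero rows, although one row of df has x == 0, which is what makes this kind of indirection easy to get wrong.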

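There are two main cases:

If you want the user to supply the variable (or a function of variables) in a function argument, embrace the argument with {{ }}, e.g. filter(df, {{ var }}).

If you have the column name as a character vector, use the .data pronoun, e.g. summarise(df, mean = mean(.data[[var]])).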
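A sketch of both indirection cases, assuming dplyr 1.0 or later and the built-in mtcars data. The helpers dist_summary() and count_column() are illustrative names, not part of dplyr: the first embraces a function argument with {{ }}, the second looks up a column whose name arrives as a string via .data[[ ]].

```r
library(dplyr)

# Illustrative helper, case 1: the caller supplies a bare variable,
# so the argument is embraced wherever it is used
dist_summary <- function(df, var) {
  df %>%
    summarise(n = n(), min = min({{ var }}), max = max({{ var }}))
}

# Illustrative helper, case 2: the column name arrives as a string,
# so it is looked up through the .data pronoun
count_column <- function(df, var_name) {
  df %>% count(.data[[var_name]])
}

by_arg <- dist_summary(mtcars, mpg)
by_name <- count_column(mtcars, "cyl")

print(by_arg)
print(by_name)
```

The .data[[ ]] form also works in a loop over names(mtcars), which is the usual reason column names end up in a character vector.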

Dot-dot-dot (...)

When this modifier is applied to ..., there is one other useful technique that solves the problem of creating a new variable with a name supplied by the user. Use the interpolation syntax from the glue package: "{var}" := expression. Note the use of := instead of =, which is required for this syntax.

library(dplyr)

var_name <- "l100km"
mtcars %>% mutate("{var_name}" := 235 / mpg)   # adds a column named l100km
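The same syntax carries over to a wrapper function, where the new column's name comes from the caller. This is a sketch assuming dplyr 1.0 or later; add_consumption() and its name argument are hypothetical. 235 / mpg is the same approximate miles-per-gallon to litres-per-100-km conversion used above.

```r
library(dplyr)

# Hypothetical helper: the caller chooses the name of the new column
add_consumption <- function(df, name) {
  df %>% mutate("{name}" := 235 / mpg)
}

out <- add_consumption(mtcars, "l100km")

print(head(out["l100km"]))
```

Because the name is interpolated as a string, the caller passes "l100km" in quotes rather than as a bare symbol.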

Note that ... automatically provides indirection, so you can use it as is (i.e. without embracing) inside a function:

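grouped_mean <- function(df, var, ...) {
  df %>%
    group_by(...) %>%                   # ... is passed through as is
    summarise(mean = mean({{ var }}))   # a named argument must be embraced
}

mtcars %>% grouped_mean(mpg, cyl, gear)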
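Forwarding ... works in the other direction too: it can carry named, data-masked expressions into summarise(). A sketch assuming dplyr 1.0 or later and mtcars; summarise_by() is a hypothetical helper that embraces its single grouping argument and passes everything else through unchanged.

```r
library(dplyr)

# Hypothetical helper: group is embraced, ... is forwarded to summarise()
summarise_by <- function(df, group, ...) {
  df %>%
    group_by({{ group }}) %>%
    summarise(...)
}

res <- summarise_by(mtcars, cyl, avg_mpg = mean(mpg), n = n())

print(res)
```

The names avg_mpg and n are chosen by the caller, so the helper does not need := or glue syntax to name its output columns.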

dplyr documentation built on June 19, 2021, 1:07 a.m.