Ambiguity between columns and external variables

With selecting functions like dplyr::select() or tidyr::pivot_longer(), you can refer to variables by name:

mtcars %>% select(cyl, am, vs)

mtcars %>% select(mpg:disp)

For historical reasons, it is also possible to refer an external vector of variable names. You get the correct result, but with a note informing you that selecting with an external variable is ambiguous because it is not clear whether you want a data frame column or an external object.

vars <- c("cyl", "am", "vs")
result <- mtcars %>% select(vars)

This note will become a warning in the future, and then an error. We have decided to deprecate this particular approach to using external vectors because they introduce ambiguity. Imagine that the data frame contains a column with the same name as your external variable.

some_df <- mtcars
some_df$vars <- 1:nrow(mtcars)

These are very different objects but it isn't a problem if the context forces you to be specific about where to find vars:

vars

some_df$vars

In a selection context however, the column wins:

some_df %>% select(vars)

Fixing the ambiguity

To make your selection code more robust and silence the message, use all_of() to force the external vector:

some_df %>% select(all_of(vars))

For more information or if you have comments about this, please see the Github issue tracking the deprecation process.



tidyverse/tidyselect documentation built on Jan. 28, 2020, 1:13 a.m.