Ambiguity between columns and external variables

With selecting functions like dplyr::select() or tidyr::pivot_longer(), you can refer to variables by name:

mtcars %>% select(cyl, am, vs)

mtcars %>% select(mpg:disp)

For historical reasons, it is also possible to refer an external vector of variable names. You get the correct result, but with a warning informing you that selecting with an external variable is ambiguous because it is not clear whether you want a data frame column or an external object.

vars <- c("cyl", "am", "vs")
result <- mtcars %>% select(vars)

We have decided to deprecate this particular approach to using external vectors because they introduce ambiguity. Imagine that the data frame contains a column with the same name as your external variable.

some_df <- mtcars[1:4, ]
some_df$vars <- 1:nrow(some_df)

These are very different objects but it isn't a problem if the context forces you to be specific about where to find vars:

vars

some_df$vars

In a selection context however, the column wins:

some_df %>% select(vars)

Fixing the ambiguity

To make your selection code more robust and silence the message, use all_of() to force the external vector:

some_df %>% select(all_of(vars))

For more information or if you have comments about this, please see the Github issue tracking the deprecation process.



tidyverse/tidyselect documentation built on March 14, 2024, 3:16 p.m.