Ambiguity between columns and external variables

With selecting functions like dplyr::select() or tidyr::pivot_longer(), you can refer to variables by name:

library(dplyr)

mtcars %>% select(cyl, am, vs)

mtcars %>% select(mpg:disp)
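As a quick check of how names are resolved, the sketch below (assuming dplyr is installed; `by_name` and `by_range` are illustrative names) inspects which columns each selection returns. Bare names keep the order you write them in, while a range `a:b` follows the data frame's own column order.

```r
library(dplyr)

# Bare column names are matched against the columns of mtcars.
by_name <- mtcars %>% select(cyl, am, vs)

# A range selects every column from `mpg` to `disp` in data-frame order.
by_range <- mtcars %>% select(mpg:disp)

names(by_name)   # c("cyl", "am", "vs")
names(by_range)  # c("mpg", "cyl", "disp")
```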

For historical reasons, it is also possible to refer to an external vector of variable names. You get the correct result, but with a warning that selecting with an external variable is ambiguous, because it is not clear whether you mean a data frame column or an external object.

vars <- c("cyl", "am", "vs")
result <- mtcars %>% select(vars)
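The sketch below illustrates "the correct result" claim: when no column is called `vars`, the deprecated form selects the same columns as naming them directly. It assumes dplyr is installed, and it suppresses the deprecation condition only to keep the demonstration quiet; the exact wording and class of that condition depend on the installed tidyselect version, so the sketch does not test for it. The recommended replacement, all_of(), is covered under "Fixing the ambiguity".

```r
library(dplyr)

vars <- c("cyl", "am", "vs")

# Deprecated form: still picks the intended columns. The deprecation
# warning/message is suppressed here purely for the demonstration.
result <- suppressWarnings(suppressMessages(mtcars %>% select(vars)))

# Same columns, in the same order, as selecting them by bare name.
identical(result, mtcars %>% select(cyl, am, vs))  # TRUE
```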

We have decided to deprecate this particular use of external vectors because it introduces ambiguity. Imagine that the data frame contains a column with the same name as your external variable:

some_df <- mtcars[1:4, ]
some_df$vars <- 1:nrow(some_df)

These are very different objects, but this isn't a problem when the context forces you to be specific about where to find vars:

vars

some_df$vars

In a selection context, however, the column wins:

some_df %>% select(vars)
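To make "the column wins" concrete, this sketch (assuming dplyr is installed; `selected` is an illustrative name) rebuilds `some_df` and inspects the result. The selection returns only the `vars` column holding `1:4`, and the character vector `c("cyl", "am", "vs")` is never consulted.

```r
library(dplyr)

vars <- c("cyl", "am", "vs")
some_df <- mtcars[1:4, ]
some_df$vars <- 1:nrow(some_df)

# Inside select(), the bare symbol `vars` matches the data frame column
# of the same name, so the external vector above is ignored.
selected <- some_df %>% select(vars)

names(selected)  # "vars"
selected$vars    # 1 2 3 4
```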

Fixing the ambiguity

To make your selection code more robust and silence the warning, use all_of() to force the use of the external vector:

some_df %>% select(all_of(vars))
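The sketch below (assuming dplyr is installed; `fixed`, `missing_vars` and `failed` are illustrative names) checks that all_of() selects the columns named in the external vector even though `some_df` has a clashing `vars` column. It also shows that all_of() is strict: a name that does not match any column raises an error instead of being silently dropped, which is part of what makes the selection robust.

```r
library(dplyr)

vars <- c("cyl", "am", "vs")
some_df <- mtcars[1:4, ]
some_df$vars <- 1:nrow(some_df)

# all_of() takes `vars` from the environment, not from the data,
# so the clashing `vars` column is not selected.
fixed <- some_df %>% select(all_of(vars))
names(fixed)  # "cyl" "am" "vs"

# all_of() is strict: an unknown column name is an error.
missing_vars <- c("cyl", "not_a_column")
failed <- tryCatch(
  some_df %>% select(all_of(missing_vars)),
  error = function(e) e
)
inherits(failed, "error")  # TRUE
```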

For more information, or if you have comments about this, please see the GitHub issue tracking the deprecation process.



tidyselect documentation built on May 29, 2024, 6:07 a.m.