dplyr is inconsistent as to which column is selected unless one uses extra notation such as !!, {{}}, .data[[]], and so on. Of course if using a name or string directly are not the "correct" notation, why are they allowed? Notice how different columns are selected in each example, depending on the columns present in the data.frame. The issue is dplyr does not commit to an unambiguous interpretation of the basic notation (only the more complicated, longer notations have reliable semantics).

library("dplyr")

y = "x"

data.frame(x = 1) %>%
  select(y)

data.frame(x = 1, y = 2) %>%
  select(y)

dplyr notations that are unambiguous include:

data.frame(x = 1) %>%
  select({{y}})

data.frame(x = 1, y = 2) %>%
  select

data.frame(x = 1) %>%
  select(!!y)

data.frame(x = 1, y = 2) %>%
  select(!!y)

data.frame(x = 1) %>%
  select(!!rlang::enquo(y))

data.frame(x = 1, y = 2) %>%
  select(!!rlang::enquo(y))

data.frame(x = 1) %>%
  select(.data[[y]])

data.frame(x = 1, y = 2) %>%
  select(.data[[y]])

But other notations don't work (.data is apparently a mapping from column names to column indices, and not in fact a reference to the incoming data.frame).

data.frame(x = 1) %>%
  select(.data[y])

data.frame(x = 1, y = 2) %>%
  select(.data[y])

R itself does not have this problem. Notice how the column named by y (which turns out to be x) is reliably chosen in all cases. In [] and [[]] notations columns are always values (not taken from code or variable names; and $ always take from code and not from values).

y = "x"

data.frame(x = 1)[y]

data.frame(x = 1, y = 2)[y]

rqdatable also has reliable column selection semantics, columns are always values (not taken from code or variable names).

library("rqdatatable")

y = "x"

data.frame(x = 1) %.>% 
  select_columns(., y)

data.frame(x = 1, y = 2) %.>% 
  select_columns(., y)


WinVector/rquery documentation built on Aug. 24, 2023, 11:12 a.m.