dplyr is inconsistent as to which column is selected unless one uses extra notation such as !!, {{}}, .data[[]], and so on. Of course, if using a name or string directly is not the “correct” notation, why is it allowed? Notice how different columns are selected in each example, depending on the columns present in the data.frame. The issue is that dplyr does not commit to an unambiguous interpretation of the basic notation (only the more complicated, longer notations have reliable semantics).
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
y = "x"
data.frame(x = 1) %>%
select(y)
## x
## 1 1
data.frame(x = 1, y = 2) %>%
select(y)
## y
## 1 2
dplyr notations that are unambiguous include:
data.frame(x = 1) %>%
select({{y}})
## x
## 1 1
data.frame(x = 1, y = 2) %>%
select
## data frame with 0 columns and 1 row
data.frame(x = 1) %>%
select(!!y)
## x
## 1 1
data.frame(x = 1, y = 2) %>%
select(!!y)
## x
## 1 1
data.frame(x = 1) %>%
select(!!rlang::enquo(y))
## x
## 1 1
data.frame(x = 1, y = 2) %>%
select(!!rlang::enquo(y))
## x
## 1 1
data.frame(x = 1) %>%
select(.data[[y]])
## x
## 1 1
data.frame(x = 1, y = 2) %>%
select(.data[[y]])
## x
## 1 1
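One further notation may be worth trying alongside the ones above: the all_of() selection helper, which takes a character vector of column names as a value. The following is a minimal sketch, assuming a dplyr release recent enough to export all_of() (it is not used elsewhere in this document); the names a and b are introduced only for illustration.

```r
library("dplyr")

y = "x"

# all_of() receives the contents of y (the string "x") as a value,
# so it is not affected by whether a column named y exists
a = data.frame(x = 1) %>%
  select(all_of(y))
b = data.frame(x = 1, y = 2) %>%
  select(all_of(y))

# column names of each result
names(a)
names(b)
```

In both cases the selected column is expected to be x, matching the behavior of the !! and .data[[]] examples.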
But other notations don’t work (.data is apparently a mapping from column names to column indices, and not in fact a reference to the incoming data.frame).
data.frame(x = 1) %>%
select(.data[y])
## `.data[y]` must evaluate to column positions or names, not a list
data.frame(x = 1, y = 2) %>%
select(.data[y])
## `.data[y]` must evaluate to column positions or names, not a list
R itself does not have this problem. Notice how the column named by y (which turns out to be x) is reliably chosen in all cases. In [] and [[]] notations columns are always specified by values (not taken from code or variable names), while $ always takes the column name from code and never from values.
y = "x"
data.frame(x = 1)[y]
## x
## 1 1
data.frame(x = 1, y = 2)[y]
## x
## 1 1
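The same contrast can be sketched for [[]] and $, which the examples above do not show. This is a small base R illustration; the variable d is introduced here only for the example.

```r
y = "x"
d = data.frame(x = 1, y = 2)

# [[ ]] evaluates y, so the column named by the value of y ("x") is returned
d[[y]]
## [1] 1

# $ uses the name as written in the code, so this is the column literally named y
d$y
## [1] 2
```

Each notation commits to one interpretation, so the result does not change depending on which columns happen to be present.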
rqdatatable also has reliable column selection semantics: columns are always values (not taken from code or variable names).
library("rqdatatable")
## Loading required package: rquery
y = "x"
data.frame(x = 1) %.>%
select_columns(., y)
## x
## 1: 1
data.frame(x = 1, y = 2) %.>%
select_columns(., y)
## x
## 1: 1