rquery: Relational Query Generator for Data Manipulation at Scale

column selection

dplyr is inconsistent about which column a bare name selects unless one uses extra notation such as !!, {{}}, or .data[[]]. Of course, if using a name or string directly is not the “correct” notation, why is it allowed? Notice how a different column is selected in each example below, depending on which columns are present in the data.frame. The issue is that dplyr does not commit to an unambiguous interpretation of the basic notation; only the longer, more complicated notations have reliable semantics.

library("dplyr")

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

y = "x"

data.frame(x = 1) %>%
  select(y)

##   x
## 1 1

data.frame(x = 1, y = 2) %>%
  select(y)

##   y
## 1 2
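The behavior above can be imitated in base R, which may make the ambiguity easier to see. The sketch below is an illustration only: select_like_bare_name is a hypothetical helper, not part of dplyr, and it makes no claim about how dplyr is implemented. It prefers a column whose name matches the symbol as written, and falls back to the value of the variable only when no such column exists.

```r
# Hypothetical helper (not a dplyr function): resolve a bare name the way
# the two select(y) examples above behave.
select_like_bare_name <- function(d, name) {
  sym <- deparse(substitute(name))
  if (sym %in% colnames(d)) {
    # A column with that name exists, so the symbol itself wins.
    return(d[, sym, drop = FALSE])
  }
  # No such column: fall back to the value of the caller's variable.
  value <- eval(substitute(name), parent.frame())
  d[, value, drop = FALSE]
}

y <- "x"

# Only column x exists, so y is read as a variable holding "x".
r1 <- select_like_bare_name(data.frame(x = 1), y)

# A column y now exists, so the same call selects column y instead.
r2 <- select_like_bare_name(data.frame(x = 1, y = 2), y)

print(colnames(r1))
print(colnames(r2))
```

The same call text returns different columns depending on the data, which is the inconsistency this note objects to.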

dplyr notations that are unambiguous include:

data.frame(x = 1) %>%
  select({{y}})

##   x
## 1 1

data.frame(x = 1, y = 2) %>%
  select

## data frame with 0 columns and 1 row

data.frame(x = 1) %>%
  select(!!y)

##   x
## 1 1

data.frame(x = 1, y = 2) %>%
  select(!!y)

##   x
## 1 1

data.frame(x = 1) %>%
  select(!!rlang::enquo(y))

##   x
## 1 1

data.frame(x = 1, y = 2) %>%
  select(!!rlang::enquo(y))

##   x
## 1 1

data.frame(x = 1) %>%
  select(.data[[y]])

##   x
## 1 1

data.frame(x = 1, y = 2) %>%
  select(.data[[y]])

##   x
## 1 1
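One further spelling may be worth noting. It is an assumption about the installed package versions and is not part of the original examples: tidyselect 1.0.0 and later provide all_of(), re-exported by dplyr, which treats its argument strictly as a character vector of column names. A minimal sketch, assuming such a dplyr is installed:

```r
library("dplyr")

y <- "x"

# all_of() evaluates y as an ordinary variable, so the value "x" is used
# as the column name whether or not a column called y exists.
r1 <- data.frame(x = 1) %>%
  select(all_of(y))

r2 <- data.frame(x = 1, y = 2) %>%
  select(all_of(y))

print(colnames(r1))
print(colnames(r2))
```

Both calls select column x, matching the other unambiguous notations listed here.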

But other notations, such as the single-bracket .data[y], don’t work (.data is apparently a mapping from column names to column indices, and not in fact a reference to the incoming data.frame).

data.frame(x = 1) %>%
  select(.data[y])

## `.data[y]` must evaluate to column positions or names, not a list

data.frame(x = 1, y = 2) %>%
  select(.data[y])

## `.data[y]` must evaluate to column positions or names, not a list

R itself does not have this problem. Notice how the column named by the value of y (which is "x") is reliably chosen in all cases. In the [] and [[]] notations, columns are always specified by value and never taken from code or variable names; $ always takes the name from the code and never from a value.

y = "x"

data.frame(x = 1)[y]

##   x
## 1 1

data.frame(x = 1, y = 2)[y]

##   x
## 1 1
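The contrast between [[]] and $ described above can be checked directly in base R. This is a small sketch using the same y and data.frame as the examples; by_value and by_name are names introduced here for illustration.

```r
y <- "x"
d <- data.frame(x = 1, y = 2)

# [[ ]] evaluates its argument: y holds "x", so column x is returned.
by_value <- d[[y]]

# $ uses the name exactly as written in the code: this is column y,
# regardless of what the variable y contains.
by_name <- d$y

print(by_value)
print(by_name)
```

Each operator has a single fixed rule, so the result does not depend on which other columns happen to be present.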

rqdatatable also has reliable column selection semantics: columns are always specified by value (not taken from code or variable names).

library("rqdatatable")

## Loading required package: rquery

y = "x"

data.frame(x = 1) %.>% 
  select_columns(., y)

##    x
## 1: 1

data.frame(x = 1, y = 2) %.>% 
  select_columns(., y)

##    x
## 1: 1

WinVector/rquery documentation built on Aug. 24, 2023, 11:12 a.m.

extras/small_examples/column_selection.md
In WinVector/rquery: Relational Query Generator for Data Manipulation at Scale
