rquery: Relational Query Generator for Data Manipulation at Scale

NarrowExample

Win-Vector LLC

Let's take a quick look at what we are calling query narrowing. For our example let's set up a database connection and copy a small table into the database.

db <- DBI::dbConnect(RSQLite::SQLite(),
                         ":memory:")

td <- rquery::rq_copy_to(db, 
                         "d", 
                         data.frame(a = 1:3, b = 4:6, c = 7:9),
                         temporary = TRUE,
                         overwrite = TRUE)

print(td)

## [1] "table(`d`; a, b, c)"

rquery::rstr(db, td$table_name)

## table `d` SQLiteConnection 
##  nrow: 3 
## 'data.frame':    3 obs. of  3 variables:
##  $ a: int  1 2 3
##  $ b: int  4 5 6
##  $ c: int  7 8 9

For our first example we will user rquery to generate some SQL.

library("rquery")
library("wrapr")

op1 <- td %.>% 
  extend_nse(., e := a + 1)
cat(to_sql(op1, db))

## SELECT
##  `a`,
##  `b`,
##  `c`,
##  `a` + 1  AS `e`
## FROM (
##  SELECT
##   `a`,
##   `b`,
##   `c`
##  FROM
##   `d`
##  ) tsql_30228218124473308630_0000000000

Notice the above SQL has a trivial extra inner select step. rquery reserves this SQL for extra effects such as query narrowing and it is presumed that such selects are easily removed by downstream query optimizers. The way rquery uses this stage is shown as follows. Suppose we later declare we are only going to use the new column "e" as our our result.

op2 <- op1 %.>% 
  select_columns(., "e")

cat(to_sql(op2, db))

## SELECT
##  `e`
## FROM (
##  SELECT
##   `a` + 1  AS `e`
##  FROM (
##   SELECT
##    `a`
##   FROM
##    `d`
##   ) tsql_16807721345884970846_0000000000
## ) tsql_16807721345884970846_0000000001

db %.>% op2

##   e
## 1 2
## 2 3
## 3 4

rquery propagated the columns used all the way to the inner query. This makes the data processing thinner and in fact often faster as even with "lazy evaluation" there is significant cost associated with processing the additional columns (and this is not always eliminated by the query optimizers). The narrowing effect can be critical if one caches or stores an intermediate result. rquery did introduce some trivial outer SQL to represent the outer select step, but we again assume this is the sort of thing that is easy for query optimizers to remove.

In contrast dplyr does not back-propagate later constraints to earlier in the query. Notice below how the inner query requests many unused columns.

library("dplyr")

## Warning: package 'dplyr' was built under R version 3.5.1

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:wrapr':
## 
##     coalesce

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

packageVersion("dplyr")

## [1] '0.7.6'

packageVersion("dbplyr")

## [1] '1.2.2'

hdl <- dplyr::tbl(db, "d")

hdl %>%
  mutate(., e = a + 1) %>%
  dbplyr::remote_query(.)

## <SQL> SELECT `a`, `b`, `c`, `a` + 1.0 AS `e`
## FROM `d`

hdl %>%
  mutate(., e = a + 1) %>%
  select(., e) %>%
  dbplyr::remote_query(.)

## <SQL> SELECT `e`
## FROM (SELECT `a`, `b`, `c`, `a` + 1.0 AS `e`
## FROM `d`)

Notice dplyr/dbplyr does not propagate the column narrowing back to earlier expressions.

DBI::dbDisconnect(db)

WinVector/rquery documentation built on Aug. 24, 2023, 11:12 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

WinVector/rquery
Relational Query Generator for Data Manipulation at Scale

extras/NarrowExample.md
In WinVector/rquery: Relational Query Generator for Data Manipulation at Scale

NarrowExample

R Package Documentation

Browse R Packages

We want your feedback!

WinVector/rquery Relational Query Generator for Data Manipulation at Scale

extras/NarrowExample.md In WinVector/rquery: Relational Query Generator for Data Manipulation at Scale

NarrowExample

R Package Documentation

Browse R Packages

We want your feedback!

WinVector/rquery
Relational Query Generator for Data Manipulation at Scale

extras/NarrowExample.md
In WinVector/rquery: Relational Query Generator for Data Manipulation at Scale