Let's take a quick look at what we are calling query narrowing. For our example let's set up a database connection and copy a small table into the database.

db <- DBI::dbConnect(RSQLite::SQLite(),
                         ":memory:")

td <- rquery::rq_copy_to(db, 
                         "d", 
                         data.frame(a = 1:3, b = 4:6, c = 7:9),
                         temporary = TRUE,
                         overwrite = TRUE)

print(td)

rquery::rstr(db, td$table_name)

For our first example we will user rquery to generate some SQL.

library("rquery")
library("wrapr")

op1 <- td %.>% 
  extend_nse(., e := a + 1)
cat(to_sql(op1, db))

Notice the above SQL has a trivial extra inner select step. rquery reserves this SQL for extra effects such as query narrowing and it is presumed that such selects are easily removed by downstream query optimizers. The way rquery uses this stage is shown as follows. Suppose we later declare we are only going to use the new column "e" as our our result.

op2 <- op1 %.>% 
  select_columns(., "e")

cat(to_sql(op2, db))

db %.>% op2

rquery propagated the columns used all the way to the inner query. This makes the data processing thinner and in fact often faster as even with "lazy evaluation" there is significant cost associated with processing the additional columns (and this is not always eliminated by the query optimizers). The narrowing effect can be critical if one caches or stores an intermediate result. rquery did introduce some trivial outer SQL to represent the outer select step, but we again assume this is the sort of thing that is easy for query optimizers to remove.

In contrast dplyr does not back-propagate later constraints to earlier in the query. Notice below how the inner query requests many unused columns.

library("dplyr")
packageVersion("dplyr")
packageVersion("dbplyr")

hdl <- dplyr::tbl(db, "d")

hdl %>%
  mutate(., e = a + 1) %>%
  dbplyr::remote_query(.)

hdl %>%
  mutate(., e = a + 1) %>%
  select(., e) %>%
  dbplyr::remote_query(.)

Notice dplyr/dbplyr does not propagate the column narrowing back to earlier expressions.

DBI::dbDisconnect(db)


WinVector/rquery documentation built on Aug. 24, 2023, 11:12 a.m.