rquery: Relational Query Generator for Data Manipulation at Scale

Ad Hoc Queries

John Mount, Win-Vector LLC 2018-07-06

Database Operator Pipelines

rquery's primary workflow is building re-usable database operator pipelines.

Let's try an example. First let's set up our example database and data.

library("rquery")
db = DBI::dbConnect(RSQLite::SQLite(), 
                    ":memory:")
RSQLite::initExtension(db)

DBI::dbWriteTable(db,
                  'd',
                  data.frame(AUC = 0.6, 
                             R2 = c(0.1, 0.2), 
                             D = NA, z = 2),
                  overwrite = TRUE,
                  temporary = TRUE)
d <- db_td(db, 'd')
print(d)

## [1] "table('`d`'; AUC, R2, D, z)"

DBI::dbGetQuery(db, to_sql(d, db))

##   AUC  R2  D z
## 1 0.6 0.1 NA 2
## 2 0.6 0.2 NA 2

Now we can define a query over this table.

q <- d %.>%
  select_rows_nse(., R2 > 0.14) %.>%
  extend_nse(., c = sqrt(R2)) %.>%
  select_columns(., c("AUC", "R2", "c"))

The idea is:

The variable d is a table model (name of the table and a set of assumed column names) that allows us to reason about an actual database table to specified later.
The query q is a sequence of operators we can hold, examine, and alter.

We can print the query/operator pipeline:

cat(format(q))

table('`d`'; 
  AUC,
  R2,
  D,
  z) %.>%
 select_rows(.,
   R2 > 0.14) %.>%
 extend(.,
  c := sqrt(R2)) %.>%
 select_columns(.,
   AUC, R2, c)

And we can ask questions of it:

column_names(q)

## [1] "AUC" "R2"  "c"

tables_used(q)

## [1] "d"

columns_used(q)

## $d
## [1] "AUC" "R2"

And we can convert the operator pipeline to SQL which can then be applied to an actual database table.

sql <- to_sql(q, db)
cat(sql)

SELECT
 `AUC`,
 `R2`,
 `c`
FROM (
 SELECT
  `AUC`,
  `R2`,
  sqrt ( `R2` )  AS `c`
 FROM (
  SELECT * FROM (
   SELECT
    `AUC`,
    `R2`
   FROM
    `d`
  ) tsql_15389734199373316826_0000000000
  WHERE `R2` > 0.14
  ) tsql_15389734199373316826_0000000001
) tsql_15389734199373316826_0000000002

DBI::dbGetQuery(db, sql) %.>%
  knitr::kable(.)

| AUC| R2| c| |----:|----:|----------:| | 0.6| 0.2| 0.4472136|

Ad Hoc mode

rquery also has an "Ad Hoc" mode for interactive analysis. In this mode things are sped up in that the use can work with in-memory tables and also skip the table modeling step.

Let's first set the global option rquery.rquery_db_executor to our database handle so the ad hoc mode knows which database to use to implement the analyses.

old_o <- options(list("rquery.rquery_db_executor" = list(db = db)))

We can now run operators directly on in-memory data.frames.

dL <- data.frame(AUC = 0.6, 
                 R2 = c(0.1, 0.2), 
                 D = NA, z = 2)

# use data frame to define the pipeline, captures only column structure
ops <- dL %.>%
  select_rows_nse(., R2 > 0.14)

# apply pipeline to any data frame with similar column structure
dL %.>% 
  ops %.>% 
  knitr::kable(.)

| AUC| R2| D| z| |----:|----:|----:|----:| | 0.6| 0.2| NA| 2|

ops <- dL %.>%
  select_rows_nse(., R2 > 0.14) %.>%
  extend_nse(., c = sqrt(R2))  %.>%
  select_columns(., c("AUC", "R2", "c")) 

dL %.>% 
  ops %.>% 
  knitr::kable(.)

| AUC| R2| c| |----:|----:|----------:| | 0.6| 0.2| 0.4472136|

dL %.>% 
  select_rows_nse(., R2 > 0.14) %.>%
  extend_nse(., c = sqrt(R2))  %.>%
  select_columns(., c("AUC", "R2", "c")) %.>% 
  knitr::kable(.)

| AUC| R2| c| |----:|----:|----------:| | 0.6| 0.2| 0.4472136|

Cleanup

options(old_o)
DBI::dbDisconnect(db)

WinVector/rquery documentation built on Aug. 24, 2023, 11:12 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

WinVector/rquery
Relational Query Generator for Data Manipulation at Scale

extras/AdHocQueries.md
In WinVector/rquery: Relational Query Generator for Data Manipulation at Scale

Ad Hoc Queries

Database Operator Pipelines

Ad Hoc mode

Cleanup

R Package Documentation

Browse R Packages

We want your feedback!

WinVector/rquery Relational Query Generator for Data Manipulation at Scale

extras/AdHocQueries.md In WinVector/rquery: Relational Query Generator for Data Manipulation at Scale

Ad Hoc Queries

Database Operator Pipelines

Ad Hoc mode

Cleanup

R Package Documentation

Browse R Packages

We want your feedback!

WinVector/rquery
Relational Query Generator for Data Manipulation at Scale

extras/AdHocQueries.md
In WinVector/rquery: Relational Query Generator for Data Manipulation at Scale