Database Operator Pipelines

rquery's primary workflow is building re-usable database operator pipelines.

Let's try an example. First let's set up our example database and data.

library("rquery")
db = DBI::dbConnect(RSQLite::SQLite(), 
                    ":memory:")
RSQLite::initExtension(db)

DBI::dbWriteTable(db,
                  'd',
                  data.frame(AUC = 0.6, 
                             R2 = c(0.1, 0.2), 
                             D = NA, z = 2),
                  overwrite = TRUE,
                  temporary = TRUE)
d <- db_td(db, 'd')
print(d)

DBI::dbGetQuery(db, to_sql(d, db))

Now we can define a query over this table.

q <- d %.>%
  select_rows_nse(., R2 > 0.14) %.>%
  extend_nse(., c = sqrt(R2)) %.>%
  select_columns(., c("AUC", "R2", "c"))

The idea is:

We can print the query/operator pipeline:

cat(format(q))

And we can ask questions of it:

column_names(q)

tables_used(q)

columns_used(q)

And we can convert the operator pipeline to SQL which can then be applied to an actual database table.

sql <- to_sql(q, db)
cat(sql)
DBI::dbGetQuery(db, sql) %.>%
  knitr::kable(.)

Ad Hoc mode

rquery also has an "Ad Hoc" mode for interactive analysis.
In this mode things are sped up in that the use can work with in-memory tables and also skip the table modeling step.

Let's first set the global option rquery.rquery_db_executor to our database handle so the ad hoc mode knows which database to use to implement the analyses.

old_o <- options(list("rquery.rquery_db_executor" = list(db = db)))

We can now run operators directly on in-memory data.frames.

dL <- data.frame(AUC = 0.6, 
                 R2 = c(0.1, 0.2), 
                 D = NA, z = 2)

# use data frame to define the pipeline, captures only column structure
ops <- dL %.>%
  select_rows_nse(., R2 > 0.14)

# apply pipeline to any data frame with similar column structure
dL %.>% 
  ops %.>% 
  knitr::kable(.)

ops <- dL %.>%
  select_rows_nse(., R2 > 0.14) %.>%
  extend_nse(., c = sqrt(R2))  %.>%
  select_columns(., c("AUC", "R2", "c")) 

dL %.>% 
  ops %.>% 
  knitr::kable(.)


dL %.>% 
  select_rows_nse(., R2 > 0.14) %.>%
  extend_nse(., c = sqrt(R2))  %.>%
  select_columns(., c("AUC", "R2", "c")) %.>% 
  knitr::kable(.)

Cleanup

options(old_o)
DBI::dbDisconnect(db)


WinVector/rquery documentation built on Aug. 24, 2023, 11:12 a.m.