rquery
1.3.0
now incorporates a general SQL
tree-rewriting capability. This is to help adapt rquery
to different databases.
Let's work an example using rqdatatable
(rquery
in memory using data.table
), RPostgreSQL
, and RSQLite
.
First lets confirm we have the needed packages installed and attach them.
library("rquery") have_rqdatatable <- requireNamespace("rqdatatable", quietly = TRUE) if(have_rqdatatable) { library("rqdatatable") } have_postgresql <- requireNamespace("DBI", quietly = TRUE) && requireNamespace("RPostgreSQL", quietly = TRUE) if(have_postgresql) { library("RPostgreSQL") } have_rsqlite <- requireNamespace("DBI", quietly = TRUE) && requireNamespace("RSQLite", quietly = TRUE) if(have_rsqlite) { library("RSQLite") }
Now let's set up a simple data example.
d <- data.frame(x = -3:3) knitr::kable(d)
We define the rquery
pipe line to compute x mod 2
.
ops1 <- local_td(d) %.>% extend(., x_mod_2 = x %% 2) cat(format(ops1))
And rqdatatable
can execute the pipeline against any table that has the required numeric x
column.
d %.>% ops1 %.>% knitr::kable(.)
In principle applying the same operations to a database table is easy. However SQL
mod
unfortunately returns the sign of the argument in its result. This means its range of x mod 2
can be from -1
through 1
instead of the more typical 0
through 1
(it breaks much of the mathematical purpose of modulo if we don't have mod(a-b, k)==0
equivalent to mod(a)-mod(b)==0
; dplyr
runs into the same issue).
To work around this we define a larger pipeline that corrects the sign after the modulo operation.
ops <- local_td(d) %.>% extend(., x_mod_2 = x %% 2) %.>% extend(., # extra step to work around SQL mod # returns sign of argument. x_mod_2 = ifelse(x_mod_2 >=0, x_mod_2, x_mod_2 + 2)) cat(format(ops))
This pipeline still works in memory.
d %.>% ops %.>% knitr::kable(.)
And it works correctly on PostgreSQL
database tables.
postgresql_connection <- DBI::dbConnect( RPostgreSQL::PostgreSQL(), host = 'localhost', port = 5432, user = 'johnmount', password = '') postgresql_db <- rquery_db_info( connection = postgresql_connection, is_dbi = TRUE, connection_options = rq_connection_tests(postgresql_connection)) postgresql_db d_postgresql <- rq_copy_to(postgresql_db, "d", d, temporary = TRUE, overwrite = TRUE) d_postgresql rstr(postgresql_db, d_postgresql) cat(to_sql(ops, postgresql_db)) ops %.>% postgresql_db %.>% knitr::kable(.) DBI::dbDisconnect(postgresql_connection)
However SQLite
needs mod(a,b)
written as a % b
. So we need to re-write our SQL
to comply with SQLite
's variation when talking to SQLite
. In rquery
this is easy: we define a tree-node re-write rule and rquery
's SQL
translator lets this user code visit every node of a SQL
parse tree during the translation.
The use code looks like this:
tree_rewriter <- function(x, db_info) { if(("pre_sql_sub_expr" %in% class(x)) && (length(x$info$name) == 1) && (x$info$name == "modulo")) { lhs <- x$toks[[3]] rhs <- x$toks[[5]] return(pre_sql_sub_expr( list(pre_sql_token("("), lhs, pre_sql_token("%"), rhs, pre_sql_token(")")), info=list(name = "user_replaced")) ) } x }
And using the re-writer is as easy as attaching it to our database handle, which we show below.
rsqlite_connection <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") rsqlite_db <- rquery_db_info( connection = rsqlite_connection, is_dbi = TRUE, connection_options = rq_connection_tests(rsqlite_connection)) # attach our tree-rewriter to the databse handle. # this handle now uses this re-writer. rsqlite_db$tree_rewriter <- tree_rewriter rsqlite_db d_rsqlite <- rq_copy_to(rsqlite_db, "d", d, temporary = TRUE, overwrite = TRUE) d_rsqlite rstr(rsqlite_db, d_rsqlite) cat(to_sql(ops, rsqlite_db)) ops %.>% rsqlite_db %.>% knitr::kable(.)
We learn how to write the transform by using the function str_pre_sql_sub_expr()
which prints out node details, telling us where to get the arguments for our new SQL
code. We can dump all the nodes the SQL
translator visits (to see what pattern we would have to re-write) as follows. The "pre_sql_sub_expr" %in% class(x)
is merely restricting the printing to non-leaf nodes (the ones likely to hold the references to the values we need).
rsqlite_db$tree_rewriter <- function(x, db_info) { if("pre_sql_sub_expr" %in% class(x)) { print(str_pre_sql_sub_expr(x)) } x } untranslated_sql <- to_sql(ops, rsqlite_db)
From the above we see that the node named "modulo" (name found after the first colon) is the one we need to re-write, and that it has the values we need as terms 3
and 5
(where we find the values x
and 2
).
In addition to expression-tree re-writing procedures rquery
also allows a per-handle replacement of any to_sql()
method via the db_methods
map found on the rquery_db_info
database handle. Yet another form of adaption can be found here, where we by-hand replace a window function with a join when using MonetDBLite
(as while MonetDBLite
defines window functions, it doesn't define a windowed version of sum()
; also check out the operator diagram showing the two uses of the same table arriving at a join).
And that is a brief introduction to adapting rquery
to new databases.
DBI::dbDisconnect(rsqlite_connection)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.