README.md

SQLusine

A SQL Query Factory for R

SQLusine is designed to be a flexible SQL query generator for R. It has the following design goals:

To this end, SQLusine:

SQLusine is in early alpha and can only be installed from github. The API is subject to constant change and no guarantees are made in usability, accuracy, performance, or stability.

Prior Art

Why SQLusine?

dplyr and rquery are both focused on the ability to do actual data manipulations, and to do so, they had to each build a their own SQL render factory. Package authors who wish to build systems similar to dplyr or rquery must write their own SQL generation factory in addition to a user-facing API.

The SQL generation factories for dplyr and rquery are tied very closely to the user-facing API, and as such it is impossible to build off of just those parts of each package. It is my hope that having a package designed just for SQL generation will allow other package authors to build dplyr and rquery like APIs more easily.

SQLusine only builds SQL--it doesn't care if column names are accurate or the referenced tables exist. It only attempts to build valid SQL statements, and leaves the rest to the user. This means SQLusine can focus on generating efficient SQL and the users can focus on the application.

example use:

library(SQLusine)
sub_qry <- SELECT(
  from = 'tbl_whatever',
  what = c('k_tbl_whatever', 'name', 'userID', J('department', 'dept_code')),
  inner_join = 'tbl_department',
  on = 'dept_code',
)


qry <- SELECT_DISTINCT(
  from = 'tbl_staff',
  what = c('is_manager', J('name', 'userID', 'department')),
  left_join = sub_qry,
  on = c('dept_code' = J('dept_code'))
)


cat(render_query(sub_qry, conn = DBI::ANSI()))
cat(render_query(qry, conn = DBI::ANSI()))

Optimizers:

Strip unused columns: optim_strip_unused_columns

Iterates through the query tree and removes columns from what that are not needed for the final query.

Example:

SELECT t1.col1 
FROM (SELECT * FROM tbl1) as t1
INNER JOIN (SELECT * FROM tbl2) as t2 ON
 t1.col2 = t2.col2

becomes

SELECT t1.col1
FROM (SELECT col1, col2 FROM tbl1) AS t1
INNER JOIN (SELECT col2 from tbl2) AS t2 ON
  t1.col2 = t2.col2

Refactor as CTE: optim_refactor_as_cte

Attempts to detect subqueries that are used more than once in the query and moves them into a common table expressions (if supported by the database).



rkingdc/SQLusine documentation built on June 2, 2019, 2:44 p.m.