knitr::opts_chunk$set(collapse = T, comment = "#>") options(tibble.print_min = 4L, tibble.print_max = 4L) library(dplyr)
This document describes how to add a new SQL backend to dplyr. To begin:
Ensure that you have a DBI compliant database backend. If not, you'll need
to first create it by following the instructions in
vignette("backend", package = "DBI")
.
You'll need a working knowledge of S3. Make sure that you're familiar with the basics before you start.
This document is still a work in progress, but it will hopefully get you started. If you're familiar with how your database and at least one other database that dplyr supports already, this should be reasonably simple task. However, it is possible that a new database backend may need new methods - I'm happy to add those as needed.
Start by creating a new src function to represent the backend. Assuming we're going to create a src for postgres, you'd call it src_postgres()
, and you'd follow the pattern of an existing src. A simplified version of src_postgres()
is show below:
src_postgres <- function(dbname = NULL, host = NULL, port = NULL, user = NULL, password = NULL, ...) { con <- dbConnect(PostgreSQL(), host = host %||% "", dbname = dbname %||% "", user = user, password = password %||% "", port = port %||% "", ...) src_sql("postgres", con) }
Use src_sql()
to create a new S3 object with the correct structure. It must have a DBI connection, but it can store anything else that might be useful.
Next, implement a method for src_desc()
that briefly describes the source:
#' @export src_desc.src_postgres <- function(con) { info <- dbGetInfo(con) host <- if (info$host == "") "localhost" else info$host paste0("postgres ", info$serverVersion, " [", info$user, "@", host, ":", info$port, "/", info$dbname, "]") }
If you read the source code for the real src_postgres()
you'll notice that it caches the getGetInfo()
field on creation, since this saves a little time when printing tbls.
Before continuing, check that you can create a connection to a local database, and that you get a listing of the existing tables. If you have a problem at this point, you may need to check the DBI backend.
Next implement the tbl()
method for your data source. This will probably just be:
tbl.src_mssql <- function(src, from, ...) { tbl_sql("mssql", src = src, from = from, ...) }
Before continuing, make sure you can connect to an existing table, and that the results are printed reasonably. If not, that may indicate your database uses a non-standard DBI interface, and you'll need to fix that before continuing.
This is also a good time implement explain()
, by adding a method for db_explain()
.
If your database uses non-standard quoting (i.e. something other than "
for identifiers and '
for strings), implement methods for sql_escape_string()
and sql_escape_ident()
.
You may need to implement db_query_fields()
, which should return a character vector giving the field names of a query.
At this point, all the basic verbs (summarise()
, filter()
, arrange()
, mutate()
etc) should also work, but it's hard to test without some data.
copy_to()
Next, implement the methods that power copy_to()
work. Once you've implemented these methods, you'll be able copy datasets from R into your database, which will make testing much easier.
db_data_type()
sql_begin()
, sql_commit()
, sql_rollback()
sql_create_table()
, sql_insert_into()
, sql_drop_table()
sql_create_index()
, sql_analyze()
If the database doesn't support a function, just return TRUE
without doing anything. If you find these methods a very poor match to your backend, you may find it easier to provide a direct copy_to()
method.
At this point, you should be able to copy the nycflights13 data packages into your database with (e.g.):
copy_nycflights13(src_mssql(...)) copy_lahman(src_mssql(...))
Don't proceed further until this works, and you've verified that the basic single table verbs word.
Next, check that collapse()
, compute()
, and collect()
work.
If collapse()
fails, your database has a non-standard way of constructing
subqueries. Add a method for sql_subquery()
.
If compute()
fails, your database has a non-standard way of saving queries
in temporary tables. Add a method for db_save_query()
.
Next check the multitable verbs:
left_join()
, inner_join()
: powered by sql_join()
semi_join()
, anti_join()
: powered by sql_semi_join()
union()
, intersect()
, setdiff()
: powered by sql_set_op()
To finish off, you can add custom R -> SQL translation by providing a method for src_translate_env()
. This function should return an object created by sql_variant()
. See existing methods for examples.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.