```r
library(dplyr)
library(dbplyr)

con_psql <- DBI::dbConnect(
  RPostgreSQL::PostgreSQL(),
  dbname = "",
  host = "127.0.0.1",
  user = getOption("dittodb.test.user"),
  password = getOption("dittodb.test.pw")
)
DBI::dbSendStatement(con_psql, "CREATE DATABASE travelling")
DBI::dbDisconnect(con_psql)

con_psql <- DBI::dbConnect(
  RPostgreSQL::PostgreSQL(),
  dbname = "travelling",
  host = "127.0.0.1",
  user = getOption("dittodb.test.user"),
  password = getOption("dittodb.test.pw")
)
nycflights13_create_sql(con_psql)
DBI::dbDisconnect(con_psql)

start_db_capturing(path = "vignettes/")

con_psql <- DBI::dbConnect(
  RPostgreSQL::PostgreSQL(),
  dbname = "travelling",
  host = "127.0.0.1",
  user = getOption("dittodb.test.user"),
  password = getOption("dittodb.test.pw")
)

tbl(con_psql, "flights") %>%
  filter(!is.na(tailnum)) %>%
  filter(arr_delay >= 180) %>%
  select(tailnum) %>%
  distinct() %>%
  collect()

DBI::dbDisconnect(con_psql)

stop_db_capturing()
```
```r
library(dittodb)

# set the mockPaths for this vignette
db_mock_paths("travelling")

knitr::opts_chunk$set(eval = TRUE, message = FALSE, warning = FALSE)
```
This vignette mocks the connection to a real PostgreSQL server that contains a database version of the {nycflights13} dataset (among other databases). See the {nycflights13} vignette for more information about this database.
This example is for you if you have ever wondered how to use the scripts you rely on at the office while you are at home or travelling, or how to continue developing those scripts when you don't have an internet connection.
Many of us have to use databases that are only accessible from a local network.
The package {dittodb} provides `with_mock_db()`, which wraps your code and makes it possible to run outside the office (or even with no internet access at all!).
Suppose we are asked to analyze the flights data to show only the planes that have been delayed by at least 3 hours.
One would find all the flights that have been delayed by over 3 hours, and then grab only the distinct tail numbers. The one thing to watch out for is to filter out flights with a missing tail number, or all of those will be treated as a single plane.
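For intuition, here is a minimal sketch of that logic on the in-memory `nycflights13::flights` data frame (an illustration assuming the {nycflights13} package is installed; in this vignette the same data lives in the database instead):

```r
library(dplyr)

# same logic, locally: drop missing tail numbers first, then keep flights
# delayed by 3 hours (180 minutes) or more, and deduplicate the tail numbers
nycflights13::flights %>%
  filter(!is.na(tailnum)) %>%
  filter(arr_delay >= 180) %>%
  select(tailnum) %>%
  distinct()
```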
We could run the following code to get that data with a direct connection to the database (i.e. at the office):
```r
library(dplyr)
library(dbplyr)

con_psql <- DBI::dbConnect(
  RPostgreSQL::PostgreSQL(),
  dbname = "travelling",
  host = "postgres.server",
  user = "m.ciccone"
)

tbl(con_psql, "flights") %>%
  filter(!is.na(tailnum)) %>%
  filter(arr_delay >= 180) %>%
  select(tailnum) %>%
  distinct()
```
However, this won't work if we can't connect to our database server. And since `postgres.server` is an alias to an IP address that is only accessible from the local network at our office, we couldn't run this code and get a result anywhere else. But what if we wanted to continue working on this analysis on the train home?
Important: this example uses phony authentication. Please never write your passwords in scripts; use your `.Rprofile`, an environment variable, or some other more secure method instead.
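As a minimal sketch of the environment-variable approach (the names `DB_USER` and `DB_PASSWORD` are illustrative; set them in your `.Renviron` or shell, not in the script):

```r
# read credentials from environment variables instead of hard-coding them
con_psql <- DBI::dbConnect(
  RPostgreSQL::PostgreSQL(),
  dbname = "travelling",
  host = "postgres.server",
  user = Sys.getenv("DB_USER"),
  password = Sys.getenv("DB_PASSWORD")
)
```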
One option would be to manually save a CSV or TXT file of the data, and then manually read it into our R session. But this has a number of drawbacks: we have to mentally keep track of where each query is from, save it to the right file, read it into the right place, etc. We also have to maintain a separate system or code path for reading in the saved files. {dittodb} can take care of all of this for us in the background, allowing us to record the results of the necessary queries and play them back when those same queries are called without a connection to the database.
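For contrast, the manual approach might look something like this sketch (the file name is illustrative), with one code path per environment to keep in sync by hand:

```r
# at the office: run the query and save the result by hand
flights_delayed <- tbl(con_psql, "flights") %>%
  filter(!is.na(tailnum)) %>%
  filter(arr_delay >= 180) %>%
  select(tailnum) %>%
  distinct() %>%
  collect()
write.csv(flights_delayed, "flights_delayed.csv", row.names = FALSE)

# on the train: remember which file belongs to which query, and maintain a
# separate code path to read it back in
flights_delayed <- read.csv("flights_delayed.csv", stringsAsFactors = FALSE)
```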
While we are able to connect to the database (i.e. when we are at the office), we can save the results returned by queries with code like the following, by calling `start_db_capturing()` before the connection and the code that executes the queries, and then `stop_db_capturing()` at the end:
```r
library(dittodb)

start_db_capturing()

con_psql <- DBI::dbConnect(
  RPostgreSQL::PostgreSQL(),
  dbname = "dittodb",
  host = "postgres.server",
  user = "m.ciccone"
)

flights_delayed <- tbl(con_psql, "flights") %>%
  filter(!is.na(tailnum)) %>%
  filter(arr_delay >= 180) %>%
  select(tailnum) %>%
  distinct() %>%
  collect()

flights_delayed

DBI::dbDisconnect(con_psql)

stop_db_capturing()
```
```r
library(dplyr)
library(dbplyr)

# this is the same code that is echoed below, but used here to show the output
# that the chunk above would produce if it were able to connect
with_mock_db({
  con_psql <- DBI::dbConnect(
    RPostgreSQL::PostgreSQL(),
    dbname = "travelling",
    host = "127.0.0.1",
    user = "m.ciccone"
  )

  flights_delayed_from_mock <- tbl(con_psql, "flights") %>%
    filter(!is.na(tailnum)) %>%
    filter(arr_delay >= 180) %>%
    select(tailnum) %>%
    distinct() %>%
    collect()

  flights_delayed_from_mock
})

# `dbDisconnect` returns TRUE
TRUE
```
If one or more queries were captured successfully, we can then replicate the result while connected to a different network, or even with no internet access at all:
```r
with_mock_db({
  con_psql <- DBI::dbConnect(
    RPostgreSQL::PostgreSQL(),
    dbname = "travelling",
    host = "127.0.0.1",
    user = "m.ciccone"
  )

  flights_delayed_from_mock <- tbl(con_psql, "flights") %>%
    filter(!is.na(tailnum)) %>%
    filter(arr_delay >= 180) %>%
    select(tailnum) %>%
    distinct() %>%
    collect()

  flights_delayed_from_mock
})
```
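If you are curious what was recorded, the fixtures are plain files under the active mock path, so you can list them; a quick sketch (the hash-based file names will differ on your machine):

```r
# db_mock_paths() with no arguments returns the mock paths currently in use
list.files(db_mock_paths()[[1]], recursive = TRUE)
```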
One thing to note: when using `dbplyr`, we need to be a bit careful to wrap the entire interaction with the database objects in `with_mock_db()` if we are taking advantage of `dbplyr`'s lazy evaluation (which is the default), and to use `collect()` to return the results when we want them recorded. Because `dbplyr` waits until the last possible second to request the data, if there is no `collect()` call (or a call that will implicitly send the query), no query is ever executed, and {dittodb} won't be able to record the response from that query.
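To make the pitfall concrete, here is a sketch on the recording side, reusing the connection details from above (the same contrast applies when replaying inside `with_mock_db()`):

```r
start_db_capturing()

con_psql <- DBI::dbConnect(
  RPostgreSQL::PostgreSQL(),
  dbname = "travelling",
  host = "postgres.server",
  user = "m.ciccone"
)

# lazy: dbplyr only builds the SQL here; the data query is never sent,
# so there is no result for {dittodb} to record
flights_lazy <- tbl(con_psql, "flights") %>%
  filter(arr_delay >= 180)

# eager: collect() actually sends the query, so the response is recorded
# and can be replayed later inside with_mock_db()
flights_eager <- tbl(con_psql, "flights") %>%
  filter(arr_delay >= 180) %>%
  collect()

DBI::dbDisconnect(con_psql)
stop_db_capturing()
```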