Using DatabaseConnector through DBI and dbplyr

library(DatabaseConnector)

Introduction

This vignette describes how to use the DatabaseConnector package through the DBI and dbplyr interfaces. It assumes you already know how to create a connection as described in the 'Connecting to a database' vignette.

All functions of the DatabaseConnector DBI interface apply SQL translation, thus making it an interface to a virtual database platform speaking OHDSISql as defined in SqlRender.

Connecting

We can use the dbConnect() function, which is equivalent to the connect() function:

connection <- dbConnect(
  DatabaseConnectorDriver(), 
  dbms = "postgresql",
  server = "localhost/postgres",
  user = "joe",
  password = "secret"
)
writeLines("Connecting using PostgreSQL driver")
dbIsValid(conn)
TRUE

Querying

Querying and executing SQL can be done through the usual DBI functions. SQL statements are assumed to be written in 'OhdsiSql', a subset of SQL Server SQL (see the SqlRender package for details), and are automatically translated to the appropriate SQL dialect. For example:

dbGetQuery(connection, "SELECT TOP 3 * FROM cdmv5.person")
data.frame(person_id = c(1,2,3), gender_concept_id = c(8507, 8507, 8507), year_of_birth = c(1975, 1976, 1977))

Or:

res <- dbSendQuery(connection, "SELECT TOP 3 * FROM cdmv5.person")
dbFetch(res)
data.frame(person_id = c(1,2,3), gender_concept_id = c(8507, 8507, 8507), year_of_birth = c(1975, 1976, 1977))
dbHasCompleted(res)
TRUE
dbClearResult(res)
dbDisconnect(res)

Using dbplyr

We can create a table based on a DatabaseConnector connection. The inDatabaseSchema() function allows us to use standard databaseSchema notation promoted by SqlRender:

library(dpylr)
person <- tbl(connection, inDatabaseSchema("cdmv5", "person"))
person
data.frame(person_id = c(1,2,3), gender_concept_id = c(8507, 8507, 8507), year_of_birth = c(1975, 1976, 1977))

we can apply the usual dplyr syntax:

person %>%
  filter(gender_concept_id == 8507) %>%
  count() %>%
  pull()
1234

Date functions

The dbplyr package does not support date functions, but DatabaseConnector provides the dateDiff(), dateAdd(), eoMonth(), dateFromParts(), year(), month(), and day() functions that will correctly translate to various data platforms:

observationPeriod <- tbl(connection, inDatabaseSchema("cdmv5", "observation_period"))
observationPeriod %>%
  filter(
    dateDiff("day", observation_period_start_date, observation_period_end_date) > 365
  ) %>%
  count() %>%
  pull()
987

Allowed table and field names in dbplyr

Because of the many idiosyncrasies in how different dataplatforms store and transform table and field names, it is currently not possible to use any names that would require quotes. So for example the names person, person_id, and observation_period are fine, but Person ID and Obs. Period are not. In general, it is highly recommend to use lower case snake_case for database table and field names.

Temp tables

The DBI interface uses temp table emulation on those platforms that do not support real temp tables. This does require that for those platforms the user specify a tempEmulationSchema, preferably using

option(sqlRenderTempEmulationSchema = "a_schema")

Where "a_schema" refers to a schema where the user has write access. If we know we will need temp tables, we can use the assertTempEmulationSchemaSet() to verify this option has been set. This function will throw an error if it is not set, but only if the provided dbms is a platform that requires temp table emulation.

In OHDSISql, temp tables are referred to using a '#' prefix. For example:

dbWriteTable(connection, "#temp", cars)
message("Inserting data took 0.053 secs")

Temp tables in dbplyr

The copy_to function creates a temp table:

carsTable <- copy_to(connection, cars)
writeLines("Created a temporary table named #cars")

The compute() function also creates a temp table, for example:

tempTable <- person %>%
  filter(gender_concept_id == 8507) %>%
  compute()
message("Created a temporary table named #dbplyr_001")

Cleaning up emulated temp tables

Emulated temp tables are not really temporary, and therefore have to be removed when no longer needed. A convenient way to drop all emulated temp tables created so far in an R session is using the dropEmulatedTempTables() function:

dropEmulatedTempTables(connection)

In our example, this does not do anything because were using a PostgreSQL server, which does natively support temp tables.

Closing the connection

We can use the dbDisconnect() function, which is equivalent to the disconnect() function:

dbDisconnect(connection)


Try the DatabaseConnector package in your browser

Any scripts or data that you put into this service are public.

DatabaseConnector documentation built on May 29, 2024, 6:14 a.m.