CDMConnector and dbplyr

library(CDMConnector)
if (Sys.getenv("EUNOMIA_DATA_FOLDER") == "") Sys.setenv("EUNOMIA_DATA_FOLDER" = file.path(tempdir(), "eunomia"))
if (!dir.exists(Sys.getenv("EUNOMIA_DATA_FOLDER"))) dir.create(Sys.getenv("EUNOMIA_DATA_FOLDER"))
if (!eunomiaIsAvailable()) downloadEunomiaData()

knitr::opts_chunk$set(
  collapse = TRUE,
  eval = rlang::is_installed("duckdb"),
  comment = "#>"
)

Set up

First let's load the required packages for the code in this vignette. If you haven't already installed them, all the other packages can be installed using ´install.packages()´

library(CDMConnector)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

Creating the cdm reference

Now let´s connect to a duckdb database with the Eunomia data (https://github.com/OHDSI/Eunomia).

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomiaDir())
cdm <- cdmFromCon(con, cdmName = "eunomia", cdmSchema = "main", writeSchema = "main")
cdm

This cdm object is now what we´ll use going forward. It provides a reference to the OMOP CDM tables. We can see that these tables are still in the database, but now we have a reference to each of the ones we might want to use in our analysis. For example, the person table can be referenced like so

Putting it all together

Say we want to make a histogram of year of birth in the person table. We can select that variable, bring it into memory, and then use ggplot to make the histogram.

cdm$person %>%
  select(year_of_birth) %>%
  collect() %>%
  ggplot(aes(x = year_of_birth)) +
  geom_histogram(bins = 30)

If we wanted to make a boxplot for length of observation periods we could do the computation on the database side, bring in the new variable into memory, and use ggplot to produce the boxplot

cdm$observation_period %>%
  select(observation_period_start_date, observation_period_end_date) %>%
  mutate(observation_period = (observation_period_end_date - observation_period_start_date)/365, 25) %>%
  select(observation_period) %>%
  collect() %>%
  ggplot(aes(x = observation_period)) +
  geom_boxplot()

Behind the scenes

We use show_query to check the sql that is being run against duckdb

cdm$person %>%
  tally() %>%
  show_query()
cdm$person %>%
  summarise(median(year_of_birth))%>%
  show_query()
cdm$person %>%
  mutate(gender = case_when(
    gender_concept_id == "8507" ~ "Male",
    gender_concept_id == "8532" ~ "Female",
    TRUE ~ NA_character_))%>%
  show_query()
DBI::dbDisconnect(con, shutdown = TRUE)


Try the CDMConnector package in your browser

Any scripts or data that you put into this service are public.

CDMConnector documentation built on April 4, 2025, 4:42 a.m.