knitr::opts_chunk$set(echo = TRUE)
options(kableExtra.latex.load_packages = FALSE)
library(kableExtra)
#knitr::knit_hooks$set(document = function(x) {sub('\\usepackage[]{color}', '\\usepackage[table]{xcolor}', x, fixed = TRUE)})
library(dplyr)
options(knitr.kable.NA = "")

# pdf2png <- function(path) {
#   # only do the conversion for non-LaTeX output
#   if (knitr::is_latex_output()) {
#     return(path)
#   }
#   path2 <- xfun::with_ext(path, "png")
#   img <- magick::image_read_pdf(path)
#   magick::image_write(img, path2, format = "png")
#   path2
# }

latex_table_font_size <- 8

TODO List {-}

List of Abbreviations

abbreviations <- readr::read_delim(col_names = FALSE, delim = ";", trim_ws = TRUE, file = "
  CDM; Common data model
  DPP4; Dipeptidyl peptidase-4
  GLP1; Glucagon-like peptide-1
  IRB; Institutional review board
  LEGEND; Large-scale Evidence Generation and Evaluation across a Network of Databases
  MACE; Major adverse cardiovascular event
  MDRR; Minimum detectable risk ratio
  OHDSI; Observational Health Data Science and Informatics
  OMOP; Observational Medical Outcomes Partnership
  PS; Propensity score
  RCT; Randomized controlled trial
  SGLT2; Sodium-glucose co-transporter-2
  T2DM; Type 2 diabetes mellitus
")

tab <- kable(abbreviations, col.names = NULL, linesep = "", booktabs = TRUE)

if (knitr::is_latex_output()) {
  tab %>% kable_styling(latex_options = "striped", font_size = latex_table_font_size)
} else {
  tab %>% kable_styling(bootstrap_options = "striped")
}

Getting Started

LEGEND-T2DM consists of several R packages available on https://github.com/ohdsi-studies:

that themselves depend on a large number of packages from HADES (add link). To handle these dependencies gracefully, LEGEND-T2DM uses renv with a version-controlled renv.lock file that pins the version of each package. To install LegendT2dm and all of its dependencies, we first "pull" the latest source from GitHub:

git clone https://github.com/ohdsi-studies/LegendT2dm

then, from within an R or RStudio session started in the cloned directory, we activate renv:

install.packages("renv") # Only necessary once
renv::activate()
renv::restore()

after which one can build and install the final LegendT2dm package.

Exposure Cohort Counts

For the Class-vs-Class Study, we will construct

$$
\begin{aligned}
& (4 \text{ drug-class targets}) \ \times \\
& (2 \text{ On-treatment censoring types}) \ \times \\
& (2 \text{ Prior metformin-use choices}) \\
& = 16
\end{aligned}
$$

exposure cohort definitions. Likewise, for the Drug-vs-Drug Study, we will construct

$$
\begin{aligned}
& (22 \text{ drug-ingredient targets}) \ \times \\
& (2 \text{ On-treatment censoring types}) \ \times \\
& (2 \text{ Prior metformin-use choices}) \\
& = 88
\end{aligned}
$$

exposure cohort definitions. For the Heterogeneity Study, we will stratify each of these cohort definitions into

$$
\begin{aligned}
& 3 \text{ (age: 18-44 / 45-64 / } \ge \text{65) } + \\
& 2 \text{ (sex: female / male) } + \\
& 1 \text{ (race: Black) } + \\
& 2 \text{ (cardiovascular risk: lower / higher) } + \\
& 2 \text{ (renal impairment: with / without) } \\
& = 10
\end{aligned}
$$

additional definitions, resulting in $(16 + 88) \times 10 = 1040$ further cohort definitions.
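The cohort counts above can be checked with a few lines of base R. This is only an arithmetic sketch; the variable names are illustrative and do not come from the LegendT2dm package.

```r
# Counts taken from the text above; variable names are illustrative only.
classTargets <- 4
drugTargets <- 22
censoringTypes <- 2
metforminChoices <- 2

classCohorts <- classTargets * censoringTypes * metforminChoices
drugCohorts <- drugTargets * censoringTypes * metforminChoices

# Subgroup strata are added together (not crossed) in the Heterogeneity Study.
strata <- c(age = 3, sex = 2, race = 1, cardiovascularRisk = 2, renalImpairment = 2)
strataPerCohort <- sum(strata)

heterogeneityCohorts <- (classCohorts + drugCohorts) * strataPerCohort

c(class = classCohorts, drug = drugCohorts, heterogeneity = heterogeneityCohorts)
```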

Exposure Cohort ID Key

This section documents the internal coding used across the three studies to label cohort definitions and their SQL representations. Each cohort carries a 9-digit identifier $$ \text{ID}\ [DD][T][M][A][S][R][C][K] $$ where Table \@ref(tab:drug-codes) reports the target 2-digit drug codes $DD$ and Table \@ref(tab:category-codes) lists the codes for time-at-risk $T$, metformin $M$, age $A$, sex $S$, race $R$, cardiovascular $C$ and renal (kidney) $K$ categories.
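As a minimal sketch of how the identifier layout can be read, the following base R helper splits a 9-digit ID into its components. The function `parseCohortId` and its field names are hypothetical and not part of the LegendT2dm package; the example ID is the one that appears in the file name `ID301100000.json` used in the HOWTOs section.

```r
# Hypothetical helper (not part of LegendT2dm): split [DD][T][M][A][S][R][C][K].
parseCohortId <- function(cohortId) {
  if (is.numeric(cohortId)) {
    # Avoid scientific notation when converting numeric IDs to text
    cohortId <- format(cohortId, scientific = FALSE, trim = TRUE)
  }
  id <- sub("^ID", "", as.character(cohortId))  # tolerate an "ID" prefix
  stopifnot(grepl("^[0-9]{9}$", id))

  fields <- c("timeAtRisk", "metformin", "age", "sex", "race",
              "cardiovascular", "renal")
  singleDigits <- as.integer(strsplit(substr(id, 3, 9), "")[[1]])

  c(drug = as.integer(substr(id, 1, 2)), setNames(singleDigits, fields))
}

parseCohortId("ID301100000")
```

Under this layout, the example decodes to drug code 30, time-at-risk code 1 (ITT/OT1), metformin code 1 (prior) and code 0 ("any") for every subgroup category.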

addTab <- function(name, cohortId) {
  if (cohortId %% 10 != 0) {
    paste0("\\ \\ \\ \\ ", name)
  } else {
    name
  }
}

exposures <- read.csv(system.file("settings/ExposuresOfInterest.csv", package = "LegendT2dm"))

drugCodes <- exposures  %>% arrange(cohortId) %>%
  rowwise() %>%
  mutate(name = addTab(name, cohortId)) %>%
  select(name, cohortId) %>%
  rename(Drug = name, "2-digit Code" = cohortId)

tab <- kable(drugCodes, booktabs = TRUE, linesep = "",
             caption = "2-Digit Drug Codes") %>%
  column_spec(1, width = "10em") %>%
  column_spec(2, width = "30em")

if (knitr::is_latex_output()) {
  tab %>% kable_styling(latex_options = "striped", font_size = latex_table_font_size)
} else {
  tab %>% kable_styling(bootstrap_options = "striped")
}
categoryCodes <- readr::read_delim(col_names = TRUE, delim = ";", trim_ws = TRUE, file = "
  Category; Value; 1-digit Code
  Time-at-risk;
  ; ITT/OT1; 1
  ; OT2; 2
  Metformin;
  ; prior; 1
  ; none; 2
  Age ;
  ; any; 0
  ; younger; 1
  ; middle; 2
  ; older; 3
  Sex ;
  ; any; 0
  ; female; 1
  ; male; 2
  Race ;
  ; any; 0
  ; Black; 1
  Cardiovascular risk ;
  ; any; 0
  ; low; 1
  ; higher; 2
  Renal disease ;
  ; any; 0
  ; without; 1
  ; with; 2
")

footnote <- c("ITT: intent-to-treat; OT1: on-treatment-1; OT2: on-treatment-2")

tab <- kable(categoryCodes, booktabs = TRUE, linesep = "",
             caption = "Other Cohort Category Codes") %>%
  footnote(general = footnote, general_title = "")

if (knitr::is_latex_output()) {
  tab %>% kable_styling(latex_options = "striped", font_size = latex_table_font_size)
} else {
  tab %>% kable_styling(bootstrap_options = "striped")
}

Here are useful regex masks for selecting the different strata:

mask <- readr::read_csv(system.file("settings", "masks.csv", package = "LegendT2dm"))

tab <- kable(mask, booktabs = TRUE, linesep = "",
             caption = "Regex masks for strata")
# %>%
  # footnote(general = footnote, general_title = "")

if (knitr::is_latex_output()) {
  tab %>% kable_styling(latex_options = "striped", font_size = latex_table_font_size)
} else {
  tab %>% kable_styling(bootstrap_options = "striped")
}
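To illustrate how such masks can be used, here is a sketch that filters cohort IDs by stratum with `grepl`. The patterns below are hypothetical: they are written directly against the 9-digit layout [DD][T][M][A][S][R][C][K] and assume the masks are applied to the ID as a string. The masks actually shipped in `settings/masks.csv` may be written differently.

```r
# Hypothetical masks derived from the ID layout; not read from masks.csv.
hypotheticalMasks <- c(
  main           = "^[0-9]{4}00000$",      # no subgroup restriction
  age            = "^[0-9]{4}[1-3]0000$",  # younger / middle / older
  sex            = "^[0-9]{4}0[12]000$",   # female / male
  race           = "^[0-9]{4}00100$",      # Black
  cardiovascular = "^[0-9]{4}000[12]0$",   # low / higher risk
  renal          = "^[0-9]{4}0000[12]$"    # without / with renal disease
)

# Made-up example IDs that follow the layout
cohortIds <- c("301100000", "301110000", "301200010", "301100001")

# Select the IDs that belong to each stratum
selected <- lapply(hypotheticalMasks, function(mask) {
  cohortIds[grepl(mask, cohortIds)]
})
selected$age
```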

Analysis Plan Matrix

For each comparison within each study, we will execute the 9 CohortMethod analysis plans in Table \@ref(tab:plan-matrix).

design <- readr::read_delim(col_names = TRUE, delim = ";", trim_ws = TRUE, file = "
  ID; PS-covariates; PS-adjustment; Outcome model; Risk-window end
  1; ; ; Cox; target exposure end
  2; FE defaults; variable match; Cox (conditional); target exposure end
  3; FE defaults; 5 strata; Cox (conditional); target exposure end
  4; ; ; Cox; observational period end
  5; FE defaults; variable match; Cox (conditional); observational period end
  6; FE defaults; 5 strata; Cox (conditional); observational period end
  7; ; ; Cox; target exposure end or escalation
  8; FE defaults; variable match; Cox (conditional); target exposure end or escalation
  9; FE defaults; 5 strata; Cox (conditional); target exposure end or escalation
")

footnote <- c("FE: FeatureExtraction package (365 days long-term, 180 days medium-term and 30 days short-term look-back)",
              "PS: Propensity score")

tab <- kable(design, linesep = "", booktabs = TRUE,
             caption = "CohortMethod analysis descriptions") %>%
  footnote(general = footnote, general_title = "")


if (knitr::is_latex_output()) {
  tab %>% kable_styling(latex_options = "striped", font_size = latex_table_font_size)
} else {
  tab %>% kable_styling(bootstrap_options = "striped")
}
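The rows of the table above form a full cross of three PS-adjustment strategies with three risk-window ends. The sketch below rebuilds that matrix with `expand.grid`; the label "none" for the unadjusted plans and all column names are assumptions made for illustration, not identifiers used by CohortMethod or LegendT2dm.

```r
# Labels copied from the table; "none" stands in for the blank PS-adjustment cells.
psAdjustment <- c("none", "variable match", "5 strata")
riskWindowEnd <- c("target exposure end",
                   "observational period end",
                   "target exposure end or escalation")

# expand.grid varies its first argument fastest, which reproduces the ID order
plans <- expand.grid(psAdjustment = psAdjustment,
                     riskWindowEnd = riskWindowEnd,
                     stringsAsFactors = FALSE)
plans$id <- seq_len(nrow(plans))

# Unadjusted plans use a plain Cox model; matched or stratified plans condition on the sets
plans$outcomeModel <- ifelse(plans$psAdjustment == "none", "Cox", "Cox (conditional)")

plans[, c("id", "psAdjustment", "outcomeModel", "riskWindowEnd")]
```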

Configuration files

The LEGEND-T2DM studies depend on several configuration files. We directly provide some of these files, while others we generate through executable R scripts before distributing study packages.

Executable R scripts

File dependencies

For all LEGEND-T2DM studies

For class studies

The following files are generated by extra/GenerateExposureCohortDefinitions.R:

Table specification files

The following files specify the diagnostics and results data models on the public PostgreSQL server:

These files serve as arguments to createDataModelSqlFile to generate

and to uploadResultsToDatabase to upload diagnostics and results to the public PostgreSQL server.

CohortMethod function arguments


Data models

This document describes the data models for storing the output of the CohortDiagnostics and CohortMethod study artifacts for the LEGEND-T2DM study. Table \@ref(tab:schema) provides the schema names.

schema <- readr::read_delim(col_names = FALSE, delim = ";", trim_ws = TRUE, file = "
  legendt2dm_class_diagnostics ; legendt2dm_class_results
  legendt2dm_drug_diagnostics ; legendt2dm_drug_results
  legendt2dm_outcome_diagnostics
")

tab <- kable(schema, col.names = c("Cohort diagnostics", "Study results"),
             linesep = "", booktabs = TRUE,
             caption = "LEGEND-T2DM public study schema")

if (knitr::is_latex_output()) {
  tab %>% kable_styling(latex_options = "striped",
                        font_size = latex_table_font_size)
} else {
  tab %>% kable_styling(bootstrap_options = "striped")
}

Fields with minimum values

Some fields contain patient counts or fractions that are easily converted to patient counts. To prevent identification of individual patients, these fields are subject to a minimum value. When a value falls below this minimum, it is replaced with the negative of the minimum. For example, if the minimum subject count is 5 and the actual count is 2, the value stored in the data model will be -5, which could be presented to the user as '<5'. Note that the value 0 is permissible, as it identifies no persons.
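The rule can be sketched in base R as follows. Both functions are hypothetical illustrations of the convention described above and are not the implementation used by the study packages.

```r
# Hypothetical helper: replace small positive values with the negative minimum.
censorSmallValues <- function(values, minValue = 5) {
  toCensor <- !is.na(values) & values > 0 & values < minValue
  values[toCensor] <- -minValue
  values
}

# Hypothetical helper: display negative (censored) values as "<minimum".
formatCensored <- function(values) {
  ifelse(!is.na(values) & values < 0,
         paste0("<", abs(values)),
         as.character(values))
}

counts <- c(0, 2, 5, 17)
stored <- censorSmallValues(counts, minValue = 5)
displayed <- formatCensored(stored)
```

Here the count 2 is stored as -5 and displayed as "<5", while 0, 5 and 17 pass through unchanged.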

Cohort diagnostics data model

specifications <- readr::read_csv(system.file("/settings/resultsDataModelSpecification.csv", package = "CohortDiagnostics")) %>%
  filter(optional == "No")

tables <- split(specifications, specifications$tableName)

for (table in tables) {
  header <- sprintf("### Table %s", table$tableName[1])

  table <- table %>%
    # Column names are passed as strings; .data$ inside select() is deprecated in tidyselect
    select(Field = "fieldName", Type = "type", Key = "primaryKey"
           # , Description = "description"
           ) %>%
    kable(linesep = "", booktabs = TRUE, longtable = TRUE)

  if (knitr::is_latex_output()) {
    writeLines("")
    writeLines(header)

    writeLines(table %>%
                 kable_styling(latex_options = "striped", font_size = latex_table_font_size) %>%
                 column_spec(1, width = "10em") %>%
                 column_spec(2, width = "5em") %>%
                 column_spec(3, width = "3em"))  # only three columns are selected above
  } else if (knitr::is_html_output()) {
    writeLines("")
    writeLines(header)

    writeLines(table %>%
                 kable_styling(bootstrap_options = "striped"))
  }
}

PS assessment data model

specifications <- readr::read_csv(system.file("settings/PsAssessmentModelSpecs.csv", package = "LegendT2dm"))

tables <- split(specifications, specifications$tableName)

for (table in tables) {
  header <- sprintf("### Table %s", table$tableName[1])

  table <- table %>%
    # Column names are passed as strings; .data$ inside select() is deprecated in tidyselect
    select(Field = "fieldName", Type = "type", Key = "primaryKey",
           Description = "description"
           ) %>%
    kable(linesep = "", booktabs = TRUE, longtable = TRUE)

  if (knitr::is_latex_output()) {
    writeLines("")
    writeLines(header)

    writeLines(table %>%
                 kable_styling(latex_options = "striped", font_size = latex_table_font_size) %>%
                 column_spec(1, width = "10em") %>%
                 column_spec(2, width = "5em") %>%
                 column_spec(3, width = "3em") %>%
                 column_spec(4, width = "16em"))
  } else if (knitr::is_html_output()) {
    writeLines("")
    writeLines(header)

    writeLines(table %>%
                 kable_styling(bootstrap_options = "striped"))
  }
}

Study results data model

specifications <- readr::read_csv(system.file("settings/ResultsModelSpecs.csv", package = "LegendT2dm"))

tables <- split(specifications, specifications$tableName)

for (table in tables) {
  header <- sprintf("### Table %s", table$tableName[1])

  table <- table %>%
    # Column names are passed as strings; .data$ inside select() is deprecated in tidyselect
    select(Field = "fieldName", Type = "type", Key = "primaryKey",
           Description = "description"
           ) %>%
    kable(linesep = "", booktabs = TRUE, longtable = TRUE)

  if (knitr::is_latex_output()) {
    writeLines("")
    writeLines(header)

    writeLines(table %>%
                 kable_styling(latex_options = "striped", font_size = latex_table_font_size) %>%
                 column_spec(1, width = "10em") %>%
                 column_spec(2, width = "5em") %>%
                 column_spec(3, width = "3em") %>%
                 column_spec(4, width = "16em"))
  } else if (knitr::is_html_output()) {
    writeLines("")
    writeLines(header)

    writeLines(table %>%
                 kable_styling(bootstrap_options = "striped"))
  }
}

Database access

All LegendT2dm tables are located on OHDSI's public PostgreSQL server, which is the backend for the apps on https://data.ohdsi.org. Study PIs can provide read-only credentials, which can be stored encrypted on individual computers. For this latter task, we recommend using the cross-platform keyring R package to securely store legendt2dmServer, legendt2dmDatabase, legendt2dmUser, legendt2dmPassword and legendT2dmDriverPath, and then specifying connection details via:

Sys.setenv(DATABASECONNECTOR_JAR_FOLDER = keyring::key_get("legendT2dmDriverPath"))

legendT2dmConnectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = paste(keyring::key_get("legendt2dmServer"),
                 keyring::key_get("legendt2dmDatabase"),
                 sep = "/"),
  user = keyring::key_get("legendt2dmUser"),
  password = keyring::key_get("legendt2dmPassword"))
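The keys read with `keyring::key_get()` above have to be stored once beforehand. One possible way to do this is sketched below, assuming the keyring package is installed. The helper `storeLegendT2dmCredentials` is hypothetical and not part of LegendT2dm; it simply calls `keyring::key_set_with_value()` once per key, and the values in the commented usage are placeholders.

```r
# Hypothetical helper (not part of LegendT2dm): store each credential under the
# service name that the connection code later reads with keyring::key_get().
storeLegendT2dmCredentials <- function(credentials,
                                       setter = keyring::key_set_with_value) {
  expectedKeys <- c("legendt2dmServer", "legendt2dmDatabase", "legendt2dmUser",
                    "legendt2dmPassword", "legendT2dmDriverPath")
  missingKeys <- setdiff(expectedKeys, names(credentials))
  if (length(missingKeys) > 0) {
    stop("Missing credentials: ", paste(missingKeys, collapse = ", "))
  }
  for (key in expectedKeys) {
    setter(service = key, password = credentials[[key]])
  }
  invisible(expectedKeys)
}

# Usage with placeholder values (replace with the details provided by the study PIs):
# storeLegendT2dmCredentials(list(
#   legendt2dmServer = "<server>",
#   legendt2dmDatabase = "<database>",
#   legendt2dmUser = "<user>",
#   legendt2dmPassword = "<password>",
#   legendT2dmDriverPath = "<folder containing the JDBC driver>"
# ))
```

Passing the setter as an argument keeps the helper easy to dry-run with a stand-in function before anything is written to the system credential store.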

R Shiny

Exposure and outcome cohort characteristics and diagnostics

To explore cohorts and their diagnostics through an R Shiny application, we use the LegendT2dmCohortExplorer package. The following code will load all class cohorts:

LegendT2dmCohortExplorer::launchCohortExplorer(cohorts = "class",
                                               connectionDetails = legendT2dmConnectionDetails)

Use cohorts = "drug" and cohorts = "outcome" to explore the Drug-vs-Drug Study exposure and outcome cohorts, respectively. This Shiny application is based on CohortDiagnostics v2.1. It is also possible to explore cohorts using local files by specifying the dataFolder argument.

A useful Cohort name filter is main( (see Table \@ref(tab:category-codes)).

Comparative effectiveness and safety evidence

To explore the comparative effectiveness and safety evidence through an R Shiny application, we use the LegendT2dmEvidenceExplorer package. The following code will load all class cohort comparison evidence:

LegendT2dmEvidenceExplorer::launchEvidenceExplorer(cohorts = "class",
                                                   connectionDetails = legendT2dmConnectionDetails)

Use cohorts = "drug" to explore the drug cohort comparison evidence. It is also possible to explore evidence using local files by specifying the dataFolder argument.

Definitions

Orphan concepts

Orphan concepts are (source) concepts that likely should be included in a concept set, but aren't, often because of vocabulary issues. This is a rough outline of the process:

HOWTOs

Render human-readable cohort definition

The LegendT2dm package provides functions to render cohort definitions into R Markdown for output to PDF or HTML. Here is an example of rendering the complete definition for new users of SGLT2 inhibitors:

cohortJson <- SqlRender::readSql(system.file("cohorts", "class", "ID301100000.json",
                                             package = "LegendT2dm"))

LegendT2dm::printCohortDefinitionFromNameAndJson(
  name = "Class-vs-Class Exposure (SGLT2 New-User) Cohort / OT1",
  json = cohortJson)

Dissemination

Publications

If possible, LEGEND-T2DM publications should be drafted in R Markdown, with *.Rmd source files version-controlled on GitHub. Initial drafting can begin in a private repository, but final versions should end up in ohdsi-studies/LegendT2dm/Documents. Currently, we have the following documents:

Presentations

Nothing yet. We have a culture war here between PowerPoint, LaTeX and R Markdown.



ohdsi-studies/LegendT2dm documentation built on July 4, 2025, 8:25 p.m.