In ohdsi-studies/LegendT2dm: Large-Scale Evidence Generation and Evaluation in a Network of Databases

knitr::opts_chunk$set(echo = TRUE)
options(kableExtra.latex.load_packages = FALSE)
library(kableExtra)
#knitr::knit_hooks$set(document = function(x) {sub('\\usepackage[]{color}', '\\usepackage[table]{xcolor}', x, fixed = TRUE)})
library(dplyr)
options(knitr.kable.NA = "")

# pdf2png <- function(path) {
#   # only do the conversion for non-LaTeX output
#   if (knitr::is_latex_output()) {
#     return(path)
#   }
#   path2 <- xfun::with_ext(path, "png")
#   img <- magick::image_read_pdf(path)
#   magick::image_write(img, path2, format = "png")
#   path2
# }

latex_table_font_size <- 8

TODO List {-}

Fix export of ps_assessment to remove NULL coefficients
Make corresponding SQL column NOT NULL again
IIT end of obs period for VA
HbA1c control (remove earliest event and remove subject prior to outcome)

List of Abbreviations

abbreviations <- readr::read_delim(col_names = FALSE, delim = ";", trim_ws = TRUE, file = "
  CDM; Common data model
  DPP4; Dipeptidyl peptidase-4
  GLP1; Glucagon-like peptide-1
  IRB; Institutional review board
  LEGEND; Large-scale Evidence Generation and Evaluation across a Network of Databases
  MACE; Major adverse cardiovascular event
  MDRR; Minimum detectable risk ratio
  OHDSI; Observational Health Data Science and Informatics
  OMOP; Observational Medical Outcomes Partnership
  PS; Propensity score
  RCT; Randomized controlled trial
  SGLT2; Sodium-glucose co-transporter-2
  T2DM; Type 2 diabetes mellitus
")

tab <- kable(abbreviations, col.names = NULL, linesep = "", booktabs = TRUE)

if (knitr::is_latex_output()) {
  tab %>% kable_styling(latex_options = "striped", font_size = latex_table_font_size)
} else {
  tab %>% kable_styling(bootstrap_options = "striped")
}

Getting Started

LEGEND-T2DM consists of several R packages available on https://github.com/ohdsi-studies:

LegendT2dm
LegendT2dmCohortExplorer and
LegendT2dmEvidenceExplorer

that depend themselves on a large number of packages from Hades (add link). To gracefully handle these dependencies, LEGEND-T2DM uses renv through a version-controlled renv.lock file to specify each package. To install LegendT2dm and all of its dependencies, we first "pull" the lastest source from github:

git clone https://github.com/ohdsi-studies/LegendT2dm

then from within an R or Rstudio terms, we active renv:

install.packages("renv") # Only necessary once
renv::activate()
renv::restore()

after which one can now build and install the final LegendT2dm package.

Exposure Cohort Counts

For the Class-vs-Class Study, we will construct $$ \begin{aligned} & (4 \text{ drug-class targets})\ \times \ & (2 \text{ On-treatment censoring types})\ \times \ & (2 \text{ Prior metformin-use choices}) \ & = 16 \end{aligned} $$ exposure cohort definitions. Likewise, for the Drug-vs-Drug Study, we will construct $$ \begin{aligned} & (22 \text{ drug-ingredient targets})\ \times \ & (2 \text{ On-treatment censoring types})\ \times \ & (2 \text{ Prior metformin-use choices}) \ & = 88 \end{aligned} $$ exposure cohort definitions. For the Heterogeneity Study, we will stratify each of these cohort definitions into $$ \begin{aligned} & 3 \text{ (age: 18-44 / 45-64 /}\ge\text{65) } + \ & 2 \text{ (sex: female / male) } + \ & 1 \text{ (race: Black) } + \ & 2 \text{ (cardiovascular risk: lower / higher) } + \ & 2 \text{ (renal impairment: with/without) } \ & = 10 \end{aligned} $$ additional definitions, results in $(16 + 88) \times 10 = 1040$ further cohort definitions.

Exposure Cohort ID Key

This section documents internal coding used across the three studies to label cohort definitions and their SQL representation. Each cohort carries 9-digit identifier $$ \text{ID}\ [DD][T][M][A][S][R][C][K] $$ where Table \@ref(tab:drug-codes) reports the target 2-digit drug codes $DD$ and Table \@ref(tab:category-codes) lists the codes for time-at-risk $T$, metformin $M$, age $A$, sex $S$, race $R$, cardiovascular $C$ and renal (kidney) $K$ categories.

addTab <- function(name, cohortId) {
  if (cohortId %% 10 != 0) {
    paste0("\\ \\ \\ \\ ", name)
  } else {
    name
  }
}

exposures <- read.csv(system.file("settings/ExposuresOfInterest.csv", package = "LegendT2dm"))

drugCodes <- exposures  %>% arrange(cohortId) %>%
  rowwise() %>%
  mutate(name = addTab(name, cohortId)) %>%
  select(name, cohortId) %>%
  rename(Drug = name, "2-digit Code" = cohortId)

tab <- kable(drugCodes, booktabs = TRUE, linesep = "",
             caption = "2-Digit Drug Codes") %>%
  column_spec(1, width = "10em") %>%
  column_spec(2, width = "30em")

if (knitr::is_latex_output()) {
  tab %>% kable_styling(latex_options = "striped", font_size = latex_table_font_size)
} else {
  tab %>% kable_styling(bootstrap_options = "striped")
}

categoryCodes <- readr::read_delim(col_names = TRUE, delim = ";", trim_ws = TRUE, file = "
  Category; Value; 1-digit Code
  Time-at-risk;
  ; ITT/OT1; 1
  ; OT2; 2
  Metformin;
  ; prior; 1
  ; none; 2
  Age ;
  ; any; 0
  ; younger; 1
  ; middle; 2
  ; older; 3
  Sex ;
  ; any; 0
  ; female; 1
  ; male; 2
  Race ;
  ; any; 0
  ; Black; 1
  Cardiovascular risk ;
  ; any; 0
  ; low; 1
  ; higher; 2
  Renal disease ;
  ; any; 0
  ; without; 1
  ; with; 2
")

footnote <- c("ITT: intent-to-treat; OT1: on-treatment-1; OT2: on-treatment-2")

tab <- kable(categoryCodes, booktabs = TRUE, linesep = "",
             caption = "Other Cohort Category Codes") %>%
  footnote(general = footnote, general_title = "")

if (knitr::is_latex_output()) {
  tab %>% kable_styling(latex_options = "striped", font_size = latex_table_font_size)
} else {
  tab %>% kable_styling(bootstrap_options = "striped")
}

Here are useful regex masks for the different strata

mask <- readr::read_csv(system.file("settings", "masks.csv", package = "LegendT2dm"))

tab <- kable(mask, booktabs = TRUE, linesep = "",
             caption = "Regex masks for strata")
# %>%
  # footnote(general = footnote, general_title = "")

if (knitr::is_latex_output()) {
  tab %>% kable_styling(latex_options = "striped", font_size = latex_table_font_size)
} else {
  tab %>% kable_styling(bootstrap_options = "striped")
}

Analysis Plan Matrix

For each comparison within each study, we will execute the 7 CohortMethod analysis plans in Table \@ref(tab:plan-matrix).

design <- readr::read_delim(col_names = TRUE, delim = ";", trim_ws = TRUE, file = "
  ID; PS-covariates; PS-adjustment; Outcome model; Risk-window end
  1; ; ; Cox; target exposure end
  2; FE defaults; variable match; Cox (conditional); target exposure end
  3; FE defaults; 5 strata; Cox (conditional); target exposure end
  4; ; ; Cox; observational period end
  5; FE defaults; variable match; Cox (conditional); observational period end
  6; FE defaults; 5 strata; Cox (conditional); observational period end
  7; ; ; Cox; target exposure end or escalation
  8; FE defaults; variable match; Cox (conditional); target exposure end or escalation
  9; FE defaults; 5 strata; Cox (conditional); target exposure end or escalation
")

footnote <- c("FE: FeatureExtraction package (365 days long-term, 180 days medium-term and 30 days short-term look-back)",
              "PS: Propensity score")

tab <- kable(design, linesep = "", booktabs = TRUE,
             caption = "CohortMethod analysis descriptions") %>%
  footnote(general = footnote, general_title = "")


if (knitr::is_latex_output()) {
  tab %>% kable_styling(latex_options = "striped", font_size = latex_table_font_size)
} else {
  tab %>% kable_styling(bootstrap_options = "striped")
}

Configuration files

The LEGEND-T2DM studies depend on several configuration files. We directly provide some of these files, while others we generate through executable R scripts before distributing study packages.

Executable R scripts

R/CreateAnalysisDetails.R
extra/GenerateExposureCohortDefinitions.R

File dependencies

For all LEGEND-T2DM studies

baseCohort.json : Base cohort object that we programmatically manipulate to generate all class and drug exposure cohorts
inst/settings/ExposuresOfInterest.csv : List of RxNorm concepts for individual drug ingredient and Drug class specifications for all exposure cohorts. File columns:
type : Drug for individual ingredients or Drug class for ingredient classes
name : Unsure if currently used; longer name
cohortId : Unique concept ID used to make cohort ID
conceptId : RxNorm concept ID if individual ingredient, otherwise unique Drug class ID in c(1, 2, 3, 4)
class : Reference to Drug class conceptId if individual ingredient, otherwise NA
shortName : Unsure if currently used; short name
order : Unsure if currently used; order to display
includedConceptIds : RxNorm concept IDs if Drug class, otherwise NA
inst/settings/OutcomesOfInterest.csv : List of outcome cohorts to import into study package and instantiate during execution. File columns:
cohortId : Unique cohort ID inside study package
atlasId : Unique cohort ID on JnJ ATLAS instance (currently) for importing into package
atlasName : Short name for Cohort Diagnostics
name : File prefix name for .json and .sql specification
description : Human-readable, short description of cohort
cite : List of Rmd citation keys
isNew : Indicate a new cohort for LEGEND-T2DM
inst/settings/Indications.csv : List of study-specific settings. File columns:
indicationId : Unique study ID key, e.g., class
indicationName : Study name, e.g., Class-vs-class
filterConceptIds : Concept IDs to remove from propensity score models
positiveControlIdOffset : Offset with which to start numbering positive control outcomes

For `class` studies

The following files are generated by extra/GenerateExposureCohortDefinitions.R:

inst/settings/classCohortsToCreate.csv : List of exposure cohorts to create, including both main and heterogeneity study-stratified cohorts. File columns:
atlasId : Unique cohort ID
atlasName : Short name that CohortDiagnostics will display
cohortId : Must == atlasId (unsure if used)
name : File prefix name for .json and .sql specification
inst/settings/classTcosOfInterest.csv : List of TC pairs to compare in the class study. File name includes Tco and some columns for historical reasons; all outcomes are executed by default. File columns :
targetId : Target cohort ID
comparatorId : Comparator cohort ID
outcomeIds : Currently unused; defaults to -1
excludedCovariateConceptIds : Currently unused; defaults to NA
targetName : Unsure if currently used; short name for target cohort
comparatorName Unsure if currently used; short name for comparator cohort

Table specification files

The following files specify the diagnostics and results data models on the public Postgresql server:

inst/settings/PsAssessmentModelSpecs.csv : Propensity score assessment additions to the legendt2dm_*_diagnostics schema
inst/settings/ResultsModelSpecs.csv : All study results for the legendt2dm_*_results schema

These files serve as arguments to createDataModelSqlFile to generate

inst/sql/postgresql/CreatePsAssessmentTables.sql
inst/sql/postgresql/CreateResultsTables.sql

and to uploadResultsToDatabase to upload diagnostics and results to the public Postgresql server.

`CohortMethod` function arguments

Data models

This document describes the data models for storing the output of the CohortDiagnostics and CohortMethod study artifacts for the LEGEND-T2DM study. Table \@ref(tab:schema) provides the schema names.

schema <- readr::read_delim(col_names = FALSE, delim = ";", trim_ws = TRUE, file = "
  legendt2dm_class_diagnostics ; legendt2dm_class_results
  legendt2dm_drug_diagnostics ; legendt2dm_drug_results
  legendt2dm_outcome_diagnostics
")

tab <- kable(schema, col.names = c("Cohort diagnostics", "Study results"),
             linesep = "", booktabs = TRUE,
             caption = "LEGEND-T2DM public study schema")

if (knitr::is_latex_output()) {
  tab %>% kable_styling(latex_options = "striped",
                        font_size = latex_table_font_size)
} else {
  tab %>% kable_styling(bootstrap_options = "striped")
}

Fields with minimum values

Some fields contain patient counts or fractions that are easily converted to patient counts. To prevent identifiability, these fields are subject to a minimum value. When the value falls below this minimum, it is replaced with the negative value of the minimum. For example, if the minimum subject count is 5, and the actual count is 2, the value stored in the data model will be -5, which could be represented as '\<5' to the user. Note that the value 0 is permissible, as it identifies no persons.

Cohort diagnostics data model

specifications <- readr::read_csv(system.file("/settings/resultsDataModelSpecification.csv", package = "CohortDiagnostics")) %>%
  filter(optional == "No")

tables <- split(specifications, specifications$tableName)

for (table in tables) {
  header <- sprintf("### Table %s", table$tableName[1])

  table <- table %>%
    select(Field = .data$fieldName, Type = .data$type, Key = .data$primaryKey
           # , Description = .data$description
           ) %>%
    kable(linesep = "", booktabs = TRUE, longtable = TRUE)

  if (knitr::is_latex_output()) {
    writeLines("")
    writeLines(header)

    writeLines(table %>%
                 kable_styling(latex_options = "striped", font_size = latex_table_font_size) %>%
                 column_spec(1, width = "10em") %>%
                 column_spec(2, width = "5em") %>%
                 column_spec(3, width = "3em") %>%
                 column_spec(4, width = "16em"))
  } else if (knitr::is_html_output()) {
    writeLines("")
    writeLines(header)

    writeLines(table %>%
                 kable_styling(bootstrap_options = "striped"))
  }
}

PS assessment data model

specifications <- readr::read_csv(system.file("settings/PsAssessmentModelSpecs.csv", package = "LegendT2dm"))

tables <- split(specifications, specifications$tableName)

for (table in tables) {
  header <- sprintf("### Table %s", table$tableName[1])

  table <- table %>%
    select(Field = .data$fieldName, Type = .data$type, Key = .data$primaryKey,
           Description = .data$description
           ) %>%
    kable(linesep = "", booktabs = TRUE, longtable = TRUE)

  if (knitr::is_latex_output()) {
    writeLines("")
    writeLines(header)

    writeLines(table %>%
                 kable_styling(latex_options = "striped", font_size = latex_table_font_size) %>%
                 column_spec(1, width = "10em") %>%
                 column_spec(2, width = "5em") %>%
                 column_spec(3, width = "3em") %>%
                 column_spec(4, width = "16em"))
  } else if (knitr::is_html_output()) {
    writeLines("")
    writeLines(header)

    writeLines(table %>%
                 kable_styling(bootstrap_options = "striped"))
  }
}

Study results data model

specifications <- readr::read_csv(system.file("settings/ResultsModelSpecs.csv", package = "LegendT2dm"))

tables <- split(specifications, specifications$tableName)

for (table in tables) {
  header <- sprintf("### Table %s", table$tableName[1])

  table <- table %>%
    select(Field = .data$fieldName, Type = .data$type, Key = .data$primaryKey,
           Description = .data$description
           ) %>%
    kable(linesep = "", booktabs = TRUE, longtable = TRUE)

  if (knitr::is_latex_output()) {
    writeLines("")
    writeLines(header)

    writeLines(table %>%
                 kable_styling(latex_options = "striped", font_size = latex_table_font_size) %>%
                 column_spec(1, width = "10em") %>%
                 column_spec(2, width = "5em") %>%
                 column_spec(3, width = "3em") %>%
                 column_spec(4, width = "16em"))
  } else if (knitr::is_html_output()) {
    writeLines("")
    writeLines(header)

    writeLines(table %>%
                 kable_styling(bootstrap_options = "striped"))
  }
}

Database access

All LegendT2dm tables are located on OHDSI's public PostgreSQL server that is the backend for the apps on (https://data.ohdsi.org). Study PIs can provide read-only credentials that can be stored encrypted on individual computers. For this latter task, we recommend using the cross-platform keyring R package to securely store: legendt2dmServer, legendt2dmDatabase, legendt2dmUser, legendt2dmPassword and legendT2dmDriverPath and specifying connection details via:

Sys.setenv(DATABASECONNECTOR_JAR_FOLDER = keyring::key_get("legendT2dmDriverPath"))

legendT2dmConnectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = paste(keyring::key_get("legendt2dmServer"),
                 keyring::key_get("legendt2dmDatabase"),
                 sep = "/"),
  user = keyring::key_get("legendt2dmUser"),
  password = keyring::key_get("legendt2dmPassword"))

R Shiny

Exposure and outcome cohort characteristics and diagnostics

To explore cohorts and their diagnostics through an R Shiny application, we use the LegendT2dmCohortExplorer package. The following code will load all class cohorts:

LegendT2dmCohortExplorer::launchCohortExplorer(cohorts = "class",
                                               connectionDetails = legendT2dmConnectionDetails)

Use cohorts = "drug" and cohorts = "outcome" to explore the Drug-vs-Drug Study exposure and outcome cohorts, respectively. This Shiny applications is based on CohortDiagnostics v2.1 and it is also possible to explore cohorts using local files by specifying the dataFolder argument.

A useful Cohort name filter is main( (see Table \@ref(tab:category-codes)).

Comparative effectivess and safety evidence

To explore the comparative effectiveness and safety evidence through an R Shiny application, we use the LegendT2dmEvidenceExplorer package. The following code will load all class cohort comparison evidence:

LegendT2dmEvidenceExplorer::launchEvidenceExplorer(cohorts = "class",
                                                   connectionDetails = legendT2dmConnectionDetails)

Use cohorts = "drug" to explore the drug cohort comparison evidence. It is also possible explore evidence using local files by specifying the dataFolder argument.

Definitions

Orphan concepts

Orphan concepts are (source) concepts that likely should be included in a concept set, but aren't, often because of vocabulary issues. This is a rough outline of the process:

Given a concept set expression, find all included concepts.
Find all names of those concepts, including synonyms, and the names of source concepts that map to them.
Search for concepts (standard and source) that contain any of those names as substring.
Filter those concepts to those that are not in the original set of concepts (i.e. orphans).
Restrict the set of orphan concepts to those that appear in the CDM database as either source concept or standard concept.

HOWTOs

Render human-readable cohort definition

The LegendT2dm provides functions to render cohort definitions into Rmarkdown for output to PDF or HTML. Here is an example of rendering the complete definition for new-users of SGLT2 inhibitors:

cohortJson <- SqlRender::readSql(system.file("cohorts", "class", "ID301100000.json",
                                             package = "LegendT2dm"))

LegendT2dm::printCohortDefinitionFromNameAndJson(
  name = "Class-vs-Class Exposure (SGLT2 New-User) Cohort / OT1",
  json = cohortJson)

Dissemination

Publications

If possible, LEGEND-T2DM publications should be drafted in Rmarkdown with *.Rmd source files version-controlled on github. Initial drafting can begin in a private repository, but final versions should end up in ohdsi-studies/LegendT2dm/Documents. Currently, we have the following documents:

Protocol/Protocol.Rmd: official protocol registered (soon) with EU PASS
Protocol/MedRvix.Rmd: reformatted protocol for medRvix, available at: (add url)
Protocol/BmjOpen.Rmd: reformatted protocol for BMJOpen (to be submitted)

Presentations

Nothing yet. We have a cultural war here between PowerPoint, latex and Rmarkdown.

ohdsi-studies/LegendT2dm documentation built on July 4, 2025, 8:25 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ohdsi-studies/LegendT2dm
Large-Scale Evidence Generation and Evaluation in a Network of Databases

In ohdsi-studies/LegendT2dm: Large-Scale Evidence Generation and Evaluation in a Network of Databases

TODO List {-}

List of Abbreviations

Getting Started

Exposure Cohort Counts

Exposure Cohort ID Key

Analysis Plan Matrix

Configuration files

Executable R scripts

File dependencies

For all LEGEND-T2DM studies

For `class` studies

Table specification files

`CohortMethod` function arguments

Data models

Fields with minimum values

Cohort diagnostics data model

PS assessment data model

Study results data model

Database access

R Shiny

Exposure and outcome cohort characteristics and diagnostics

Comparative effectivess and safety evidence

Definitions

Orphan concepts

HOWTOs

Render human-readable cohort definition

Dissemination

Publications

Presentations

R Package Documentation

Browse R Packages

We want your feedback!

ohdsi-studies/LegendT2dm Large-Scale Evidence Generation and Evaluation in a Network of Databases

In ohdsi-studies/LegendT2dm: Large-Scale Evidence Generation and Evaluation in a Network of Databases

TODO List {-}

List of Abbreviations

Getting Started

Exposure Cohort Counts

Exposure Cohort ID Key

Analysis Plan Matrix

Configuration files

Executable R scripts

File dependencies

For all LEGEND-T2DM studies

For class studies

Table specification files

CohortMethod function arguments

Data models

Fields with minimum values

Cohort diagnostics data model

PS assessment data model

Study results data model

Database access

R Shiny

Exposure and outcome cohort characteristics and diagnostics

Comparative effectivess and safety evidence

Definitions

Orphan concepts

HOWTOs

Render human-readable cohort definition

Dissemination

Publications

Presentations

R Package Documentation

Browse R Packages

We want your feedback!

ohdsi-studies/LegendT2dm
Large-Scale Evidence Generation and Evaluation in a Network of Databases

For `class` studies

`CohortMethod` function arguments