This vignette walks through generating a results set with CohortDiagnostics, starting with cohort generation.
Please see the HADES library for background information.
Ensure that CohortDiagnostics is installed on your system and updated to the latest version.
For this example we will also be using the Eunomia test package.
Optionally, you may install the ROhdsiWebApi package to download cohort definitions from an ATLAS instance:
    remotes::install_github("OHDSI/Eunomia")
    remotes::install_github("OHDSI/ROhdsiWebApi")
We need to tell R how to connect to the server where the data are. CohortDiagnostics uses the DatabaseConnector package, which provides the createConnectionDetails function. Type ?createConnectionDetails for the specific settings required for the various database management systems (DBMS). For example, one might connect to a PostgreSQL database using this code:
    library(CohortDiagnostics)

    connectionDetails <- createConnectionDetails(
      dbms = "postgresql",
      server = "localhost/ohdsi",
      user = "joe",
      password = "supersecret"
    )
For the purposes of this example, we will use the Eunomia test package, which provides a small CDM in a locally stored SQLite database.
    connectionDetails <- Eunomia::getEunomiaConnectionDetails()
    cdmDatabaseSchema <- "main"
    tempEmulationSchema <- NULL
    cohortDatabaseSchema <- "main"
    cohortTable <- "cohort"
The last four lines define the cdmDatabaseSchema, tempEmulationSchema, cohortDatabaseSchema, and cohortTable variables.
We'll use the cdmDatabaseSchema later to tell R where the data in CDM format live. The tempEmulationSchema is needed only on platforms that do not support temporary tables, such as Oracle; it names a schema where temporary tables can be emulated.
The cohortDatabaseSchema and cohortTable specify where we want to instantiate our cohorts.
Note that for Microsoft SQL Server, database schemas need to specify both the database and the schema, so for example cdmDatabaseSchema <- "my_cdm_data.dbo".
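The schema naming rule above can be captured in a small helper. The following is a minimal sketch in base R; qualifySchema is a hypothetical function written for this illustration and is not part of CohortDiagnostics or DatabaseConnector. It assumes the dbms value "sql server" identifies Microsoft SQL Server.

```r
# Hypothetical helper (illustration only): build a schema string that follows
# the naming rule described above.
qualifySchema <- function(dbms, schema, database = NULL) {
  if (dbms == "sql server") {
    # SQL Server needs "<database>.<schema>"
    if (is.null(database)) {
      stop("A database name is required for SQL Server")
    }
    return(paste(database, schema, sep = "."))
  }
  # Other platforms in this sketch use the schema name as-is
  schema
}

cdmSchemaSqlServer <- qualifySchema("sql server", "dbo", database = "my_cdm_data")
cdmSchemaSqlite <- qualifySchema("sqlite", "main")

print(cdmSchemaSqlServer)
print(cdmSchemaSqlite)
```

Failing early when the database name is missing is a design choice of this sketch; it surfaces a misconfigured schema before any query is sent to the server.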
The preferred way to use cohort diagnostics is through a study package. This is a dedicated R package that can be installed on a system and run. The primary reason is reproducibility: cohort definitions and resources frequently change, whereas a study package can be seen as a snapshot, frozen at the time of creation and updated incrementally.
For example, the CohortDiagnostics package includes an example set of cohort SQL and JSON definitions to run on the Eunomia test data in the OMOP Common Data Model format.
    library(CohortDiagnostics)

    cohortDefinitionSet <- CohortGenerator::getCohortDefinitionSet(
      settingsFileName = "Cohorts.csv",
      jsonFolder = "cohorts",
      sqlFolder = "sql/sql_server",
      packageName = "CohortDiagnostics"
    )
Viewing this data frame shows the SQL and JSON for each cohort:
View(cohortDefinitionSet)
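Before generating cohorts it can help to check the structure of the definition set programmatically. The sketch below uses a toy data frame in place of a real cohort definition set so that it runs without any OHDSI packages. The column names cohortId, cohortName, sql and json are assumptions about the layout; check them against your own cohortDefinitionSet.

```r
# Toy stand-in for a cohort definition set (illustration only); a real one
# would come from CohortGenerator::getCohortDefinitionSet().
toyCohortDefinitionSet <- data.frame(
  cohortId = c(1, 2),
  cohortName = c("Example cohort A", "Example cohort B"),
  sql = c("SELECT 1;", "SELECT 2;"),
  json = c("{}", "{}"),
  stringsAsFactors = FALSE
)

# Assumed column names; adjust if your definition set differs
requiredColumns <- c("cohortId", "cohortName", "sql", "json")
missingColumns <- setdiff(requiredColumns, names(toyCohortDefinitionSet))

# Cohort ids should be unique within one definition set
hasDuplicateIds <- anyDuplicated(toyCohortDefinitionSet$cohortId) > 0

# Compact overview that avoids printing the long sql and json strings
overview <- toyCohortDefinitionSet[, c("cohortId", "cohortName")]
overview$sqlLength <- nchar(toyCohortDefinitionSet$sql)
print(overview)
```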
It is often desirable to perform cohort diagnostics on definitions stored in an ATLAS instance. Though this is not the preferred way of running studies (and this is certainly not the preferred method for an OHDSI network study involving multiple sites) it is possible to load references into a data frame used by cohort diagnostics.
The following code demonstrates how to create a set of cohort references from ATLAS that can be used by cohort diagnostics:
    # Set up url
    baseUrl <- "https://atlas.hosting.com/WebAPI"

    # list of cohort ids
    cohortIds <- c(18345, 18346)

    cohortDefinitionSet <- ROhdsiWebApi::exportCohortDefinitionSet(
      baseUrl = baseUrl,
      cohortIds = cohortIds,
      generateStats = TRUE
    )
Consult the ROhdsiWebApi documentation for details on authenticating to your ATLAS instance.
Please note that in order to generate inclusion rule statistics (a useful diagnostic tool) the parameter generateStats should be set to TRUE.
Cohorts must be generated before cohort diagnostics can be run.
For example,
    cohortTableNames <- CohortGenerator::getCohortTableNames(cohortTable = cohortTable)

    # Next create the tables on the database
    CohortGenerator::createCohortTables(
      connectionDetails = connectionDetails,
      cohortTableNames = cohortTableNames,
      cohortDatabaseSchema = cohortDatabaseSchema,
      incremental = FALSE
    )

    # Generate the cohort set
    CohortGenerator::generateCohortSet(
      connectionDetails = connectionDetails,
      cdmDatabaseSchema = cdmDatabaseSchema,
      cohortDatabaseSchema = cohortDatabaseSchema,
      cohortTableNames = cohortTableNames,
      cohortDefinitionSet = cohortDefinitionSet,
      incremental = FALSE
    )
Note that the code above will delete an existing cohort table.
To avoid this, use incremental mode by setting the parameter incremental = TRUE.
The resulting cohort table should include the columns: cohort_definition_id, subject_id, cohort_start_date, cohort_end_date
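The column check can be automated. The following is a minimal base R sketch that uses a toy data frame standing in for rows fetched from the cohort table. missingCohortColumns is a hypothetical helper written for this illustration. The comparison ignores case because some database drivers return upper-case column names.

```r
expectedColumns <- c(
  "cohort_definition_id", "subject_id",
  "cohort_start_date", "cohort_end_date"
)

# Hypothetical helper (illustration only): list expected columns that are
# absent from a data frame, ignoring case.
missingCohortColumns <- function(cohortRows) {
  setdiff(expectedColumns, tolower(names(cohortRows)))
}

# Toy rows standing in for the result of querying the cohort table
toyCohortRows <- data.frame(
  COHORT_DEFINITION_ID = c(1, 1),
  SUBJECT_ID = c(101, 102),
  COHORT_START_DATE = as.Date(c("2010-01-01", "2011-06-15")),
  COHORT_END_DATE = as.Date(c("2010-12-31", "2012-01-01"))
)

missingInToy <- missingCohortColumns(toyCohortRows)

# Number of rows per cohort; a cohort with no rows will not appear here
cohortCounts <- table(toyCohortRows$COHORT_DEFINITION_ID)
print(cohortCounts)
```

In practice the toy data frame would be replaced by a few rows queried from the table named by cohortDatabaseSchema and cohortTable.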
Once cohort definitions are loaded and cohort tables have been populated, cohort diagnostics is ready to be executed.
First we set an export folder, which is where the results will be stored.
exportFolder <- "export"
Then we execute the function (using the default settings) as follows:
    executeDiagnostics(cohortDefinitionSet,
      connectionDetails = connectionDetails,
      cohortTable = cohortTable,
      cohortDatabaseSchema = cohortDatabaseSchema,
      cdmDatabaseSchema = cdmDatabaseSchema,
      exportFolder = exportFolder,
      databaseId = "MyCdm",
      minCellCount = 5
    )
The above cohort generation process will create a number of residual tables. As the process is complete, these are no longer required and can be removed.
    CohortGenerator::dropCohortStatsTables(
      connectionDetails = connectionDetails,
      cohortDatabaseSchema = cohortDatabaseSchema,
      cohortTableNames = cohortTableNames
    )
Once the diagnostics have completed, a zip file will have been created in the specified export folder.
This zip file can be shared between sites, as it does not contain patient-identifiable information.
When unzipped, the zip file will contain several .csv files that may be easily audited.
Note that cell counts smaller than 5 have been removed, as specified using the minCellCount argument, to ensure non-identifiability.
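To make the idea of a minimum cell count concrete, here is a small base R sketch. It is an illustration of the general technique only and does not reproduce the package's own code. censorSmallCounts is a hypothetical helper, and the use of NA for censored cells is an assumption of this sketch; the exported files may represent censored cells differently.

```r
# Hypothetical helper (illustration only): blank out counts that are greater
# than zero but smaller than the minimum cell count.
censorSmallCounts <- function(counts, minCellCount = 5) {
  toCensor <- !is.na(counts) & counts > 0 & counts < minCellCount
  counts[toCensor] <- NA
  counts
}

toyCounts <- data.frame(
  conceptId = c(1, 2, 3, 4),
  personCount = c(120, 3, 5, 0)
)

censored <- toyCounts
censored$personCount <- censorSmallCounts(toyCounts$personCount, minCellCount = 5)
print(censored)
```

A count equal to the threshold is kept, and zero counts are left untouched in this sketch because an empty cell cannot identify an individual.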
Assuming you completed the steps described above for one or more databases, you should now have a set of zip files, one per database. Make sure to place all zip files in a single folder.
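Gathering the zip files into one folder can be scripted. The sketch below is a base R illustration; collectZipFiles is a hypothetical helper, and the folder and file names are placeholders invented for the demonstration. Files with the same name in different source folders would overwrite each other, so this assumes each database produced a distinctly named zip file.

```r
# Hypothetical helper (illustration only): copy every .zip file found in the
# given folders into one target folder and return the new paths.
collectZipFiles <- function(sourceFolders, targetFolder) {
  dir.create(targetFolder, recursive = TRUE, showWarnings = FALSE)
  zipFiles <- list.files(sourceFolders, pattern = "\\.zip$", full.names = TRUE)
  copied <- file.copy(zipFiles, targetFolder, overwrite = TRUE)
  file.path(targetFolder, basename(zipFiles[copied]))
}

# Demonstration with empty placeholder files standing in for real result zips
siteA <- tempfile("siteA")
siteB <- tempfile("siteB")
dir.create(siteA)
dir.create(siteB)
invisible(file.create(file.path(siteA, "results_site_a.zip")))
invisible(file.create(file.path(siteB, "results_site_b.zip")))
invisible(file.create(file.path(siteB, "notes.txt"))) # ignored: not a zip file

mergedFolder <- tempfile("allResults")
collected <- collectZipFiles(c(siteA, siteB), mergedFolder)
print(basename(collected))
```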
Optionally, we can pre-merge the zip files into an SQLite database so we can view results in the Shiny app:
createMergedResultsFile(exportFolder)
This file can be used in the Shiny app to explore results. See the vignette "Viewing results using Diagnostics Explorer" for more details.