Date: 2023-02-20
Summary: Characterization to assess the prevalence of Bipolar Disorder, Depression, and Suicidality
The following packages will be loaded to conduct the characterization:
library(DatabaseConnector)
library(dplyr)
library(lubridate)
library(readr)
library(SqlRender)
library(tibble)
To learn more about these packages, see the Appendix.
Here, we need to set up a connection to the OMOP CDM database we will assess. To do so, we need to define the following constants, which are used for the connection:
dbms - the database management system used to host your database; common options include "postgresql" and "sql server" (see all options here)
server - name of the server; could be localhost, an address like 123.0.1.5, etc.
user - your username to access the server
password - the password you use to access the server
port - the port where the database is hosted
schema - name of the database schema used
These must be defined in this code block:
dbms <- "Fill in here"
server <- "Fill in here"
user <- "Fill in here"
password <- "Fill in here"
port <- "Fill in here"
schema <- "Fill in here"
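If you would rather not type credentials directly into this document, one option is to read them from environment variables. The following is only a sketch: the variable names (OMOP_DBMS, OMOP_SERVER, and so on), the get_setting helper, and the fallback values are hypothetical and not part of the study code.

```r
# Hypothetical helper: read a setting from an environment variable,
# falling back to a default when the variable is unset or empty.
get_setting <- function(name, default = NA_character_) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || value == "") default else value
}

# Assumed variable names; set them in ~/.Renviron or in your shell.
dbms     <- get_setting("OMOP_DBMS", "postgresql")
server   <- get_setting("OMOP_SERVER")
user     <- get_setting("OMOP_USER")
password <- get_setting("OMOP_PASSWORD")
port     <- as.integer(get_setting("OMOP_PORT", "5432"))  # assumed fallback port
schema   <- get_setting("OMOP_SCHEMA")
```

Any setting that comes back as NA was not found in the environment and still needs to be filled in by hand.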
An additional step is to configure the JDBC driver required to connect to the database. This is accomplished in the following code block (change eval = FALSE to eval = TRUE once you have set these variables correctly):
pathToDriver <- "/location/that/you/want"
downloadJdbcDrivers(dbms = dbms, pathToDriver = pathToDriver, method = "auto")
Once this is done, we can create the connection to the database (change eval = FALSE to eval = TRUE once you have set these variables correctly):
connectionDetails <- createConnectionDetails(dbms = dbms, server = server, user = user, password = password, port = port, pathToDriver = pathToDriver)
connection <- connect(connectionDetails)
If there were no errors, then we should be able to continue with the analysis!
WARN: As you proceed with this analysis, if you encounter a Java issue like this: "Insufficient java heap memory", please run the following code block:
options(java.parameters = c("-XX:+UseConcMarkSweepGC", "-Xmx8192m"))
This is only an emergency workaround and should be removed when a better solution is found.
If any of this was confusing, here is an example of how to fill out the above connection information:
dbms <- "postgresql"
server <- "test.data.americus.edu/mimic_omop"
user <- "mimic"
password <- "omoprocks"
port <- 5042
schema <- "mimic.omop"
pathToDriver <- "utils"
downloadJdbcDrivers(dbms = dbms, pathToDriver = pathToDriver, method = "auto")
connectionDetails <- createConnectionDetails(dbms = dbms, server = server, user = user, password = password, port = port, pathToDriver = pathToDriver)
connection <- connect(connectionDetails)
condition_concept_ids for a Given Condition
For this task, we want to load the concept set and get all of the concept IDs for the diseases being studied:
pathToPhenotype_bipolar <- "../phenotypes/bipolar_concept_set.csv"
bipolar_concept_ids <- read.csv(pathToPhenotype_bipolar)$CONCEPT_ID
pathToPhenotype_depression <- "../phenotypes/depression_concept_set.csv"
depression_concept_ids <- read.csv(pathToPhenotype_depression)$CONCEPT_ID
pathToPhenotype_suicidality <- "../phenotypes/suicidality_concept_set.csv"
suicidality_concept_ids <- read.csv(pathToPhenotype_suicidality)$CONCEPT_ID
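Because the three loads above follow the same pattern, they could be wrapped in a small helper that also checks the file before the IDs are used. This is a sketch under assumptions: load_concept_ids is a hypothetical name, and the demonstration file is a throwaway created only for illustration; the one detail taken from the study code is the CONCEPT_ID column.

```r
# Hypothetical helper: read a concept set CSV and return its unique,
# non-missing concept IDs, failing early if the expected column is absent.
load_concept_ids <- function(path) {
  concept_set <- read.csv(path)
  if (!"CONCEPT_ID" %in% names(concept_set)) {
    stop("No CONCEPT_ID column found in ", path)
  }
  ids <- unique(concept_set$CONCEPT_ID)
  ids[!is.na(ids)]
}

# Demonstration on a temporary file with made-up values.
demo_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(CONCEPT_ID = c(101, 102, 102, NA),
             CONCEPT_NAME = c("a", "b", "b", "c")),
  demo_path,
  row.names = FALSE
)
demo_ids <- load_concept_ids(demo_path)
```

With such a helper, a call like load_concept_ids(pathToPhenotype_bipolar) would stand in for the read.csv(...)$CONCEPT_ID expression.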
Here, we find every patient that has a history of one of the diseases we are studying (at least one diagnosis):
source("./sql/condition_filter.R")
bipolar_patients <- condition_filter(schema, bipolar_concept_ids, dbms, connection)
depression_patients <- condition_filter(schema, depression_concept_ids, dbms, connection)
suicidality_patients <- condition_filter(schema, suicidality_concept_ids, dbms, connection)
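The contents of condition_filter.R are not shown here, so the following is only a guess at the kind of query such a function might build. It assumes the standard OMOP CDM condition_occurrence table with its person_id and condition_concept_id columns; build_condition_sql is a hypothetical name, and the concept IDs are placeholders.

```r
library(SqlRender)

# Hypothetical helper: builds (but does not run) SQL selecting every person
# with at least one occurrence of any of the given condition concepts.
build_condition_sql <- function(schema, concept_ids, dbms) {
  sql <- "SELECT DISTINCT person_id
          FROM @schema.condition_occurrence
          WHERE condition_concept_id IN (@concept_ids);"
  # render() fills in the @parameters; vectors become comma-separated lists.
  sql <- render(sql, schema = schema, concept_ids = concept_ids)
  # translate() adapts the SQL to the target database dialect.
  translate(sql, targetDialect = dbms)
}

# Placeholder concept IDs, for illustration only.
example_sql <- build_condition_sql("mimic.omop", c(435783L, 4220617L), "postgresql")
cat(example_sql)
```

The resulting string could then be passed to DatabaseConnector's querySql(connection, example_sql) to retrieve the matching patients.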
Now we run a query for these patient populations to get their stratification by the axes of race, gender, and age group:
source("./sql/stratified_person.R")
bipolar_patients <- stratify_persons(schema, bipolar_patients, dbms, connection)
depression_patients <- stratify_persons(schema, depression_patients, dbms, connection)
suicidality_patients <- stratify_persons(schema, suicidality_patients, dbms, connection)
To maintain patient privacy and security, we follow HITECH standards and filter out any patient subpopulation with a count of fewer than 11 from this dataset.
bipolar_patients <- bipolar_patients %>% filter(COUNTS > 10)
depression_patients <- depression_patients %>% filter(COUNTS > 10)
suicidality_patients <- suicidality_patients %>% filter(COUNTS > 10)
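To see the effect of the filter above on boundary values, here is a toy example. The data are made up, and the RACE column is a hypothetical stand-in for the stratification columns; only the COUNTS column and the threshold come from the study code.

```r
library(dplyr)
library(tibble)

# Made-up strata with counts on either side of the threshold.
toy <- tibble(
  RACE = c("A", "B", "C"),
  COUNTS = c(250, 11, 10)
)

# Same rule as above: keep only strata with 11 or more patients.
kept <- toy %>% filter(COUNTS > 10)

# Strata that would be dropped, useful for a quick check before exporting.
suppressed <- toy %>% filter(COUNTS <= 10)
```

A stratum with exactly 11 patients is kept, while one with exactly 10 is dropped, which matches the "fewer than 11" rule.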
Finally, we save this data to the data directory in the root of the study:
exportFolder <- "../data/baseline"
write.csv(bipolar_patients, file = file.path(exportFolder, "bipolar_person_stratified_breakdown.csv"), row.names = FALSE)
write.csv(depression_patients, file = file.path(exportFolder, "depression_person_stratified_breakdown.csv"), row.names = FALSE)
write.csv(suicidality_patients, file = file.path(exportFolder, "suicidality_person_stratified_breakdown.csv"), row.names = FALSE)
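Note that write.csv does not create missing directories, so the export fails if the target folder does not exist yet. A minimal sketch of guarding against that follows; it uses a temporary directory and a made-up data frame as stand-ins so that it can run anywhere, and the demo names are hypothetical.

```r
# Stand-in for the study's export folder, placed under a temporary directory.
demoExportFolder <- file.path(tempdir(), "data", "baseline")

# Create the folder (and any missing parents) before writing to it.
if (!dir.exists(demoExportFolder)) {
  dir.create(demoExportFolder, recursive = TRUE)
}

# Made-up data frame standing in for one of the stratified tables.
demoFile <- file.path(demoExportFolder, "example_breakdown.csv")
write.csv(data.frame(COUNTS = 12), file = demoFile, row.names = FALSE)
```

In the study itself, the same dir.exists and dir.create check could be applied to exportFolder before the write.csv calls.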
This concludes the steps required for generating the baseline of our characterization study.
Please take the folder named baseline, located in the data folder in the root of the study, and send it to us at GTRI at jacob.zelko@gtri.gatech.edu.
Thank you very much for participating in this study!
Appendix
renv - create reproducible environments for R projects
dplyr - grammar for data manipulation
tibble - improved data.frame functionality
SqlRender - package for rendering parameterized SQL
DatabaseConnector - package for connecting to databases using JDBC
lubridate - makes it easier to work with date times
readr - a fast and friendly way to read rectangular data