Quick Start

Generating your batch of CDEs -> list of studies (phs study accession) which have your data type:

library(readr)
CDE_Ocular <- read_csv("data/CDE_Ocular.csv")
View(CDE_Ocular)
###dummy example SQL query studies matching your .csv Name column with datatype study name from dbGaPdb and sequence type info from SRAdb

NIH Common Data Elements (CDE)

NIH is currently producing Common Data Elements (cde.nih.nlm.gov) to harmonize the categories of data types present across studies available in their databases. This is currently implemented in dbGaP, and provides both a common naming terminology (often an amorphous concept) and consensus measures for Phenotypes and eXposures (PhenX) identifier. You can cross reference to dbGaP (phs study accession) and therefore SRAdb (sequence datatype) to define a set of studies in which that data parameter was taken, as well as other characteristics of those studies.

How would you use this data?

Often a difficulty in determining studies of interest for analysis is knowing what metadata was collected in a given study. In our example case someone studying eye conditions can see that there are harmonized PhenX codes under "Ocular" for 36 data points--some obivously relevant, some less intuitive but still potentially significant--across 271 dbGaP accession variables. These include questions that are likely not asked in the same manner in every study. For example: "Do you have any retinal defects, retinal tears, detachments, etc" can be written many ways, but PX111001120000 is a definable identifier, and hits across four studies in dbGaP. Being able to tie this information into the other dbGaPdb resources can better tailor your metadata search to focus on studies worth your time.

Generating Data Table from exported json

Currently there is not programatic access to CDE elements, so a list of primary names has to be downloaded. You can go to the NIH CDE website https://cde.nlm.nih.gov/cde/search?selectedOrg=PhenX and search their PhenX terms for dbGaP. From the results you can stratify by category (on the left) and export a json (upper right corner). In this example, we selected the "Ocular" category to produce a sample subset "data/CDE_Ocular.csv":

Image of CDE website

Querying based on the table

The primary key is the variable_accession (phv) is indexed to both the PhenX_ID and the Phenx text name. The table can be queried with the standard dbGaP querying (See Intro_to_dbGaP_querying). For instance, if you wanted a list of studies associated with your data:

library(readr)
CDE_Ocular <- read_csv("data/CDE_Ocular.csv")
View(CDE_Ocular)
###dummy example SQL query studies matching your .csv Name column with datatype study name from dbGaPdb and sequence type info from SRAdb

Backend SQL data generation:

###dummy script on back end, $ python2 CDE_lite_json.py CDE_dbGAP.json


dbGaPdb/dbGaPdb documentation built on May 24, 2019, 2:04 a.m.