library(knitr)
opts_chunk$set(cache=TRUE)

dbGaPdb

Linking dbGaPdb with SRAdb

SRAdb allows efficient searching of the metadata associated with files deposited in the Sequence Read Archive (SRA). dbGaP contains studies whose genomic data are stored in SRA. To access the SRA metadata for dbGaP files, download the SRAdb SQLite file and search the 'sra' table for rows whose 'study_alias' column holds an accession starting with 'phs'.

library(SRAdb)
library(tidyverse)
library(stringr)
sqlfile <- 'SRAmetadb.sqlite'
# Download the SRAdb SQLite file only if it is not already in the working directory
if (!file.exists(sqlfile)) sqlfile <- getSRAdbFile()
sra_con <- dbConnect(SQLite(), sqlfile)

Set sqlfile to the path of the SQLite file on your computer.

### load('../data/demo_dbgap_metadata.Rdata') 
### Would this already be loaded in by the bioconductor package?
sra_in_dbGaP <- dbGetQuery(sra_con, "select * from sra where study_alias like 'phs%'")
dim(sra_in_dbGaP)
# Keep only the part of each alias before the first underscore
sra_in_dbGaP <- sra_in_dbGaP %>%
              mutate(study_alias2 = str_split_fixed(study_alias, '_', 2)[, 1])
head(sra_in_dbGaP)

Study accession codes found in the dbGaPdb metadata include study version information (e.g., phs000514.v1). However, SRAdb study accession codes do not follow the same nomenclature (e.g., phs000307_49). The lines of code above use dplyr to trim this inconsistency while preserving the correct linking of the two databases.
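
As a rough illustration of the linking step, the sketch below reduces both accession styles to the bare 'phs' number and joins on it. The two data frames are toy stand-ins, not real dbGaPdb or SRAdb tables: the column names study_accession, study_name, run_count and study_key are assumptions made for this example, as are all values other than the two accession formats quoted above.

```r
library(dplyr)
library(stringr)

# Hypothetical SRAdb-style rows: the alias carries an underscore suffix
sra_demo <- data.frame(
  study_alias = c("phs000307_49", "phs000514_2"),
  run_count   = c(12L, 7L),
  stringsAsFactors = FALSE
) %>%
  mutate(study_key = str_extract(study_alias, "^phs[0-9]+"))

# Hypothetical dbGaP-style rows: the accession carries a version suffix
dbgap_demo <- data.frame(
  study_accession = c("phs000514.v1", "phs000999.v2"),
  study_name      = c("Study A", "Study B"),
  stringsAsFactors = FALSE
) %>%
  mutate(study_key = str_extract(study_accession, "^phs[0-9]+"))

# Keep only the studies that appear in both tables
linked <- inner_join(sra_demo, dbgap_demo, by = "study_key")
linked
```

Extracting the key with one regular expression on both sides avoids handling the underscore and the version suffix separately. If the study version matters for your analysis, keep study_accession alongside the key instead of discarding it.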



dbGaPdb/dbGaPdb documentation built on May 24, 2019, 2:04 a.m.