library(knitr)
opts_chunk$set(cache = TRUE)
SRAdb allows for efficient searching through the metadata associated with files deposited in the Sequence Read Archive (SRA). dbGaP (the database of Genotypes and Phenotypes) contains studies with genomic data that are also found in SRA. To access the SRA metadata of dbGaP studies, download the SRAmetadb SQLite file with SRAdb and select the rows of the 'sra' table whose 'study_alias' column contains an accession starting with 'phs', the dbGaP study accession prefix.
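Because the SRAmetadb file is a large download, the filtering rule can be tried first on a small in-memory table. The sketch below is a minimal base-R illustration, assuming a hypothetical toy data frame sra_toy whose column names mimic the 'sra' table and whose values are made up (only phs000307_49 comes from this document): grepl() with the pattern '^phs' keeps the same rows as the SQL clause like 'phs%', which SQLite evaluates case-insensitively for ASCII letters.

```r
# Hypothetical toy stand-in for a few rows of the 'sra' table (values are made up)
sra_toy <- data.frame(
  study_alias = c("phs000307_49", "my_lab_study", "PHS000514_2"),
  study_title = c("toy study 1", "toy study 2", "toy study 3"),
  stringsAsFactors = FALSE
)

# Mirrors: select * from sra where study_alias like 'phs%'
# SQLite's LIKE ignores case for ASCII letters, hence ignore.case = TRUE
is_dbgap <- grepl("^phs", sra_toy$study_alias, ignore.case = TRUE)
sra_in_dbGaP_toy <- sra_toy[is_dbgap, ]
print(sra_in_dbGaP_toy$study_alias)
```

If exact case matching is wanted instead, drop ignore.case here and, on the SQLite side, set PRAGMA case_sensitive_like before running the query.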
library(SRAdb)
library(tidyverse)
library(stringr)
# Path to the SRAmetadb SQLite file; download it into the working directory if absent
sqlfile <- 'SRAmetadb.sqlite'
if (!file.exists(sqlfile)) sqlfile <- getSRAdbFile()
sra_con <- dbConnect(SQLite(), sqlfile)
You have to set sqlfile to the path of the SRAmetadb.sqlite file on your computer if it is not in the working directory.
### load('../data/demo_dbgap_metadata.Rdata')
### Would this already be loaded in by the bioconductor package?
# Select SRA records whose study_alias is a dbGaP accession (starting with 'phs'),
# then trim the alias at the underscore so the SRA and dbGaP metadata are join-able
sra_in_dbGaP <- dbGetQuery(sra_con, "select * from sra where study_alias like 'phs%'")
dim(sra_in_dbGaP)
sra_in_dbGaP <- sra_in_dbGaP %>%
  mutate(study_alias2 = str_replace(study_alias, '_.*$', ''))
head(sra_in_dbGaP)
Study accession codes in the dbGaP metadata include study version information (e.g., phs000514.v1). SRAdb study accession codes, however, follow a different nomenclature (e.g., phs000307_49). The code above uses dplyr to trim the SRA codes at the underscore (phs000307_49 becomes phs000307) while preserving the correct linking of the two databases: once the version suffix is likewise dropped from the dbGaP codes, both sides share the bare phs accession as a join key.
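The mismatch between the two nomenclatures, and how a shared key resolves it, can be seen on toy data. This is a minimal base-R sketch, assuming hypothetical data frames dbgap_toy and sra_toy with made-up titles and extra accessions (only phs000514.v1 and phs000307_49 come from this document). It also assumes every dbGaP code can be cut at its first '.' (so a longer form such as phs000514.v1.p1 reduces to phs000514 as well); the two tables are then joined on the bare phs identifier.

```r
# Hypothetical toy tables: only phs000514.v1 and phs000307_49 come from the text
dbgap_toy <- data.frame(
  study_accession = c("phs000514.v1", "phs000307.v2"),
  dbgap_title     = c("toy dbGaP study A", "toy dbGaP study B"),
  stringsAsFactors = FALSE
)
sra_toy <- data.frame(
  study_alias = c("phs000307_49", "phs000307_50", "phs000999_1"),
  stringsAsFactors = FALSE
)

# Cut each accession at its first '.' (dbGaP) or '_' (SRA) to get the bare phs ID
dbgap_toy$phs <- sub("\\..*$", "", dbgap_toy$study_accession)
sra_toy$phs   <- sub("_.*$", "", sra_toy$study_alias)

# Inner join on the bare accession; dplyr::inner_join(sra_toy, dbgap_toy, by = "phs")
# returns the same rows
joined <- merge(sra_toy, dbgap_toy, by = "phs")
print(joined[, c("study_alias", "study_accession")])
```

Here phs000999_1 has no dbGaP counterpart, so the inner join drops it; merge(..., all.x = TRUE) or dplyr::left_join() would keep it with missing dbGaP fields.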