TxRegQuery is an R package that utilizes mongolite, GenomicRanges, and bigrquery to provide users with specialized tools for information access, retrieval from both MongoDB/Drill and BigQuery backends. TxRegQuery makes it possible to extract data in the form of GRanges objects, which can be filtered and intersected to create node-edge lists used in the formulation of regulatory networks.
Our database "txregnet" hosts the following types of data :
GTEx eqtl files: /udd/reala/repos/analyses/TxRegNet/data/eQTL/GTEx_v6p/filtered_0.05/eqtl_files
Annotations
Encode DNAse hypersensitivity bed files: /proj/rerefs/reref00/ENCODE/DNaseI_Roadmap_Encode/BEDFILES
The Transcription factor data is currently on shion.
TF data
Kimbie's CISBP FIMO scan TF bed files: /udd/rekrg/EpiPANDA/FIMO_results/ScanBedResults - http://cisbp.ccbr.utoronto.ca/faq.html is origin of PSSM
Kheradpour TF sites are located at: /udd/reala/reference_files/eqtl_gwas/matches.txt.gz
This has been implemented in :
1. shion
2. Amazon EC2 Instance :
Specification :
- m4.2xlarge, 8 vCPUs, 32 GiB Memory, High Network Performance.
- Ubuntu Server 16.04 LTS (HVM), SSD Volume Type, with a attached EBS volume of 250 GiB.
suppressPackageStartupMessages({ library(TxRegQuery) library(mongolite) library(RMongo) library(GenomicRanges) })
Structure : We have a mongo database "txregnet" hosting multiple collections for the eQTL, FootPrint, DNASe HyperSensitivity data.
To access the different collections :
myoff=100*1000 mystartGo=38077296-myoff myendGo=38083884+myoff mychrGo=17 mycollGo="Cells_Transformed_fibroblasts_Analysis_v6p_all_snpgene_pairs_eQTL" mydbGo="txregnet" res_eqtl=getMongoRangeEqtl(mychrGo,mystartGo,myendGo,mycollGo,mydbGo) res_eqtl
myoff=100*1000 mystartGo=38077296-myoff myendGo=38083884+myoff mychrGo="chr17" mydbGo="txregnet" mycollGo="CD19_DS17186_hg19_FP" res_fp=getMongoRangeFp(mychrGo,mystartGo,myendGo,mycollGo,mydbGo) res_fp
We use Apache Drill to extract the GRanges of the Transcription Factor data present on (shion currently)
Structure:
We have defined a dataset "txregnet" under the project "cgc-xx-xxx". There are three tables "eQTL", "FP", "TF". It is required to have a billing group set up with BigQuery.
To access the data on BigQuery :
mychrGo="17" mystartGo=41196312-500*1000 myendGo=41322262+500*1000 myprojectGo="cgc-xx-xxxx" mydatasetGo="txregnet" mybillingGo = "cgc-xx-xxxx" res_eqtl=getBigqueryRangeEqtl(mychrGo,mystartGo,myendGo,myprojectGo,mydatasetGo,mybillingGo) res_eqtl
mychrGo="chr17" mystartGo=41196312-500*1000 myendGo=41322262+500*1000 myprojectGo="cgc-xx-xxxx" mydatasetGo="txregnet" mybillingGo = "cgc-xx-xxxx" res_fp=getBigqueryRangeFp(mychrGo,mystartGo,myendGo,myprojectGo,mydatasetGo,mybillingGo) res_fp
mychrGo="chr17" mystartGo=41196312-500*1000 myendGo=41322262+500*1000 myprojectGo="cgc-xx-xxxx" mydatasetGo="txregnet" mybillingGo = "cgc-xx-xxxx" res_tf=getBigqueryRangeTf(mychrGo,mystartGo,myendGo,myprojectGo,mydatasetGo,mybillingGo) res_tf
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.