Introduction

TxRegQuery is an R package that utilizes mongolite, GenomicRanges, and bigrquery to provide users with specialized tools for information access, retrieval from both MongoDB/Drill and BigQuery backends. TxRegQuery makes it possible to extract data in the form of GRanges objects, which can be filtered and intersected to create node-edge lists used in the formulation of regulatory networks.

Our database "txregnet" hosts the following types of data :

  1. eQTL/GWAS data
  2. GTEx eqtl files: /udd/reala/repos/analyses/TxRegNet/data/eQTL/GTEx_v6p/filtered_0.05/eqtl_files

  3. Annotations

  4. Encode DNAse hypersensitivity bed files: /proj/rerefs/reref00/ENCODE/DNaseI_Roadmap_Encode/BEDFILES

  5. Roadmap digital genomic footprints: /udd/reala/reference_files/annotations/ROADMAP_DGF/

The Transcription factor data is currently on shion.

  1. TF data

  2. Kimbie's CISBP FIMO scan TF bed files: /udd/rekrg/EpiPANDA/FIMO_results/ScanBedResults - http://cisbp.ccbr.utoronto.ca/faq.html is origin of PSSM

  3. Kheradpour TF sites are located at: /udd/reala/reference_files/eqtl_gwas/matches.txt.gz

Implementation

This has been implemented in : 1. shion 2. Amazon EC2 Instance : Specification : - m4.2xlarge, 8 vCPUs, 32 GiB Memory, High Network Performance.
- Ubuntu Server 16.04 LTS (HVM), SSD Volume Type, with a attached EBS volume of 250 GiB.

suppressPackageStartupMessages({
library(TxRegQuery)
library(mongolite)
library(RMongo)
library(GenomicRanges)
})

MongoDB as a backend

Structure : We have a mongo database "txregnet" hosting multiple collections for the eQTL, FootPrint, DNASe HyperSensitivity data.

Establish connection

To access the different collections :

eQTL

To retrieve GRanges of the eQTL's :

myoff=100*1000
mystartGo=38077296-myoff
myendGo=38083884+myoff
mychrGo=17
mycollGo="Cells_Transformed_fibroblasts_Analysis_v6p_all_snpgene_pairs_eQTL"
mydbGo="txregnet"
res_eqtl=getMongoRangeEqtl(mychrGo,mystartGo,myendGo,mycollGo,mydbGo)
res_eqtl

Annotation Data

To retrieve the GRanges of Footprints data :

myoff=100*1000
mystartGo=38077296-myoff
myendGo=38083884+myoff
mychrGo="chr17"
mydbGo="txregnet"
mycollGo="CD19_DS17186_hg19_FP"
res_fp=getMongoRangeFp(mychrGo,mystartGo,myendGo,mycollGo,mydbGo)
res_fp

Apache Drill

Transcription Factor Data

We use Apache Drill to extract the GRanges of the Transcription Factor data present on (shion currently)

BigQuery as a backend

Structure:

We have defined a dataset "txregnet" under the project "cgc-xx-xxx". There are three tables "eQTL", "FP", "TF". It is required to have a billing group set up with BigQuery.

To access the data on BigQuery :

eQTL

To retrieve the GRanges of the eQTL's :

mychrGo="17"
mystartGo=41196312-500*1000
myendGo=41322262+500*1000
myprojectGo="cgc-xx-xxxx"
mydatasetGo="txregnet"
mybillingGo = "cgc-xx-xxxx"
res_eqtl=getBigqueryRangeEqtl(mychrGo,mystartGo,myendGo,myprojectGo,mydatasetGo,mybillingGo)
res_eqtl

Annotation Data

To retrieve the GRanges of Footprint's data :

mychrGo="chr17"
mystartGo=41196312-500*1000
myendGo=41322262+500*1000
myprojectGo="cgc-xx-xxxx"
mydatasetGo="txregnet"
mybillingGo = "cgc-xx-xxxx"
res_fp=getBigqueryRangeFp(mychrGo,mystartGo,myendGo,myprojectGo,mydatasetGo,mybillingGo)
res_fp

Transcription Factor Data

To retrieve the GRanges of the TF data :

mychrGo="chr17"
mystartGo=41196312-500*1000
myendGo=41322262+500*1000
myprojectGo="cgc-xx-xxxx"
mydatasetGo="txregnet"
mybillingGo = "cgc-xx-xxxx"
res_tf=getBigqueryRangeTf(mychrGo,mystartGo,myendGo,myprojectGo,mydatasetGo,mybillingGo)
res_tf

neo4j Building



shwetagopaul92/txregquery documentation built on May 14, 2019, 7:43 a.m.