shims in TxRegInfra: dealing with heterogeneous annotation native to different genomic assay archives
In TxRegInfra: Metadata management for multiomic specification of transcriptional regulatory networks

library(TxRegInfra)

Overview

In home maintenance, shims are little wedges of wood that you stick into wobbly entities to make them more stable.

We need things like this to deal with diverse data resources in genomics. Here's an example of the problem:

cm = TxRegInfra:::basicCfieldsMap()
cmdf = as.data.frame(cm)
names(cmdf) = names(cm)
cmdf

The rownames of this data frame are target annotation terms for features of GRanges: chrom, start, end. The columns are different assay types. Entry i,j is the notation for feature i on assay type j. Thus for eQTL data the start and end can be determined from the source attribute 'snp_pos', while for footprints (FP) the footprint end is denoted 'end' and for hotspots (HS) the footprint end is denoted 'chromEnd'.

In order to leave data in its original state but simplify downstream integration, we use shims like basicCfieldsMap to map attribute names to a common vocabulary.

Application to RaggedMongoExpt instances

We use RaggedMongoExpt instances to work with contents of a remote MongoDB that holds large volumes of genomic annotation.

Construction

con1 = mongo(url=URL_txregInAWS(), db="txregnet")
cd = TxRegInfra::basicColData
rme0 = RaggedMongoExpt(con1, colData=cd)
rme0

Here rme0 holds a reference to a MongoDB database, coordinated with the colData component. (The package includes a unit test for correspondence between collection names in the txregnet database and the colData element names.)

Basic motivation

We'll step back for a moment to give a sense of basic motivations. We want to use MongoDB to manage data about eQTL, DnaseI hypersensitive regions and so forth, without curating the related file contents. Here's an illustration of the basic functionality for eQTL:

mycon = mongo(db="txregnet", 
   url=URL_txregInAWS(),   # ATLAS deployment in AWS
   collection="Lung_allpairs_v7_eQTL")
mycon$find(q=rjson::toJSON(list(chr=17)), limit=2)

We'll need different q components for assays of different types, because the internal notation used for chromosomes differs between the assay types. Other aspects of diversified annotation can emerge, and the shim concept helps deliver to the user a more unified interface in the face of this diversity.

The `sbov` function

At present, the main workhorse for retrieving assay results from r class(rme0) instances is sbov, which is an approach to a subsetByOverlaps functionality. We'll illustrate this with extractions from lung-related eQTL, Dnase hotspot, and digital genomic footprinting results.

lname_eqtl = "Lung_allpairs_v7_eQTL"
lname_dhs = "ENCFF001SSA_hg19_HS" # see dnmeta
lname_fp = "fLung_DS14724_hg19_FP"
si17 = GenomeInfoDb::Seqinfo(genome="hg19")["chr17"]
si17n = si17
GenomeInfoDb::seqlevelsStyle(si17n) = "NCBI"
s1 = sbov(rme0[,lname_eqtl], GRanges("17", IRanges(38.06e6, 38.15e6),
    seqinfo=si17n))
s2 = sbov(rme0[,lname_dhs], GRanges("chr17", IRanges(38.06e6, 38.15e6),
   seqinfo=si17))
s3 = sbov(rme0[,lname_fp], GRanges("chr17", IRanges(38.06e6, 38.15e6),
   seqinfo=si17))

In principle we could avoid the seqlevelsStyle manipulation by checking assay type within sbov, but at the moment the user must shoulder this responsibility.

To see more about how to work with sbov outputs, check the main vignette.

Any scripts or data that you put into this service are public.

TxRegInfra documentation built on Nov. 8, 2020, 5:15 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

TxRegInfra
Metadata management for multiomic specification of transcriptional regulatory networks

shims in TxRegInfra: dealing with heterogeneous annotation native to different genomic assay archives
In TxRegInfra: Metadata management for multiomic specification of transcriptional regulatory networks

Overview

Application to RaggedMongoExpt instances

Construction

Basic motivation

The `sbov` function

Try the TxRegInfra package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

TxRegInfra Metadata management for multiomic specification of transcriptional regulatory networks

shims in TxRegInfra: dealing with heterogeneous annotation native to different genomic assay archives In TxRegInfra: Metadata management for multiomic specification of transcriptional regulatory networks

Overview

Application to RaggedMongoExpt instances

Construction

Basic motivation

The sbov function

Try the TxRegInfra package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

TxRegInfra
Metadata management for multiomic specification of transcriptional regulatory networks

shims in TxRegInfra: dealing with heterogeneous annotation native to different genomic assay archives
In TxRegInfra: Metadata management for multiomic specification of transcriptional regulatory networks

The `sbov` function