loci2path
takes query regions in the format of GenomicRanges
. Only the
Genomic Locations (chromosomes, start and end position) will be used. Strand
information and other metadata columns are ignored.
In the demo data, 47 regions associated with Psoriasis disease were downloaded
from immunoBase.org and used as demo query regions.
require(GenomicRanges) bed.file <- system.file("extdata", "query/Psoriasis.BED", package = "loci2path") query.bed <- read.table(bed.file, header=FALSE) colnames(query.bed) <- c("chr","start","end") query.gr <- makeGRangesFromDataFrame(query.bed)
eQTL sets are entities recording 1-to-1 links between eQTL SNPs and genes. eQTL set entity also contains the following information: tissue name for the eQTL study, IDs and genomic ranges for the eQTL SNPs, IDs for the associated genes.
eQTL set can be constructed manually by specifying the corresponding information in each slot.
eQTL set list is a list of multiple eQTL sets, usually collected from different tissues.
Below is an example to construct customized eQTL set and eQTL set list using demo data files. In the demo data folder, three eQTL sets downloaded from GTEx project are included. Due to the large size, each eQTL dataset is down sampled to 3000 records for demostration purpose.
library(loci2path) library(GenomicRanges) brain.file <- system.file("extdata", "eqtl/brain.gtex.txt", package="loci2path") tab <- read.table(brain.file, stringsAsFactors=FALSE, header=TRUE) snp.gr <- GRanges(seqnames=Rle(tab$snp.chr), ranges=IRanges(start=tab$snp.pos, width=1)) brain.eset <- eqtlSet(tissue="brain", eqtlId=tab$snp.id, eqtlRange=snp.gr, gene=as.character(tab$gene.entrez.id)) brain.eset skin.file <- system.file("extdata", "eqtl/skin.gtex.txt", package="loci2path") tab=read.table(skin.file, stringsAsFactors=FALSE, header=TRUE) snp.gr <- GRanges(seqnames=Rle(tab$snp.chr), ranges=IRanges(start=tab$snp.pos, width=1)) skin.eset <- eqtlSet(tissue="skin", eqtlId=tab$snp.id, eqtlRange=snp.gr, gene=as.character(tab$gene.entrez.id)) skin.eset blood.file <- system.file("extdata", "eqtl/blood.gtex.txt", package="loci2path") tab <- read.table(blood.file, stringsAsFactors=FALSE, header=TRUE) snp.gr <- GRanges(seqnames=Rle(tab$snp.chr), ranges=IRanges(start=tab$snp.pos, width=1)) blood.eset <- eqtlSet(tissue="blood", eqtlId=tab$snp.id, eqtlRange=snp.gr, gene=as.character(tab$gene.entrez.id)) blood.eset
eset.list <- list(Brain=brain.eset, Skin=skin.eset, Blood=blood.eset) eset.list
A geneset collection contains a list of gene sets, with each gene set is represented as a vector of member genes. A vector of description is also provided as the metadata slot for each gene set. The total number of gene in the geneset collection is also required to perform the enrichment test. In this tutorial the BIOCARTA pathway collection was downloaded from MSigDB.
biocarta.link.file <- system.file("extdata", "geneSet/biocarta.txt", package="loci2path") biocarta.set.file <- system.file("extdata", "geneSet/biocarta.set.txt", package="loci2path") biocarta.link <- read.delim(biocarta.link.file, header=FALSE, stringsAsFactors=FALSE) set.geneid <- read.table(biocarta.set.file, stringsAsFactors=FALSE) set.geneid <- strsplit(set.geneid[,1], split=",") names(set.geneid) <- biocarta.link[,1] head(biocarta.link) head(set.geneid)
In order to build gene set, we also need to know the total number of genes in order to perform enrichment test. In this study, the total number of gene in MSigDB pathway collection is 31,847[@Liberzon2015]
#build geneSet biocarta <- geneSet( numGene=31847, description=biocarta.link[,2], geneSetList=set.geneid) biocarta
#query from one eQTL set. res.one <- query( query.gr=query.gr, loci=skin.eset, path=biocarta) #enrichment result table res.one$result.table #all the genes associated with eQTLs covered by the query region res.one$cover.gene
#query from one eQTL set. res.esetlist <- query( query.gr=query.gr, loci=eset.list, path=biocarta) #enrichment result table, tissue column added resultTable(res.esetlist) #all the genes associated with eQTLs covered by the query region; #names of the list are tissue names from eqtl set list coveredGene(res.esetlist)
#query from one eQTL set. res.paral <- query( query.gr=query.gr, loci=eset.list, path=biocarta, parallel=TRUE) #should return the same result as res.esetlist
result <- resultTable(res.esetlist)
#all the genes associated with eQTLs covered by the query region res.one$cover.gene #all the genes associated with eQTLs covered by the query region; #names of the list are tissue names from eqtl set list coveredGene(res.esetlist)
tissue.degree <- getTissueDegree(res.esetlist, eset.list) #check gene-tissue mapping for each gene head(tissue.degree$gene.tissue.map) #check degree for each gene head(tissue.degree$gene.tissue.degree) #average tissue degree for the input result table tissue.degree$mean.tissue.degree #add avg. tissue degree to existing result table res.tissue <- data.frame(resultTable(res.esetlist), t.degree=tissue.degree$mean.tissue.degree)
In this case, in the generic function \code{query}, only the argunment \code{loci} need to be provided. This will trigger the query of tissue specificity
#query tissue specificity gr.tissue <- query(query.gr, loci=eset.list) gr.tissue
#extract tissue-pathway matrix mat <- getMat(res.esetlist, test.method="gene") #plot heatmap heatmap.para <- getHeatmap(res.esetlist)
#plot word cloud wc <- getWordcloud(res.esetlist)
#plot p-value distribution of result pval <- getPval(res.esetlist, test.method="gene")
#obtain geneset description from object description <- getPathDescription(res.esetlist, biocarta) head(description)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.