bwToGeneExp: Quantifies gene expression measured by RNA-Seq

Description

The function quantifies gene expression as the sum of exon expression values, quantified over pre-defined exon regions using the signal from RNA-Seq tracks (bigWig files). It returns a GRanges object with the TSS locations and the corresponding gene expression levels quantified over a set of samples.

Usage

bwToGeneExp(exons, target, sampleIDs = NULL, libStrand = NULL,
  summaryOperation = "mean", mc.cores = 1, normalize = NULL)

Arguments

exons

A GInteractions object in which Anchor1 represents the exon locations and Anchor2 represents the location of the corresponding gene. The following meta-columns should be present: 1) name (character) - ENSEMBL or other gene identifier; 2) name2 (character, optional) - second gene identifier. Alternatively, a GRanges object can be used as input. It should contain the exon regions over which the expression will be calculated. For this object, the following meta-columns are necessary: 1) reg (GRanges object) - gene location or TSS location; 2) name (character) - ENSEMBL or other gene identifier; 3) name2 (character, optional) - second gene identifier. It is strongly suggested to set the seqlengths of this object to the seqlengths of the Hsapiens object from the appropriate BSgenome package (BSgenome.Hsapiens.UCSC.hg19 or whichever version is needed).

target

A named list of RNA-Seq bigWig files. Names correspond to the unique sample IDs/names. Stranded and unstranded libraries are allowed. Important: forward and reverse RNA-Seq libraries must be listed consecutively (one directly after the other).

sampleIDs

NULL (default). A vector of unique sample IDs/names, ordered in the same way as the bigWig (.bw) files. When NULL, the basenames of the .bw files are used as the unique sample IDs/names.
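The relationship between the file order and the sample IDs can be sketched in base R. The file paths below are hypothetical, and the fallback shown (plain `basename()`, extension kept) is an assumption about what "basenames of .bw files" means rather than a statement of the package's exact behaviour.

```r
# Hypothetical file paths; only the names matter for this illustration.
bw_files <- c("/data/rnaseq/CellType1.bw", "/data/rnaseq/CellType2.bw")

# Assumed fallback when sampleIDs = NULL: IDs derived from the file basenames.
default_ids <- basename(bw_files)

# Explicit IDs must be unique and ordered like the files.
sample_ids <- c("CellType1", "CellType2")
stopifnot(length(sample_ids) == length(bw_files),
          !anyDuplicated(sample_ids))
```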

libStrand

A vector of "*", "+" and "-" values (default NULL). This vector describes the order of the RNA-Seq libraries by strandedness: "+" corresponds to forward/positive RNA-Seq bigWig files, "-" corresponds to reverse/negative RNA-Seq bigWig files, and "*" corresponds to an unstranded library. When all libraries are unstranded, the vector should consist of "*" repeated once per analyzed RNA-Seq library (i.e. per bigWig file). If libStrand = NULL, the function creates this vector of "*" automatically. It is crucial that stranded RNA-Seq libraries are listed consecutively (one directly after the other).
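As an illustration of the ordering requirement, the following base R sketch checks a candidate libStrand vector before it is passed on. The helper `check_lib_strand` is hypothetical and not part of reg2gene; it assumes that "listed in a row" means each stranded file sits directly next to its opposite-strand partner.

```r
# Hypothetical helper (not part of reg2gene): returns TRUE when every
# stranded library forms an adjacent pair of one "+" and one "-" entry.
check_lib_strand <- function(lib_strand) {
  stopifnot(all(lib_strand %in% c("+", "-", "*")))
  stranded <- which(lib_strand != "*")
  if (length(stranded) == 0) return(TRUE)        # all unstranded
  if (length(stranded) %% 2 != 0) return(FALSE)  # unpaired stranded file
  first  <- stranded[seq(1, length(stranded), by = 2)]
  second <- stranded[seq(2, length(stranded), by = 2)]
  # Partners must be neighbours and must have opposite strands.
  all(second == first + 1) && all(lib_strand[first] != lib_strand[second])
}

mixed_ok  <- check_lib_strand(c("*", "+", "-", "*"))  # paired, adjacent
mixed_bad <- check_lib_strand(c("+", "*", "-"))       # pair is split up
all_unstr <- check_lib_strand(rep("*", 3))            # unstranded only
```

A check like this only validates the order; the vector still has to have one entry per bigWig file in target.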

summaryOperation

"mean" (default). An argument passed to ScoreMatrixBin, which is at the core of quantifying exon expression across the pre-defined exon regions. It designates which summary operation should be used over the regions. Currently, only "mean" is available; "median" and "sum" will be implemented in the future.

mc.cores

1 (default). The number of cores to use, i.e. the maximum number of child processes that will be run simultaneously by mclapply from the parallel package. Parallelization requires at least two cores.
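The effect of mc.cores can be illustrated with a toy `mclapply` call. The per-sample task below (a simple mean over made-up numbers) only stands in for the per-file quantification and is not the function's actual internals.

```r
library(parallel)

# Forking is not available on Windows, where mc.cores must stay at 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Toy per-sample task standing in for the per-file quantification.
samples <- list(CellType1 = c(1, 2, 3), CellType2 = c(4, 5, 6))
per_sample <- mclapply(samples, mean, mc.cores = n_cores)
```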

normalize

NULL (default). Optionally "quantile" or "ratio". If set to "quantile", activity measures are quantile normalized as implemented in normalize.quantiles and returned; if set to "ratio", the "median ratio method" implemented in estimateSizeFactorsForMatrix is used to normalize the activity measures.
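To make the two options concrete, here is a minimal base R sketch of both ideas on a made-up gene-by-sample matrix. These are simplified re-implementations for illustration only (no special tie handling, strictly positive values assumed); they are not the code of normalize.quantiles or estimateSizeFactorsForMatrix, and the function names are hypothetical.

```r
# Toy gene-by-sample expression matrix (made-up values).
expr <- matrix(c(5, 2, 3,
                 4, 1, 6),
               nrow = 3,
               dimnames = list(paste0("gene", 1:3), c("S1", "S2")))

# Quantile normalization: each value is replaced by the mean, across
# samples, of the values that share its rank.
quantile_normalize <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")
  target <- rowMeans(apply(m, 2, sort))
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(m)
  out
}

# Median ratio method: the size factor of a sample is the median, over
# genes, of its ratio to the per-gene geometric mean; each column is
# then divided by its size factor.
median_ratio_normalize <- function(m) {
  log_geo_means <- rowMeans(log(m))
  size_factors <- apply(m, 2, function(col) {
    exp(median(log(col) - log_geo_means))
  })
  sweep(m, 2, size_factors, "/")
}

qn <- quantile_normalize(expr)
mr <- median_ratio_normalize(expr)
```

After quantile normalization every sample has the same distribution of values; the median ratio method instead rescales each sample by a single factor.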

Details

For a gene of interest, and for each sample (.bw file) individually, the expression is first calculated for all exon regions that correspond to that gene. Then, per gene, the exon expression scores are summed and divided by the total exon length. Normalization is run at the level of gene expression. Currently, bigWig files are required to calculate activity; this function might be extended to work with BAM files in the future. RNA-Seq .bw files can originate from stranded, unstranded or mixed libraries.
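The per-gene aggregation described above can be sketched in base R. This is one plausible reading, stated as an assumption: each exon score is taken to be the mean bigWig signal over the exon, so it is weighted by exon width before summing. The table values and the helper `gene_expression` are made up for illustration.

```r
# Toy per-exon table (made-up values); "score" is assumed to be the
# mean bigWig signal over each exon.
exon_tab <- data.frame(
  gene  = c("TEST_Reg1", "TEST_Reg1", "TEST_Reg3"),
  width = c(4, 2, 6),
  score = c(1.0, 4.0, 0.5)
)

# Assumed reading: weight each exon's mean signal by its width, sum per
# gene, then divide by the gene's total exon length.
gene_expression <- function(tab) {
  signal <- tapply(tab$score * tab$width, tab$gene, sum)
  len    <- tapply(tab$width, tab$gene, sum)
  signal / len
}

gene_exp <- gene_expression(exon_tab)
```

Under this reading the result is a length-weighted mean signal per gene, computed separately for every sample.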

Value

A GRanges object whose meta-columns correspond to the quantified gene expression across cell types or conditions. The ranges correspond to the TSS location of each analyzed gene. Additionally, each TSS carries the following metadata: "name" and "name2" columns holding a unique ID or name/symbol for the gene with which the TSS is associated. One is the Ensembl ID and the other can be used for the gene symbol. The other metadata column names represent sample names/IDs and should match the GRanges object provided via the regActivity argument.

Examples

# Packages providing GRanges, GInteractions and bwToGeneExp
library(GenomicRanges)
library(InteractionSet)
library(reg2gene)

# INPUT 1: defining the .bw files
test.bw <- system.file("extdata", "test.bw", package = "reg2gene")
test2.bw <- system.file("extdata", "test2.bw", package = "reg2gene")

# INPUT 2: defining the exons
exons <- GRanges(c(rep("chr1", 2), "chr2", rep("chr1", 3)),
                 IRanges(c(1, 7, 9, 15, 1, 21), c(4, 8, 14, 20, 4, 25)),
                 c(rep("+", 3), rep("-", 3)))
# Gene (or TSS) location and gene identifiers for each exon
exons$reg <- exons[c(1, 1, 3, 5, 5, 5)]
exons$name2 <- exons$name <- paste0("TEST_Reg", c(1, 1, 3, 5, 5, 5))

# OUTPUT of bwToGeneExp() with default sample IDs
bwToGeneExp(exons = exons, target = c(test.bw, test2.bw))

# Adding different sample IDs
bwToGeneExp(exons = exons, target = c(test.bw, test2.bw),
            sampleIDs = c("CellType1", "CellType2"))

# If the exons input is a GInteractions object, the same output is obtained
exonsGI <- GInteractions(exons, exons$reg)
exonsGI$name <- exons$name
exonsGI$name2 <- exons$name2

bwToGeneExp(exons = exonsGI, target = c(test.bw, test2.bw))

BIMSBbioinfo/reg2gene documentation built on May 3, 2019, 6:42 p.m.