These two different dogs are not differential DoGs

![Dogs](/Users/axy148/Desktop/Screen Shot 2016-08-15 at 12.09.38 PM.png)

Definition of DoGs


Transcript downstream analysis (borrows some ideas from DoGs, but focuses on the 4.5 kb or 5 kb region downstream of transcripts)
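As a rough illustration of what "the region downstream of a transcript" means in coordinates, the sketch below computes a strand-aware window past the 3' end. The function `downstream_window` and its arguments are hypothetical names made up for this example and are not part of ThreeUTR; the 4500 bp default simply mirrors the 4.5 kb figure mentioned above.

```r
# Hypothetical helper (not from ThreeUTR): the window of `width` bp that lies
# immediately downstream of each transcript's 3' end, using 1-based
# inclusive coordinates.
downstream_window <- function(tx_start, tx_end, strand, width = 4500L) {
  stopifnot(all(strand %in% c("+", "-")), all(tx_start <= tx_end))
  plus <- strand == "+"
  # On the "+" strand the 3' end is tx_end; on the "-" strand it is tx_start.
  start <- ifelse(plus, tx_end + 1L, tx_start - width)
  end   <- ifelse(plus, tx_end + width, tx_start - 1L)
  # Clip at coordinate 1; clipping at the chromosome end would need
  # chromosome lengths, which this sketch does not take.
  data.frame(start = pmax(start, 1L), end = end, strand = strand,
             stringsAsFactors = FALSE)
}

# One "+" transcript and one "-" transcript (made-up coordinates)
downstream_window(c(1000L, 20000L), c(5000L, 26000L), c("+", "-"))
```

For the minus-strand transcript the window sits at lower coordinates than the transcript itself, which is the detail most easily missed when counting reads downstream of a gene.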

## Load required libraries
#suppressPackageStartupMessages(library(DESeq2))
#library(ThreeUTR)
#library(hwriter)
#library(knitr)
#library(printr)

Transcript-based

Txs.gene

Get gene downstream counts and test

#re.rMAT<-ProcessOutputFilesFrom3UTR(dir.name,input.file.pattern)

Count

DE test for transcript-based analysis:

"There is some confusion in the answers to this question that hopefully I can clarify with the three comments below:

  1. kallisto produces estimates of transcript level counts, and therefore to obtain an estimate of the number of reads from a gene the correct thing to do is to sum the estimated counts from the constituent transcripts of that gene. Of note in the language above is the word "estimate", which is necessary because in many cases reads cannot be mapped uniquely to genes. However insofar as obtaining a good estimate, the approach of kallisto (and before it Cufflinks, RSEM, eXpress and other "transcript level quantification tools") is superior to naïve "counting" approaches for estimating the number of reads originating from a gene. This point has been argued in many papers; among my own papers it is most clearly explained and demonstrated in Trapnell et al. 2013.

  2. Although estimated counts for a gene can be obtained by summing the estimated counts of the constituent transcripts from tools such as kallisto, and the resulting numbers can be rounded to produce integers that are of the correct format for tools such as DESeq, the numbers produced by such an approach do not satisfy the distributional assumptions made in DESeq and related tools. For example, in DESeq2, counts are modeled "as following a negative binomial distribution". This assumption is not valid when summing estimated counts of transcripts to obtain gene level counts, hence the justified concern of Michael Love that plugging in sums of estimated transcript counts could be problematic for DESeq2. In fact, even the estimated transcript counts themselves are not negative binomial distributed, and therefore also those are not appropriate for plugging into DESeq2. His concern is equally valid with many other "count based" differential expression tools.

  3. Fortunately there is a solution for performing valid statistical testing of differential abundance of individual transcripts, namely the method implemented in sleuth. The approach is described here. To test for differential abundance of genes, one must first address the question of what that means. E.g. is a gene differential if at least one isoform is? or if all the isoforms are? The tests of sleuth are performed at the granularity of transcripts, allowing for downstream analysis that can capture the varied questions that might make biological sense in specific contexts.

In summary, please do not plug in rounded estimates of gene counts from kallisto into DESeq2 and other tools. While it is technically possible, it is not statistically advisable. Instead, you should use tools that make valid distributional assumptions about the estimates."
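The first point in the quoted answer, summing estimated transcript counts to get a gene-level estimate, can be sketched in base R as follows. The numbers and the `gene_id` mapping are invented for illustration; `target_id` and `est_counts` follow the column names of kallisto's `abundance.tsv` output. This is only a sketch of the summation step, not a recommended input for DESeq2.

```r
# Toy data (invented): estimated counts per transcript plus a hypothetical
# transcript-to-gene mapping in the gene_id column.
est <- data.frame(
  target_id  = c("tx1", "tx2", "tx3"),
  gene_id    = c("geneA", "geneA", "geneB"),
  est_counts = c(10.4, 5.3, 7.25),
  stringsAsFactors = FALSE
)

# Gene-level estimate = sum of the estimates of the constituent transcripts.
gene_est <- tapply(est$est_counts, est$gene_id, sum)
gene_est

# The sums are generally not integers; rounding them only changes their
# format, not their distribution.
round(gene_est)
```

As the second point of the quote stresses, rounding makes the values look like counts but does not make them satisfy the negative binomial assumption of count-based tools.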



aiminy/3UTR-Seq documentation built on May 10, 2019, 7:36 a.m.