This is an R Markdown document for the NOISeq analysis module of IDEA (Interactive Differential Expression Analyzer). Plots in this module are produced in R [1], using pheatmap [2] for the heat map and ggplot2 [3] for the probability distribution plot with q-value, and are presented in an HTML file via rmarkdown [4]. Higher-resolution figures can be downloaded directly from the website.
Citation: This work is in the process of being published; the citation will be posted here as soon as possible. Check out the IDEA website above.

```{r}
knitr::opts_chunk$set(fig.width = 9, fig.height = 9, dpi = 72)
# setwd(tempdir())
load("NOIseqAnalysis.RData")
# plist[[1]] holds the basic information of this analysis:
#   plist[[1]][[1]]  experimental design
#   plist[[1]][[2]]  selected pair of conditions
#   plist[[1]][[3]]  NOISeq normalization method
#   plist[[1]][[4]]  q-value threshold
#   plist[[1]][[5]]  number of top features shown in the heat map

library(ggplot2)
library(gplots)
library(RColorBrewer)
library(scales)
library(pheatmap)
library(plyr)
library(labeling)
library(stringr)
# library(NOISeq)
library(rmarkdown)
# library(S4Vectors)
```

Introduction

Count data generated by high-throughput sequencing methods such as RNA-Seq [5, 6], Tag-Seq [7, 8] and ChIP-Seq [9] are increasingly used to represent the abundance of genes/features at the RNA or DNA level, since read counts are linearly related to abundance [6]. In RNA-Seq, variation between technical replicates is also low, which makes RNA-Seq count data well suited to discovering differentially expressed genes [10]. Differential expression (DE) analysis typically has to address the following questions: the choice of normalization and noise-control method [6, 8]; the choice of data distribution given the number of replicates [8]; and the choice of how to assess the statistical significance of DE calls [11].

NOISeq [12] is an R/Bioconductor package for differential expression analysis of count data. It uses a non-parametric approach to model the data distribution, which typically performs better when a relatively large data set is available. NOISeq can handle data with technical replicates (NOISeq-real), biological replicates (NOISeqBIO) or no replicates (NOISeq-sim), although the last option is not recommended. Several normalization methods are available in NOISeq, including reads per kilobase per million reads (RPKM) [6], the Trimmed Mean of M values (TMM) [13] and the Upper Quartile (UQ) [14], with RPKM as the default. In NOISeqBIO, counts per million (CPM) are used to filter out features with low counts. For each feature, the probability of being differentially expressed is calculated by comparing the log2-ratio (M) and the absolute difference (D) of its expression between the two conditions against a noise distribution derived from comparisons within the same condition. A feature is considered differentially expressed when this probability is above a defined threshold (q-value).
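The M/D idea can be illustrated with a small base R sketch on hypothetical data. This is a simplified illustration, not NOISeq's implementation: noise is estimated here only from the single pair of replicates in each condition, zeros are replaced by an assumed pseudocount `k = 0.5` (mirroring the role of NOISeq's `k` argument), and the probability is taken as the share of noise points that are less extreme than the feature in both |M| and D. The helper name `m_and_d()` is hypothetical.

```r
# Hypothetical normalized expression: 4 features x 2 replicates per condition
cond1 <- cbind(r1 = c(100, 50, 8, 0), r2 = c(120, 45, 10, 2))
cond2 <- cbind(r1 = c(400, 55, 9, 30), r2 = c(380, 52, 7, 25))

k <- 0.5  # pseudocount replacing zeros (assumption, mirrors NOISeq's `k`)

# M = log2-ratio and D = absolute difference between two expression vectors
m_and_d <- function(x, y) {
  x[x == 0] <- k
  y[y == 0] <- k
  list(M = log2(x / y), D = abs(x - y))
}

# Signal: condition 1 versus condition 2, using replicate means
signal <- m_and_d(rowMeans(cond1), rowMeans(cond2))

# Noise: replicate versus replicate within each condition
noise1 <- m_and_d(cond1[, "r1"], cond1[, "r2"])
noise2 <- m_and_d(cond2[, "r1"], cond2[, "r2"])
noise_M <- c(noise1$M, noise2$M)
noise_D <- c(noise1$D, noise2$D)

# Probability of DE: share of noise points less extreme than the feature
# in both |M| and D
prob <- mapply(function(M, D) mean(abs(noise_M) < abs(M) & noise_D < D),
               signal$M, signal$D)
prob
```

In this toy example, feature 4 (near zero in condition 1, around 30 in condition 2) gets probability 1, while feature 3, whose condition means barely differ, gets 0.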

IDEA uses NOISeq version 2.8.0 for DE analysis. For more information on NOISeq, please refer to reference [12] and the package manual.

Basic Information

Experimental Design

In IDEA, a raw count table and an experimental design table are required as input. The experimental design can be one of Standard Comparison, Multi-factors Design or Without Replicates (not recommended). A pair of conditions is then selected for DE analysis.
Specifically, PoissonSeq is applicable only for Standard Comparison and Without Replicates.
In this case, the experimental design was set to `r plist[[1]][1]`. Conditions `r as.character(plist[[1]][[2]])[1]` and `r as.character(plist[[1]][[2]])[2]` were selected for differential expression analysis.
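To make the input step concrete, the sketch below builds a hypothetical count table and design table in base R and keeps only the samples that belong to the selected pair of conditions. The sample names, condition labels and design columns (`sample`, `condition`) are assumptions for illustration; IDEA's internal handling of uploaded tables is not shown.

```r
# Hypothetical raw count table: features in rows, samples in columns
counts <- matrix(c(10, 0, 5,   12, 3, 7,   40, 2, 9,
                   38, 1, 11,  15, 4, 6,   17, 2, 8),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 paste0("S", 1:6)))

# Hypothetical experimental design table: one row per sample
design <- data.frame(sample    = paste0("S", 1:6),
                     condition = c("ctrl", "ctrl", "treatA", "treatA",
                                   "treatB", "treatB"),
                     stringsAsFactors = FALSE)

# The pair of conditions chosen for DE analysis (hypothetical choice)
selected <- c("ctrl", "treatA")

# Keep only the samples of the selected pair, in design-table order
in_pair <- design$condition %in% selected
pair_counts <- counts[, design$sample[in_pair], drop = FALSE]
pair_factors <- data.frame(condition = factor(design$condition[in_pair],
                                              levels = selected),
                           row.names = design$sample[in_pair])
pair_counts
```

A count matrix plus a per-sample factors data frame of this shape is the kind of input NOISeq's `readData()` takes through its `data` and `factors` arguments.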

Advanced Options

NOISeq provides three normalization methods, RPKM, TMM and UQ, which are summarized in Table 1.
In this case, the normalization method was set to `r plist[[1]][[3]]`.

```{r}
htmltools::HTML('  
<div align="center">
Table 1: Normalization methods in NOISeq<br/>
<table cellpadding="5" cellspacing="0" border="1" frame=hsides rules=all style="border-color: #000000">
        <tr>
          <td style="border-width: medium thin medium 0">&nbsp;Method</td>
          <td style="border-width: medium thin medium 0">&nbsp;Abbreviation</td>
          <td style="border-width: medium thin medium 0">&nbsp;Summary</td>
        </tr>
        <tr>
          <td style="border-width: 0 thin thin 0">&nbsp;The Reads per Kilobase per Million Reads (default) [<a href="#ref6">6</a>]</td>
          <td style="border-width: 0 thin thin 0">&nbsp;RPKM</td>
          <td style="border-width: 0 thin thin 0">&nbsp;Counts divided by feature length in kilobases and by library size in millions of mapped reads (or of total reads)</td>
        </tr>
        <tr>
          <td style="border-width: 0 thin thin 0">&nbsp;The Trimmed Mean of M values [<a href="#ref13">13</a>]</td>
          <td style="border-width: 0 thin thin 0">&nbsp;TMM</td>
          <td style="border-width: 0 thin thin 0">&nbsp;Weights taken from the delta method on binomial data; a trimmed weighted mean of log-ratios is then calculated</td>
        </tr>
        <tr>
          <td style="border-width: 0 thin medium 0">&nbsp;The Upper Quartile [<a href="#ref14">14</a>]</td>
          <td style="border-width: 0 thin medium 0">&nbsp;UQ</td>
          <td style="border-width: 0 thin medium 0">&nbsp;Features that are zero in all libraries are removed; a scale factor is calculated from the upper quartile of counts in each library</td>
        </tr>
</table>
</div>
')
```
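The three methods in Table 1 can be sketched in base R as follows. These are simplified illustrations, not NOISeq's own code: the function names `rpkm()`, `uq_normalize()` and `tmm_factor()` are hypothetical, the UQ sketch rescales to the average upper quartile as an assumption, and the TMM sketch computes the factor of one library against a reference using the trimming (30% of log-ratios, 5% of average log-expression) and inverse-variance weights described by Robinson and Oshlack [13].

```r
# RPKM: counts per kilobase of feature length per million reads in the library
rpkm <- function(counts, len_bp) {
  per_million <- t(t(counts) / colSums(counts)) * 1e6  # divide each column by its library size
  per_million / (len_bp / 1e3)                         # divide each row by its length in kb
}

# UQ: drop features that are zero in all libraries, then scale each library
# by the upper quartile (75th percentile) of its remaining counts
uq_normalize <- function(counts) {
  kept <- counts[rowSums(counts) > 0, , drop = FALSE]
  uq <- apply(kept, 2, quantile, probs = 0.75, names = FALSE)
  t(t(kept) / uq) * mean(uq)  # rescale to the average upper quartile (assumption)
}

# TMM: weighted trimmed mean of log-ratios (M) of one library vs. a reference
tmm_factor <- function(obs, ref, logratio_trim = 0.3, sum_trim = 0.05) {
  n_obs <- sum(obs)
  n_ref <- sum(ref)
  keep <- obs > 0 & ref > 0                       # M and A are undefined for zeros
  obs <- obs[keep]
  ref <- ref[keep]
  M <- log2((obs / n_obs) / (ref / n_ref))        # log-ratio
  A <- 0.5 * log2((obs / n_obs) * (ref / n_ref))  # average log-expression
  # delta-method variance of M for binomial counts; weights are 1 / v
  v <- (n_obs - obs) / (n_obs * obs) + (n_ref - ref) / (n_ref * ref)
  n <- length(M)
  lo_m <- floor(n * logratio_trim) + 1
  lo_a <- floor(n * sum_trim) + 1
  trimmed <- rank(M) >= lo_m & rank(M) <= n + 1 - lo_m &
             rank(A) >= lo_a & rank(A) <= n + 1 - lo_a
  2^(sum(M[trimmed] / v[trimmed]) / sum(1 / v[trimmed]))
}

# Usage on hypothetical data: library 2 has the same composition at twice the depth
counts <- matrix(c(100, 300, 600,
                   200, 600, 1200), nrow = 3)
len_bp <- c(1000, 2000, 500)    # hypothetical feature lengths in bp

rpkm(counts, len_bp)            # both columns identical: depth difference removed
uq_normalize(rbind(counts, 0))  # the all-zero feature is dropped before scaling

set.seed(1)
ref <- rpois(50, 100) + 1
tmm_factor(2 * ref, ref)        # pure depth change: TMM factor equals 1
```

The TMM sketch only removes composition effects; a pure change in sequencing depth leaves every log-ratio at zero, so the factor stays at 1 and depth is handled by the library sizes themselves.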

Analysis Result

Differential Expression Table

After analysis, a table containing information on all differentially expressed genes is presented with interactive options. The meaning of each column header is explained in Table 2.
Note that the same header can have different meanings in different packages. For example, p-values in DESeq are obtained by a Wald test, whereas in edgeR they are obtained by an exact test analogous to Fisher's exact test.

Table 2: Interpretation of the headers of the differential expression table in NOISeq

| Header | Interpretation |
|---|---|
| FeatureID | Feature identifier |
| Mean | Mean of each condition (one column per condition) |
| Theta | Differential expression statistic |
| Prob | Probability of differential expression |
| Log2FC | Base-2 logarithm of the fold change, where the fold change is defined as the counts of Condition1 divided by the counts of Condition2 |
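The Log2FC definition in Table 2 can be checked with a short base R sketch on hypothetical condition means. Replacing zero means with an assumed pseudocount `k = 0.5` (the role NOISeq's `k` argument plays for zero counts) is an assumption here; it keeps the ratio finite when one condition has no reads.

```r
# Hypothetical condition means for three features
res <- data.frame(FeatureID = c("geneA", "geneB", "geneC"),
                  Mean_Condition1 = c(200, 10, 0),
                  Mean_Condition2 = c(50, 40, 8))

k <- 0.5                                  # pseudocount for zero means (assumption)
zero_to_k <- function(x) ifelse(x == 0, k, x)

# Log2FC = log2(Condition1 / Condition2); positive means higher in Condition1
res$Log2FC <- log2(zero_to_k(res$Mean_Condition1) /
                   zero_to_k(res$Mean_Condition2))
res
```

Here geneA is four times higher in Condition1 (Log2FC = 2), geneB four times lower (Log2FC = -2), and geneC, with zero counts in Condition1, gets a Log2FC of -4 only because of the pseudocount.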

Heat Map of Differential Expressed Genes

A heat map displays the differential expression table graphically: each cell represents the value of a feature in a sample and is colored accordingly. Here, the heat map of differentially expressed features is plotted with the R package pheatmap. As in the original data matrix, samples are arranged in columns and features in rows. Up-regulated features are colored red and down-regulated features green. Hierarchical clustering results for features and samples are shown as dendrograms on the left and upper sides of the heat map, respectively.
The number of features displayed as rows, the dendrograms on the left and upper sides, and the color key can all be changed interactively. The data scaling of the heat map can be set by the user to "none", "row" or "column". Colors are mapped to $\log_{10}(\text{normalized read count} + 1)$.

In this case, the data are centered and scaled in the `r as.character(plist[[2]][[3]])` direction. For more information on parameter settings, please refer to the pheatmap package manual [2].

Figure 1: Heat map of differentially expressed genes, showing the top `r plist[[1]][5]` DE features with the lowest false discovery rate (FDR)
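The data preparation behind such a heat map can be sketched in base R: pick the top features, transform to $\log_{10}(\text{count} + 1)$, then center and scale by row or by column. The count values, sample names and probabilities are hypothetical, and ranking by the highest NOISeq probability as a stand-in for the lowest FDR is an assumption. The final `pheatmap()` call is left as a comment so the sketch runs without the package.

```r
# Hypothetical normalized counts for three DE features in four samples
norm_counts <- matrix(c(120, 15, 900,   140, 20, 850,
                        30, 200, 60,    25, 260, 75),
                      nrow = 3,
                      dimnames = list(c("geneA", "geneB", "geneC"),
                                      c("C1_1", "C1_2", "C2_1", "C2_2")))

# Keep the top N features, ranked by a hypothetical DE probability
prob <- c(geneA = 0.99, geneB = 0.95, geneC = 0.97)
top_n <- 2
top <- names(sort(prob, decreasing = TRUE))[seq_len(top_n)]

# Color values on the log10(normalized count + 1) scale
log_mat <- log10(norm_counts[top, , drop = FALSE] + 1)

# scale = "row": center and scale each feature to mean 0, sd 1
row_scaled <- t(scale(t(log_mat)))

# scale = "column": center and scale each sample instead
col_scaled <- scale(log_mat)

# With pheatmap installed, the scaling can be delegated to it, e.g.:
# pheatmap::pheatmap(log_mat, scale = "row",
#                    cluster_rows = TRUE, cluster_cols = TRUE)
row_scaled
```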

Probability Distribution Plot with q-value

Note that in NOISeq, the probability is not equivalent to a p-value.
Given how the probability is calculated (summarized in the Introduction above), the higher the probability, the more likely it is that the feature is differentially expressed because of the change in experimental condition. By default, the q-value threshold used to select DE features is set to 0.8. For more details, please refer to the NOISeq reference [12] and the package manual.

Figure 2: Probability distribution plot in NOISeq, with q-value as cutoff of significant/not significant
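The thresholding rule can be sketched in base R with hypothetical probabilities. In NOISeq itself this selection is done by `degenes()` through its `q` argument; the code below only mimics the rule "probability above q" and shows why a threshold of 0.8 corresponds to odds of 4:1 that a feature is DE rather than not DE.

```r
# Hypothetical NOISeq probabilities for four features
prob <- c(geneA = 0.97, geneB = 0.81, geneC = 0.42, geneD = 0.79)

q <- 0.8                                 # q-value threshold
de_features <- names(prob)[prob > q]     # features called DE

# A probability threshold q corresponds to odds of q / (1 - q)
odds <- q / (1 - q)                      # about 4, i.e. 4:1 for q = 0.8
de_features
```

Raising q to 0.95 would keep only geneA here, trading sensitivity for a stricter call.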

References

1. R Core Team, R: A language and environment for statistical computing, 2014, R Foundation for Statistical Computing: Vienna, Austria.
2. Kolde, R., pheatmap: Pretty Heatmaps, 2013.
3. Wickham, H., ggplot2: elegant graphics for data analysis, 2009, Springer New York.
4. Allaire, J.J., et al., rmarkdown: Dynamic Documents for R, 2014.
5. Nagalakshmi, U., et al., The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 2008. 320(5881): p. 1344-9.
6. Mortazavi, A., et al., Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods, 2008. 5(7): p. 621-8.
7. Morrissy, A.S., et al., Next-generation tag sequencing for cancer gene expression profiling. Genome Res, 2009. 19(10): p. 1825-35.
8. Anders, S. and W. Huber, Differential expression analysis for sequence count data. Genome Biol, 2010. 11(10): p. R106.
9. Robertson, G., et al., Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods, 2007. 4(8): p. 651-7.
10. Marioni, J.C., et al., RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res, 2008. 18(9): p. 1509-17.
11. Robinson, M.D., D.J. McCarthy, and G.K. Smyth, edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 2010. 26(1): p. 139-40.
12. Tarazona, S., et al., Differential expression in RNA-seq: a matter of depth. Genome Res, 2011. 21(12): p. 2213-23.
13. Robinson, M.D. and A. Oshlack, A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol, 2010. 11(3): p. R25.
14. Bullard, J.H., et al., Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics, 2010. 11: p. 94.


likelet/IDEA documentation built on Sept. 8, 2020, 2:56 p.m.