knitr::opts_chunk$set( fig.width = 9, fig.height = 9, dpi = 72)
This is an R Markdown document for Combination Analysis of IDEA: Interactive Differential Expression Analyzer. The following plots are drawn in R [1] with ggplot2 [2] (for the histogram) and VennDiagram [3] (for the Venn diagram). This analysis report is presented in HTML via rmarkdown [4].
Citation: This work is in the process of being published; citation instructions will be posted here as soon as possible. Check out the IDEA website above.
#setwd(tempdir())
load("combinationAnalysis.RData")
#p1 basic information
#experimental design plist[[1]][1]
#select paired plist[[1]][[2]]
#SAMseq resample number plist[[1]][3]
#fdr cutoff plist[[1]][4]
#
#heatmap genenumber plist[[1]][5]
library(ggplot2)
library(gplots)
library(RColorBrewer)
library(scales)
library(pheatmap)
library(plyr)
library(labeling)
library(stringr)
library(rmarkdown)
#library(S4Vectors)
library(samr)
Count data, as generated by various high-throughput sequencing methods such as RNA-Seq [5, 6], Tag-Seq [7, 8], and ChIP-Seq [9], are increasingly used to represent the abundance of genes/features at the RNA/DNA level, since read count and abundance are linearly related [6]. In RNA-Seq, variation between replicates is also low, which makes RNA-Seq count data well suited to discovering differentially expressed genes [10]. Differential expression (DE) analysis typically involves the following choices: the normalization and noise control method [6, 8]; the data distribution, given the number of replicates [8]; and the assessment of statistical significance of DE detection [11].
In combination analysis, the experiment design follows the standard procedure of each package-specific analysis and is defined in the "New" module before analysis starts. Note that some packages are not applicable to certain experiment types (Table 1). In a Multi-factor Design, the factor of interest is defined as the first column of the counts table.
In this case, the experiment type was stated as `r as.character(plist[[1]][1])`, and conditions `r as.character(plist[[1]][[2]])[1]` and `r as.character(plist[[1]][[2]])[2]` were selected for differential expression analysis.
htmltools::HTML('
<div align="center">
Table 1: Package availability for different experiment design types<br/>
<table cellpadding="5" cellspacing="0" border="1" frame=hsides rules=all style="border-color: #000000">
  <tr>
    <td style="border-width: medium thin medium 0"> Package</td>
    <td style="border-width: medium thin medium 0"> Standard Comparison</td>
    <td style="border-width: medium thin medium 0"> Multi-factor Design</td>
    <td style="border-width: medium thin medium 0"> Without Replicates</td>
  </tr>
  <tr>
    <td style="border-width: 0 thin thin 0"> DESeq</td>
    <td style="border-width: 0 thin thin 0"> Applicable</td>
    <td style="border-width: 0 thin thin 0"> Applicable</td>
    <td style="border-width: 0 thin thin 0"> Applicable</td>
  </tr>
  <tr>
    <td style="border-width: 0 thin thin 0"> edgeR</td>
    <td style="border-width: 0 thin thin 0"> Applicable</td>
    <td style="border-width: 0 thin thin 0"> Applicable</td>
    <td style="border-width: 0 thin thin 0"> Applicable</td>
  </tr>
  <tr>
    <td style="border-width: 0 thin thin 0"> NOISeq</td>
    <td style="border-width: 0 thin thin 0"> Applicable</td>
    <td style="border-width: 0 thin thin 0"> Not applicable</td>
    <td style="border-width: 0 thin thin 0"> Applicable</td>
  </tr>
  <tr>
    <td style="border-width: 0 thin thin 0"> PoissonSeq</td>
    <td style="border-width: 0 thin thin 0"> Applicable</td>
    <td style="border-width: 0 thin thin 0"> Not applicable</td>
    <td style="border-width: 0 thin thin 0"> Not applicable</td>
  </tr>
  <tr>
    <td style="border-width: 0 thin medium 0"> SAMseq (samr)</td>
    <td style="border-width: 0 thin thin 0"> Applicable</td>
    <td style="border-width: 0 thin thin 0"> Not applicable</td>
    <td style="border-width: 0 thin thin 0"> Not applicable</td>
  </tr>
</table>
</div>
')
Since packages typically offer different normalization methods, the count data supplied by the user are first normalized with the method chosen in Data Exploration (Table 2) and only then analyzed by each package, so that results can be combined across packages.
In this case, the normalization method is `r as.character(plist[[1]][[3]])`.
Table 2: Normalization methods

| Method | Abbreviation | Summary |
|---|---|---|
| Reads per Kilobase per Million mapped reads (default) | RPKM | Counts are scaled per kilobase of feature length and per million mapped reads (or per million total reads in the library) |
| Trimmed Mean of M values | TMM | Weights are derived by the delta method on binomial data, then a trimmed weighted mean is calculated |
| Upper Quartile | UQ | Features with zero counts in all libraries are removed; the scale factor is calculated from the upper quartile of counts for each library |
| None | None | All scaling factors are set to 1 |
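To make two of the table entries concrete, the following is a minimal base-R sketch of RPKM and upper-quartile scaling. It is not IDEA's own code: the count matrix, the feature lengths, and all object names are hypothetical, and dividing the upper quartiles by their mean is only one simple convention for turning them into scale factors.

```r
# Hypothetical counts: rows are features, columns are libraries.
counts <- matrix(
  c( 10,  20,
      0,   0,
     30,  60,
    100, 200),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "g4"), c("s1", "s2"))
)
# Hypothetical feature lengths in base pairs.
lengths_bp <- c(g1 = 1000, g2 = 500, g3 = 2000, g4 = 4000)

# RPKM: divide by library size in millions, then by feature length in kilobases.
lib_size    <- colSums(counts)
per_million <- sweep(counts, 2, lib_size / 1e6, "/")
rpkm        <- sweep(per_million, 1, lengths_bp / 1e3, "/")

# Upper quartile: drop features that are zero in every library,
# then take the 75th percentile of the remaining counts per library.
expressed <- rowSums(counts) > 0
upper_q   <- apply(counts[expressed, , drop = FALSE], 2, quantile, probs = 0.75)
uq_factor <- upper_q / mean(upper_q)  # assumed convention: factors average to 1

print(round(rpkm, 1))
print(uq_factor)
```

In this toy data library `s2` is exactly twice as deep as `s1`, so the RPKM values of the two libraries coincide after scaling.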
In IDEA, five R/Bioconductor packages are available for combination analysis of differential expression (DE) features: DESeq2, edgeR, NOISeq, PoissonSeq and SAMseq.
Users can select any combination of these packages to perform the combination analysis.
In this case, the following packages were selected for combination analysis: `r as.character(plist[[1]][4])`.
Each package produces its own rank list after analysis. To provide an integrated view of differentially expressed features, a Score summarizing the rank aggregation result is provided. The Rank Aggregation Method, that is, the method used to calculate the Score, can be selected; the meaning of each option is listed below.
| Parameter of Rank Aggregation Method | Meaning |
|---|---|
| RRA (default) | Defined as the result of Robust Rank Aggregation [14] |
| min | Defined as the minimum value of ranks |
| geom.mean | Defined as the geometric mean of ranks |
| mean | Defined as the mean of ranks |
| median | Defined as the median of ranks |
| Stuart | Defined by means of a variety of statistical tests, as in Ref. [15] |
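The four simple options in the table can be illustrated with a short base-R sketch. The rank matrix and the helper `aggregate_simple()` are hypothetical and written only for this illustration; the RRA and Stuart options are not reproduced here, and the sketch works on raw ranks as the table describes, which may differ from how IDEA scales ranks internally.

```r
# Hypothetical rank matrix: rows are features, columns are packages.
ranks <- matrix(
  c(1, 2, 4,
    2, 1, 1,
    3, 4, 2,
    4, 3, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("pkg1", "pkg2", "pkg3"))
)

# Illustrative helper (not part of any package): one score per feature.
aggregate_simple <- function(r, method = c("min", "geom.mean", "mean", "median")) {
  method <- match.arg(method)
  switch(method,
    min       = apply(r, 1, min),
    geom.mean = exp(rowMeans(log(r))),  # geometric mean via the mean of logs
    mean      = rowMeans(r),
    median    = apply(r, 1, median)
  )
}

# One column of scores per method; smaller scores mean better aggregated ranks.
scores <- sapply(c("min", "geom.mean", "mean", "median"),
                 function(m) aggregate_simple(ranks, m))
print(round(scores, 3))
```

Comparing the columns shows how the choice of method changes the ordering: `min` rewards a single top rank, while `mean` and `median` favor features ranked consistently well.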
A histogram is used to visualize the total number of differentially expressed features identified by each package.
The Venn diagram visualizes the overlap among the differential expression (DE) features identified by each package. In the diagram, each oval represents the set of DE features detected by one package, as labeled beside the oval. The overlapping areas represent all possible logical relations among the sets of DE features identified by the different packages. The number plotted in each area indicates how many DE features follow the corresponding logical relation. The number of packages can be changed interactively, as described in the Basic Introduction of combination analysis above.
Identification details of each feature are shown in the feature weight table, which has interactive sorting options built into its headers. The feature weight table is available for download from the web page in .csv format.
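The numbers behind both plots can be derived from a table of per-package detection calls. The sketch below uses base R only and hypothetical data; it computes the per-package totals shown in the histogram and the count for each exclusive Venn region, but it does not draw the plots themselves.

```r
# Hypothetical detection calls: TRUE if the package flagged the feature as DE.
calls <- data.frame(
  DESeq  = c(TRUE, TRUE,  FALSE, TRUE,  FALSE),
  edgeR  = c(TRUE, FALSE, FALSE, TRUE,  TRUE),
  NOISeq = c(TRUE, FALSE, FALSE, FALSE, TRUE),
  row.names = paste0("feature", 1:5)
)

# Histogram input: total number of DE features per package.
per_package <- colSums(calls)

# Venn input: label each feature by the exact set of packages that detected it.
pattern <- apply(as.matrix(calls), 1,
                 function(x) paste(names(x)[x], collapse = "&"))

# Features detected by no package fall outside every oval, so drop them.
region_counts <- table(pattern[pattern != ""])

print(per_package)
print(region_counts)
```

Each entry of `region_counts` corresponds to one area of the diagram, for example `DESeq&edgeR` for features found by those two packages but not by NOISeq.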
To better evaluate the levels of DE features detected by different packages, a Recommendation Value (R-value) is introduced. To integrate the rank lists, the method of robust rank aggregation (RRA) is applied [14]. A Score is presented as the result of rank list integration via the corresponding R package RobustRankAggreg.
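As a rough illustration of the idea behind RRA, the sketch below scores a feature from its normalized ranks (rank divided by list length) using beta-distributed order statistics, following the published description of the method [14]. It is written from that description rather than taken from RobustRankAggreg, so the helper `rho_score()` and the example values are assumptions, and results may not match the package exactly.

```r
# Illustrative helper (an assumption, not the RobustRankAggreg API).
# norm_ranks: one normalized rank in (0, 1] per rank list, for a single feature.
rho_score <- function(norm_ranks) {
  r <- sort(norm_ranks)
  n <- length(r)
  k <- seq_len(n)
  # Probability that the k-th smallest of n uniform ranks is <= the observed one.
  beta_p <- pbeta(r, k, n - k + 1)
  # Take the most significant order statistic, with a Bonferroni-style correction.
  min(min(beta_p) * n, 1)
}

# Hypothetical normalized ranks from three packages.
norm_ranks <- rbind(
  geneA = c(0.01, 0.02, 0.03),  # consistently near the top
  geneB = c(0.01, 0.60, 0.90),  # top in one list only
  geneC = c(0.50, 0.50, 0.50)   # consistently mid-list
)

scores <- apply(norm_ranks, 1, rho_score)
print(signif(scores, 3))
```

Smaller scores indicate features ranked near the top more consistently than expected by chance, which is why `geneA` scores far lower than `geneB` even though both share the same best rank.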
The identification status of each feature in each selected package, together with the associated statistics, is listed in columns. The interpretation of the headers is shown in Table 3.
Table 3: Interpretation of feature weight table headers

| Headers | Interpretation |
|---|---|
| FeatureID | Feature identifier |
| Package Name | (DESeq, edgeR, NOISeq, PoissonSeq, and/or SAMseq) Identification status of the feature in the corresponding package; in the .csv file, identified features are tagged "Positive" and unidentified features "Negative"; on web pages, they are tagged with "Check" and "Cross" marks |
| Mean | Mean of feature expression |
| LogFC | Logarithm (base 2) of the fold change |
| Rankmean | Mean rank among the packages selected for combination analysis |
| Score | Integration score of the rank lists of DE features by robust rank aggregation (RRA) |
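A table with these headers could be assembled roughly as follows. This is a sketch on hypothetical data, not IDEA's implementation: the input objects, the use of the average of two condition means for `Mean`, and the direction of the fold change (condition B over condition A) are all assumptions, and the `Score` column is omitted.

```r
# Hypothetical per-package ranks and detection calls for three features.
ranks <- data.frame(DESeq = c(1, 3, 2), edgeR = c(2, 3, 1),
                    row.names = c("f1", "f2", "f3"))
detected <- data.frame(DESeq = c(TRUE, FALSE, TRUE),
                       edgeR = c(TRUE, FALSE, FALSE),
                       row.names = c("f1", "f2", "f3"))
# Hypothetical mean expression in two conditions.
mean_a <- c(100, 50, 20)
mean_b <- c(400, 50, 10)

weight_table <- data.frame(
  FeatureID = rownames(ranks),
  DESeq     = ifelse(detected$DESeq, "Positive", "Negative"),
  edgeR     = ifelse(detected$edgeR, "Positive", "Negative"),
  Mean      = (mean_a + mean_b) / 2,   # assumed: average over both conditions
  LogFC     = log2(mean_b / mean_a),   # assumed direction: B relative to A
  Rankmean  = rowMeans(ranks),
  stringsAsFactors = FALSE
)

# Sort by mean rank, best first, and write a .csv like the downloadable table.
weight_table <- weight_table[order(weight_table$Rankmean), ]
csv_path <- file.path(tempdir(), "feature_weight.csv")
write.csv(weight_table, csv_path, row.names = FALSE)
print(weight_table)
```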