knitr::opts_chunk$set( fig.width = 9, fig.height = 9, dpi = 72)

This is an R Markdown document for the Combination Analysis of IDEA: Interactive Differential Expression Analyzer. The plots below are drawn in R [1] with ggplot2 [2] (for the histogram) and VennDiagram [3] (for the Venn diagram). The analysis report is rendered to HTML via rmarkdown [4].

Citation: This work is in the process of being published; citation details will be posted here as soon as they are available. Check out the IDEA website above.

#setwd(tempdir())
load("combinationAnalysis.RData")
#p1 basic information  
      #experimental design plist[[1]][1]
      #selected pair plist[[1]][[2]]
      #SAMseq resample number plist[[1]][3]
      #fdr cutoff plist[[1]][4]
    # #heatmap genenumber plist[[1]][5]
library(ggplot2)
library(gplots)
library(RColorBrewer)
library(scales)
library(pheatmap)
library(plyr)
library(labeling)
library(stringr)
library(rmarkdown)
#library(S4Vectors)
library(samr)

Introduction

Count data generated by high-throughput sequencing methods such as RNA-Seq [5, 6], Tag-Seq [7, 8], and ChIP-Seq [9] are increasingly used to represent the abundance of genes/features at the RNA/DNA level, since read count and abundance are linearly related [6]. In RNA-Seq, variation between replicates is also low, which makes RNA-Seq count data well suited to discovering differentially expressed genes [10]. Differential expression (DE) analysis typically involves the following choices: the normalization and noise control method [6, 8]; the data distribution, given the number of replicates [8]; and the method for assessing the statistical significance of DE detection [11].

Basic Information

Experiment Design

In combination analysis, the experiment design follows the standard procedure of package-specific analysis and is defined in the "New" module before analysis starts. Note that some packages are not applicable to certain experiment types (Table 1). In Multi-factor Design, the factor of interest is defined as the first column of the counts table.

In this case, the experiment type was r as.character(plist[[1]][1]), and conditions r as.character(plist[[1]][[2]])[1] and r as.character(plist[[1]][[2]])[2] were selected for differential expression analysis.

htmltools::HTML('  
<div align="center">
Table 1: Package availability for different experiment design types<br/>
<table cellpadding="5" cellspacing="0" border="1" frame=hsides rules=all style="border-color: #000000">
        <tr>
            <td style="border-width: medium thin medium 0">&nbsp;Package</td>
            <td style="border-width: medium thin medium 0">&nbsp;Standard Comparison</td>
            <td style="border-width: medium thin medium 0">&nbsp;Multi-factor Design</td>
            <td style="border-width: medium thin medium 0">&nbsp;Without Replicates</td>
        </tr>
         <tr>
            <td style="border-width: 0 thin thin 0">&nbsp;DESeq</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Applicable</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Applicable</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Applicable</td>
        </tr>
        <tr>
            <td style="border-width: 0 thin thin 0">&nbsp;edgeR</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Applicable</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Applicable</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Applicable</td>
        </tr>
        <tr>
            <td style="border-width: 0 thin thin 0">&nbsp;NOISeq</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Applicable</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Not applicable</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Applicable</td>
        </tr>
        <tr>
            <td style="border-width: 0 thin thin 0">&nbsp;PoissonSeq</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Applicable</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Not applicable</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Not applicable</td>
        </tr>
        <tr>
            <td style="border-width: 0 thin medium 0">&nbsp;SAMseq (samr)</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Applicable</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Not applicable</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Not applicable</td>
        </tr>
</table>
</div>   
')
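
Table 1 can also be read as a lookup. The following is a minimal sketch in R, assuming a hypothetical helper `applicable_packages()` and design labels copied from the table headers; neither is part of IDEA itself.

```r
# Availability matrix transcribed from Table 1 (TRUE = applicable).
availability <- matrix(
  c(TRUE, TRUE,  TRUE,    # DESeq
    TRUE, TRUE,  TRUE,    # edgeR
    TRUE, FALSE, TRUE,    # NOISeq
    TRUE, FALSE, FALSE,   # PoissonSeq
    TRUE, FALSE, FALSE),  # SAMseq
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("DESeq", "edgeR", "NOISeq", "PoissonSeq", "SAMseq"),
    c("Standard Comparison", "Multi-factor Design", "Without Replicates")
  )
)

# Hypothetical helper: names of the packages usable for a given design type.
applicable_packages <- function(design) {
  stopifnot(design %in% colnames(availability))
  rownames(availability)[availability[, design]]
}

applicable_packages("Multi-factor Design")
```

Keeping the table as a logical matrix makes it straightforward to filter a user's package selection before the analysis starts.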

Normalization Method

Since packages typically offer different normalization methods, the count data supplied by the user is first normalized with the method chosen in Data Exploration (Table 2), so that results can be combined across packages. The normalized data is then analyzed by each package.
In this case, the normalization method is r as.character(plist[[1]][[3]]).

Table 2: Normalization methods in Data Exploration

| Method | Abbreviation | Summary |
|---|---|---|
| Reads per Kilobase per Million Reads (default) | RPKM | Counts scaled per kilobase of feature length and per million mapped reads (or total number of reads in the library) |
| Trimmed Mean of M values | TMM | Weights taken from the delta method on binomial data, then the trimmed weighted mean calculated |
| Upper Quartile | UQ | Features that are zero in all libraries removed; scale factor calculated from the upper quartile of counts for each library |
| None | None | All scaling factors set to 1 |
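
To make two of the entries in Table 2 concrete, here is a minimal sketch of RPKM and upper-quartile scaling in base R. The count matrix, the feature lengths, and the function names `rpkm()` and `upper_quartile()` are illustrative assumptions; the implementation used inside IDEA may differ in detail.

```r
# Toy count matrix: 4 features x 2 libraries (made-up numbers).
counts <- matrix(c(10, 0, 30, 60,
                   20, 0, 90, 90),
                 nrow = 4,
                 dimnames = list(paste0("f", 1:4), c("libA", "libB")))
# Assumed feature lengths in base pairs.
lengths_bp <- c(f1 = 1000, f2 = 500, f3 = 2000, f4 = 1500)

# RPKM: reads per kilobase of feature per million reads in the library.
rpkm <- function(counts, lengths_bp) {
  per_million <- colSums(counts) / 1e6
  per_kb <- lengths_bp / 1e3
  sweep(counts, 2, per_million, "/") / per_kb
}

# Upper quartile: drop features that are zero in every library, then
# divide each library by the 75th percentile of its remaining counts.
upper_quartile <- function(counts) {
  kept <- counts[rowSums(counts) > 0, , drop = FALSE]
  uq <- apply(kept, 2, quantile, probs = 0.75)
  sweep(kept, 2, uq, "/")
}

rpkm(counts, lengths_bp)
upper_quartile(counts)
```

Both functions return a matrix with the same column layout as the input, so either result can be passed on to the packages in place of the raw counts.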

Advanced Option: Selected Packages

In IDEA, five R/Bioconductor packages are available for combination analysis of differential expression (DE) features: DESeq2, edgeR, NOISeq, PoissonSeq, and SAMseq.
Users can select any combination of these packages for the combination analysis.
In this case, the following packages were selected for combination analysis: r as.character(plist[[1]][4]).

Advanced Option: Rank Aggregation Method

Each selected package produces its own rank list after analysis. To provide an integrated view of differentially expressed features, these rank lists are aggregated into a single Score. The Rank Aggregation Method, that is, the method used to calculate the Score, can be selected; the available options are listed in Table 3.

Table 3: Rank Aggregation Methods in Combination Analysis

| Rank Aggregation Method | Meaning |
|---|---|
| RRA (default) | Result of Robust Rank Aggregation [14] |
| min | Minimum of the ranks |
| geom.mean | Geometric mean of the ranks |
| mean | Mean of the ranks |
| median | Median of the ranks |
| Stuart | Result of the rank aggregation method of Stuart et al. [15] |
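
The four simple options in Table 3 can be illustrated with base R. This is a sketch on a made-up rank matrix, not the code IDEA runs; the RRA and Stuart options are not reproduced here.

```r
# Toy rank matrix: rows are features, columns are packages (1 = most significant).
ranks <- cbind(
  edgeR  = c(g1 = 1, g2 = 2, g3 = 3, g4 = 4),
  NOISeq = c(g1 = 2, g2 = 1, g3 = 4, g4 = 3),
  SAMseq = c(g1 = 1, g2 = 3, g3 = 2, g4 = 4)
)

# Per-feature summaries matching the min, geom.mean, mean, and median options.
simple_scores <- data.frame(
  min       = apply(ranks, 1, min),
  geom.mean = exp(rowMeans(log(ranks))),
  mean      = rowMeans(ranks),
  median    = apply(ranks, 1, median)
)

# Features ordered by mean rank, best first.
simple_scores[order(simple_scores$mean), ]
```

The summaries differ in how strongly a single outlying rank affects the result: min rewards one very good rank, while median ignores one very bad rank when three packages are combined.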

Analysis Result

Differentially Expressed Features Identified by Packages

A histogram is used to visualize the total number of differentially expressed features identified by each package.

Figure 1: Histogram of the total number of differentially expressed features identified by each package
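
The counts behind Figure 1 can be derived from a per-package identification matrix. The sketch below uses a hypothetical matrix `identified` and column names chosen for illustration; the plotting step runs only if ggplot2 is installed.

```r
# Hypothetical identification matrix: TRUE where a package calls a feature DE.
identified <- cbind(
  DESeq  = c(f1 = TRUE, f2 = TRUE,  f3 = FALSE, f4 = TRUE),
  edgeR  = c(f1 = TRUE, f2 = FALSE, f3 = FALSE, f4 = TRUE),
  NOISeq = c(f1 = TRUE, f2 = FALSE, f3 = TRUE,  f4 = FALSE)
)

# Total number of DE features per package.
de_counts <- data.frame(
  package = colnames(identified),
  n_de    = unname(colSums(identified))
)

# One bar per package, with bar height equal to the DE feature count.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(de_counts, aes(x = package, y = n_de)) +
    geom_col() +
    labs(x = "Package", y = "Number of DE features")
  print(p)
}
```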

Venn of DE Features Identified by Packages

The Venn diagram visualizes the overlap between the differential expression (DE) features identified by each package. Each oval represents the set of DE features detected by one package, which is named in the label beside the oval. The overlapping areas represent all possible logical relations between the sets of DE features identified by different packages, and the number plotted in each area is the count of DE features that follow the corresponding logical relation. The set of packages included can be changed interactively, as described in Basic Information above.

Figure 2: Venn diagram of differential expression features identified by each selected package
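
The numbers plotted in the areas of a Venn diagram are counts of features that share the same membership pattern. A minimal sketch in base R, using made-up feature sets and illustrative variable names, shows how those counts can be obtained; drawing the diagram itself is left to VennDiagram.

```r
# Hypothetical DE feature sets, one per package.
de_sets <- list(
  DESeq  = c("f1", "f2", "f4"),
  edgeR  = c("f1", "f4"),
  NOISeq = c("f1", "f3")
)

# Membership pattern of each feature, e.g. "DESeq&edgeR" for a feature
# identified by exactly those two packages.
all_features <- sort(unique(unlist(de_sets)))
pattern <- vapply(all_features, function(f) {
  hit <- vapply(de_sets, function(s) f %in% s, logical(1))
  paste(names(de_sets)[hit], collapse = "&")
}, character(1))

# Number of features in each area of the Venn diagram.
region_counts <- table(pattern)
region_counts
```

Each name in `region_counts` corresponds to exactly one area of the diagram, so the counts sum to the number of features identified by at least one package.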

Feature Weight Table

Identification details of each feature are shown in the feature weight table, which has interactive sorting options built into its headers. The feature weight table can be downloaded from the web page in .csv format.
To better evaluate the DE features detected by different packages, a Recommendation Value (R-value) is introduced. Each package yields a rank list, and robust rank aggregation (RRA) is applied to integrate these rank lists [14]. The Score is the result of this integration, computed with the corresponding R package RobustRankAggreg.
The identification status of each feature in each selected package and the statistic values are listed in columns. The interpretation of the headers is given in Table 4.

Table 4: Interpretation of headers in feature weight table

| Headers | Interpretation |
|---|---|
| FeatureID | Feature identifier |
| Package name (DESeq, edgeR, NOISeq, PoissonSeq, and/or SAMseq) | Identification status of the feature in the corresponding package; in the .csv file, identified features are tagged "Positive" and unidentified features "Negative"; on web pages, they are tagged with "Check" and "Cross" marks |
| Mean | Mean of feature expression |
| LogFC | Logarithm (base 2) of the fold change |
| Rankmean | Mean rank among the packages selected for combination analysis |
| Score | Integration score of the rank lists of DE features by robust rank aggregation (RRA) |
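
To show how the identification columns and Rankmean fit together, the sketch below assembles a small table in the layout of Table 4 from made-up inputs. The object names and the output file name are assumptions for illustration, and the Mean, LogFC, and Score columns are omitted.

```r
# Hypothetical per-package identification status and ranks for four features.
identified <- cbind(
  DESeq  = c(f1 = TRUE, f2 = TRUE,  f3 = FALSE, f4 = TRUE),
  edgeR  = c(f1 = TRUE, f2 = FALSE, f3 = FALSE, f4 = TRUE),
  NOISeq = c(f1 = TRUE, f2 = FALSE, f3 = TRUE,  f4 = FALSE)
)
ranks <- cbind(
  DESeq  = c(f1 = 1, f2 = 3, f3 = 4, f4 = 2),
  edgeR  = c(f1 = 1, f2 = 3, f3 = 4, f4 = 2),
  NOISeq = c(f1 = 1, f2 = 4, f3 = 2, f4 = 3)
)

# One row per feature: status per package as "Positive"/"Negative",
# plus the mean rank across the selected packages.
feature_table <- data.frame(
  FeatureID = rownames(identified),
  ifelse(identified, "Positive", "Negative"),
  Rankmean = rowMeans(ranks),
  row.names = NULL,
  stringsAsFactors = FALSE
)
feature_table <- feature_table[order(feature_table$Rankmean), ]

# Write the table in .csv form (file name chosen for illustration).
out_file <- file.path(tempdir(), "feature_weight_table.csv")
write.csv(feature_table, out_file, row.names = FALSE)
```

Sorting by Rankmean puts the features that all packages rank highly at the top, which mirrors sorting the interactive table by that header.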

References

1. R Core Team, R: A language and environment for statistical computing, 2014, R Foundation for Statistical Computing: Vienna, Austria.
2. Wickham, H., ggplot2: elegant graphics for data analysis, 2009, Springer New York.
3. Chen, H., VennDiagram: Generate high-resolution Venn and Euler plots, 2014.
4. Allaire, J.J., J. McPherson, Y. Xie, H. Wickham, J. Cheng, and J. Allen, rmarkdown: Dynamic Documents for R, 2014.
5. Nagalakshmi, U., et al., The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 2008. 320(5881): p. 1344-9.
6. Mortazavi, A., et al., Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods, 2008. 5(7): p. 621-8.
7. Morrissy, A.S., et al., Next-generation tag sequencing for cancer gene expression profiling. Genome Res, 2009. 19(10): p. 1825-35.
8. Anders, S. and W. Huber, Differential expression analysis for sequence count data. Genome Biol, 2010. 11(10): p. R106.
9. Robertson, G., et al., Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods, 2007. 4(8): p. 651-7.
10. Marioni, J.C., et al., RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res, 2008. 18(9): p. 1509-17.
11. Robinson, M.D., D.J. McCarthy, and G.K. Smyth, edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 2010. 26(1): p. 139-40.
12. Robinson, M.D. and A. Oshlack, A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol, 2010. 11(3): p. R25.
13. Bullard, J.H., et al., Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics, 2010. 11: p. 94.
14. Kolde, R., et al., Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics, 2012. 28(4): p. 573-80.
15. Stuart, J.M., et al., A gene-coexpression network for global discovery of conserved genetic modules. Science, 2003. 302(5643): p. 249-55.


likelet/IDEA documentation built on Sept. 8, 2020, 2:56 p.m.