```{r}
# Global chunk options: figure size (inches) and resolution for the HTML report
knitr::opts_chunk$set(fig.width = 9, fig.height = 9, dpi = 72)
```

This is an R Markdown document for the PoissonSeq analysis in IDEA (Interactive Differential Expression Analyzer). The plots of the PoissonSeq analysis module are drawn in R [1] with pheatmap [2] (heat map), PoissonSeq [3] (power transformation curve), and ggplot2 [4] (FDR distribution plot), and are presented in an HTML file via rmarkdown [5]. Higher-resolution figures can be downloaded directly from the website.

Citation: This work is being prepared for publication; citation information will be posted here as soon as it is available. Check the IDEA website for updates.

```{r}
# Load the saved analysis results; `plist` holds the objects used below:
#   plist[[1]]: basic information (1 = experimental design,
#               2 = selected pair of conditions, 3 = factor of interest)
#   plist[[2]]: power transformation curve
load("PoissonseqAnalysis.RData")

# Plotting and helper packages
library(ggplot2)
library(gplots)
library(RColorBrewer)
library(scales)
library(pheatmap)
library(plyr)
library(labeling)
library(stringr)
library(rmarkdown)
```

Introduction

Count data generated by high-throughput sequencing methods such as RNA-Seq [6, 7], Tag-Seq [8, 9], and ChIP-Seq [10] are increasingly used to measure the abundance of genes/features at the RNA/DNA level, because read counts are linearly related to abundance [7]. In RNA-Seq, technical variation between replicates is also low, which makes RNA-Seq count data well suited to discovering differentially expressed genes [11]. Differential expression (DE) analysis typically has to address the following questions: the choice of normalization and noise-control method [7, 9]; the choice of data distribution given the number of replicates [9]; and the choice of how to assess the statistical significance of DE calls [12].

PoissonSeq [3] is an R package for differential expression analysis of count data. It takes a non-parametric approach to modeling the distribution of count data, which typically performs better when a relatively large data set is available. The raw counts are normalized using a Poisson goodness-of-fit statistic, a score statistic is used for hypothesis testing, and the FDR is estimated with a modified permutation plug-in estimate.
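The Poisson goodness-of-fit normalization mentioned above can be sketched in base R as follows. This is a minimal illustration in the spirit of reference [3], not PoissonSeq's own code: it assumes that genes whose goodness-of-fit (GOF) statistic ranks in the middle half of all genes behave like non-differentially-expressed genes, and the function `estimate_depth` and its arguments are hypothetical.

```r
# Sketch of Poisson GOF normalization (assumption: middle half of GOF ranks
# is treated as "not DE"; not PoissonSeq's internal implementation).
estimate_depth <- function(counts, lower = 0.25, upper = 0.75,
                           max_iter = 20, tol = 1e-8) {
  counts <- as.matrix(counts)
  counts <- counts[rowSums(counts) > 0, , drop = FALSE]  # drop all-zero genes
  d <- colSums(counts) / sum(counts)  # start from library-size depths
  for (iter in seq_len(max_iter)) {
    # Expected counts if the gene is not DE: mu_ij = n_i. * d_j
    mu <- outer(rowSums(counts), d)
    # Poisson GOF statistic per gene: sum_j (n_ij - mu_ij)^2 / mu_ij
    gof <- rowSums((counts - mu)^2 / mu)
    # Keep genes whose GOF rank lies between the lower and upper fractions
    r <- rank(gof, ties.method = "first")
    keep <- r > lower * length(gof) & r <= upper * length(gof)
    if (!any(keep)) break
    # Re-estimate depths from the retained genes only
    d_new <- colSums(counts[keep, , drop = FALSE]) /
      sum(counts[keep, , drop = FALSE])
    converged <- max(abs(d_new - d)) < tol
    d <- d_new
    if (converged) break
  }
  d
}

# 90 unchanged genes and 10 genes that are ten times higher in sample 2
counts <- cbind(rep(100, 100), c(rep(100, 90), rep(1000, 10)))
estimate_depth(counts)
# [1] 0.5 0.5
```

Library-size normalization alone would give depths of about 0.345 and 0.655 here; trimming the genes with extreme GOF statistics recovers the equal depths of the unchanged genes.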

IDEA uses PoissonSeq version 1.1.2 for DE analysis. For more information on PoissonSeq, refer to reference [3] and the package manual.

Basic Information

Experimental Design

IDEA takes two inputs: a raw count table and an experimental design table. The experimental design can be one of Standard Comparison, Multi-factors Design, or Without Replicates (not recommended); a pair of conditions is then selected for DE analysis.
PoissonSeq is applicable only to Standard Comparison.
In this analysis, the experimental design was specified as `r plist[[1]][1]`. Conditions `r as.character(plist[[1]][[2]])[1]` and `r as.character(plist[[1]][[2]])[2]` were selected for differential expression analysis.
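Selecting a pair of conditions from a Standard Comparison design can be sketched as below. The layout of the two tables (a features × samples count matrix and a design table with `sample` and `condition` columns), the function `select_pair`, and the 1/2 coding of the two conditions are assumptions for illustration, not IDEA's required input format.

```r
# Hypothetical inputs: raw counts (features x samples) and a design table
counts <- matrix(c(5, 8, 7, 20, 22, 19,
                   3, 2, 4,  3,  5,  2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("gene1", "gene2"),
                                 c("s1", "s2", "s3", "s4", "s5", "s6")))
design <- data.frame(sample = c("s1", "s2", "s3", "s4", "s5", "s6"),
                     condition = c("A", "A", "B", "B", "C", "C"),
                     stringsAsFactors = FALSE)

# Keep only the samples of the two selected conditions; code them 1 and 2
select_pair <- function(counts, design, cond1, cond2) {
  condition <- as.character(design$condition)
  keep <- condition %in% c(cond1, cond2)
  samples <- as.character(design$sample[keep])  # match columns by name
  list(counts = counts[, samples, drop = FALSE],
       y = ifelse(condition[keep] == cond1, 1L, 2L))
}

sel <- select_pair(counts, design, "A", "C")
sel$y
# [1] 1 1 2 2
```

Matching columns by sample name rather than position keeps the selection correct even if the design table lists samples in a different order from the count table.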

Advanced Options

The PoissonSeq analysis module provides no advanced options.

Analysis Result

Differential Expression Table

A table containing information on all differentially expressed genes is presented with interactive options. The headers are explained in Table 1.
Note that the same header can have different meanings in different packages. For example, p-values in DESeq are obtained from a Wald test, whereas in edgeR they are obtained from an exact test analogous to Fisher's exact test.

```{r}
htmltools::HTML('
<div align="center">
Table 1: Interpretation of headers of differential expression table in PoissonSeq<br/>
<table cellpadding="5" cellspacing="0" border="1" frame=hsides rules=all style="border-color: #000000">
        <tr>
            <td style="border-width: medium thin medium 0">&nbsp;Headers</td>
            <td style="border-width: medium thin medium 0">&nbsp;Interpretation</td>
        </tr>
         <tr>
            <td style="border-width: 0 thin thin 0">&nbsp;FeatureID</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Feature identifier</td>
        </tr>
        <tr>
            <td style="border-width: 0 thin thin 0">&nbsp;tt</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Mean of condition, available for multiple columns</td>
        </tr>
        <tr>
            <td style="border-width: 0 thin thin 0">&nbsp;Theta</td>
            <td style="border-width: 0 thin thin 0">&nbsp;The score statistics of the genes</td>
        </tr>
        <tr>
            <td style="border-width: 0 thin thin 0">&nbsp;p.value</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Permutation-based p-values of the genes</td>
        </tr>
        <tr>
            <td style="border-width: 0 thin thin 0">&nbsp;FDR</td>
            <td style="border-width: 0 thin thin 0">&nbsp;Estimated false discovery rate</td>
        </tr>
        <tr>
            <td style="border-width: 0 thin medium 0">&nbsp;logFC</td>
            <td style="border-width: 0 thin medium 0">&nbsp;Estimated log (base 2) fold change of the feature; fold change is defined as the counts of Condition2 divided by the counts of Condition1</td>
        </tr>
</table>
</div>
')
```
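Two of the columns in Table 1 can be illustrated with a short base-R sketch: a per-feature score statistic for a two-class Poisson log-linear model, and a log2 fold change of Condition2 over Condition1. This is not PoissonSeq's implementation: it assumes the two conditions are coded 1 and 2, that sequencing depths `d` are already available (for example from a normalization step), and that no power transformation or permutation is applied; the function names are hypothetical.

```r
# Score statistic for H0: gamma_i = 0 in the (assumed) model
#   log(mu_ij) = log(d_j) + log(beta_i) + gamma_i * y_j
score_stat <- function(counts, y, d) {
  counts <- as.matrix(counts)
  mu0 <- outer(rowSums(counts), d / sum(d))  # fitted means under H0
  u <- as.vector((counts - mu0) %*% y)       # score: sum_j y_j (n_ij - mu_ij)
  info <- as.vector(mu0 %*% y^2) -
    as.vector(mu0 %*% y)^2 / rowSums(mu0)    # information for gamma_i given beta_i
  u^2 / info                                 # approx. chi-squared(1) under H0
}

# log2 fold change (condition 2 / condition 1) of depth-normalized means
log2_fc <- function(counts, y, d) {
  norm <- sweep(as.matrix(counts), 2, d / mean(d), "/")
  log2(rowMeans(norm[, y == 2, drop = FALSE]) /
         rowMeans(norm[, y == 1, drop = FALSE]))
}

counts <- rbind(gene1 = c(10, 10, 10, 10),
                gene2 = c(10, 10, 30, 30))
y <- c(1, 1, 2, 2)
d <- rep(0.25, 4)
score_stat(counts, y, d)
# [1]  0 20
log2_fc(counts, y, d)
```

For gene2 the statistic of 20 equals the familiar Poisson chi-squared comparison of the class totals (20 and 60 observed against 40 and 40 expected). Features with all-zero counts give `NaN` and should be filtered out first.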

Heat Map of Differentially Expressed Genes

A heat map displays the differential expression table graphically: each square (pixel) represents the value of one feature in one sample and is colored accordingly. Here, the heat map of differentially expressed features is plotted with the R package pheatmap. Rows correspond to features and columns to samples, as in the original data matrix. Up-regulated features are colored red and down-regulated features green. Hierarchical clustering results for features and samples are shown as dendrograms on the left and upper sides of the heat map, respectively.
The number of features displayed as rows, the display of the dendrograms on the left and upper sides, and the display of the color key can all be changed interactively. The data scaling can be set by the user to "none", "row", or "column". The colors are scaled by $\log_{10}(\text{normalized read count} + 1)$.

In this analysis, the data were centered and scaled in the `r as.character(plist[[2]][[3]])` direction. For more information on parameter settings, refer to the manual of the pheatmap package [2].

Figure 1: Heat map of differentially expressed genes, showing the top `r as.character(plist[[1]][[3]])` DE features with the lowest false discovery rate (FDR)
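The preprocessing behind the heat map (top features by FDR, the $\log_{10}(x + 1)$ transform, and optional row or column scaling) can be sketched as follows. The inputs `norm` (a normalized count matrix) and `fdr`, and the function `prepare_heatmap`, are hypothetical; pheatmap can also perform the scaling itself through its `scale` argument.

```r
prepare_heatmap <- function(norm, fdr, top = 50,
                            scale_by = c("row", "column", "none")) {
  scale_by <- match.arg(scale_by)
  idx <- order(fdr)[seq_len(min(top, length(fdr)))]  # lowest FDR first
  m <- log10(norm[idx, , drop = FALSE] + 1)          # color scale: log10(count + 1)
  if (scale_by == "row") {
    m <- t(scale(t(m)))  # center and scale each feature (row z-scores)
  } else if (scale_by == "column") {
    m <- scale(m)        # center and scale each sample
  }
  m
}

norm <- matrix(c(1, 2, 3, 4,
                 10, 20, 30, 40,
                 5, 5, 6, 7),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("a", "b", "c"), paste0("s", 1:4)))
fdr <- c(a = 0.5, b = 0.01, c = 0.2)
m <- prepare_heatmap(norm, fdr, top = 2)
rownames(m)
# [1] "b" "c"
```

The result could then be drawn with, for example, `pheatmap::pheatmap(m, scale = "none", color = colorRampPalette(c("green", "black", "red"))(50))`. Rows with zero variance become `NaN` after row scaling and would need to be dropped first.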

Power Transformation Curve

Because count data can be overdispersed, PoissonSeq defines $\theta$ as the parameter of a power transformation that makes the overdispersion of the transformed data approach zero. To estimate $\theta$, a natural cubic spline is fitted to 10 pairs of $\theta - 1$ and $\mu$ values, so that for a given $\mu$ potentially overdispersed data can be modeled realistically.

```{r}
# Draw the power transformation curve stored in plist[[2]]
print(plist[[2]])
```


Figure 2: Power transformation curve in PoissonSeq
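The natural cubic spline step can be illustrated with base R's `splinefun(..., method = "natural")`. The ten $(\mu, \theta)$ pairs below are made-up values chosen only to show the mechanics; they are not PoissonSeq output, the sketch interpolates $\theta$ itself for simplicity, and the exact quantities PoissonSeq works with are described in [3]. The function `power_transform` is likewise hypothetical.

```r
# Hypothetical (mu, theta) pairs: illustrative values, NOT PoissonSeq output
mu    <- c(1, 2, 5, 10, 20, 50, 100, 200, 500, 1000)
theta <- c(0.98, 0.96, 0.93, 0.90, 0.87, 0.84, 0.82, 0.80, 0.78, 0.77)

# Natural cubic spline through the ten pairs: theta as a smooth function of mu
theta_at <- splinefun(mu, theta, method = "natural")

# Power-transform a feature's counts using the theta looked up at its mean
power_transform <- function(n) n^theta_at(mean(n))

theta_at(100)
# [1] 0.82
power_transform(c(80, 120))
```

A natural spline passes through every pair and has zero second derivative at the end knots, so it extrapolates linearly beyond the observed range of $\mu$ instead of curving off.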

FDR Distribution Plot

The false discovery rate (FDR) distribution plot visualizes the distribution of FDR values. In PoissonSeq, a score statistic is used for the differential expression test and a permutation plug-in method for multiple-testing correction. The plot shows FDR on the x-axis and the percentage of features in each FDR interval on the y-axis, with significant and non-significant features colored differently.

In this analysis, the FDR threshold for significance was set to `r as.character(plist[[4]])`.

Figure 3: FDR distribution plot in PoissonSeq
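A generic permutation plug-in FDR estimate, followed by the binning behind an FDR distribution plot, can be sketched in base R as below. PoissonSeq uses a modified version of the plug-in estimate [3] that this sketch does not reproduce; the functions `plugin_fdr` and `fdr_distribution`, the 0.05 bin width, and the toy statistics are assumptions for illustration.

```r
# Plug-in FDR: for each observed statistic used as a cutoff, divide the
# average number of permuted statistics >= cutoff (expected false calls)
# by the number of observed statistics >= cutoff (total calls).
plugin_fdr <- function(obs, perm) {
  B <- ncol(perm)
  null <- as.vector(perm)
  fdr <- vapply(obs, function(cut) {
    min(1, (sum(null >= cut) / B) / sum(obs >= cut))
  }, numeric(1))
  # Make FDR non-increasing as the statistic grows (q-value style)
  ord <- order(obs)
  fdr[ord] <- cummin(fdr[ord])
  fdr
}

# Percentage of features per FDR bin, split by significance at `threshold`
fdr_distribution <- function(fdr, threshold = 0.05,
                             breaks = seq(0, 1, by = 0.05)) {
  bin <- cut(fdr, breaks = breaks, include.lowest = TRUE)
  group <- factor(ifelse(fdr <= threshold, "significant", "not significant"))
  tab <- as.data.frame(table(bin = bin, group = group))
  tab$percent <- 100 * tab$Freq / length(fdr)
  tab[tab$Freq > 0, ]
}

obs  <- c(10, 5, 1)                                  # toy observed statistics
perm <- matrix(c(0.5, 1, 2, 0.2, 6, 0.1), nrow = 3)  # toy permuted stats, B = 2
fdr  <- plugin_fdr(obs, perm)
fdr
# [1] 0.00 0.25 0.50
fdr_distribution(fdr)
```

The resulting data frame could be drawn with ggplot2, for example `ggplot(d, aes(bin, percent, fill = group)) + geom_col()`, where `d` is the output of `fdr_distribution()`.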

References

1. R Core Team, R: A language and environment for statistical computing, 2014, R Foundation for Statistical Computing: Vienna, Austria.
2. Kolde, R., pheatmap: Pretty Heatmaps, 2013.
3. Li, J., et al., Normalization, testing, and false discovery rate estimation for RNA-sequencing data. Biostatistics, 2011: p. kxr031.
4. Wickham, H., ggplot2: elegant graphics for data analysis, 2009, Springer New York.
5. Allaire, J.J., J. McPherson, Y. Xie, H. Wickham, J. Cheng, and J. Allen, rmarkdown: Dynamic Documents for R, 2014.
6. Nagalakshmi, U., et al., The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 2008. 320(5881): p. 1344-9.
7. Mortazavi, A., et al., Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods, 2008. 5(7): p. 621-8.
8. Morrissy, A.S., et al., Next-generation tag sequencing for cancer gene expression profiling. Genome Res, 2009. 19(10): p. 1825-35.
9. Anders, S. and W. Huber, Differential expression analysis for sequence count data. Genome Biol, 2010. 11(10): p. R106.
10. Robertson, G., et al., Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods, 2007. 4(8): p. 651-7.
11. Marioni, J.C., et al., RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res, 2008. 18(9): p. 1509-17.
12. Robinson, M.D., D.J. McCarthy, and G.K. Smyth, edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 2010. 26(1): p. 139-40.


likelet/IDEA documentation built on Sept. 8, 2020, 2:56 p.m.