knitr::opts_chunk$set( fig.width = 9, fig.height = 9, dpi = 72)
This R Markdown document reports the PoissonSeq analysis of IDEA (Interactive Differential Expression Analyzer). Plots from the PoissonSeq analysis module are drawn in R [1] with pheatmap [2] (heat map), PoissonSeq [3] (power transformation curve) and ggplot2 [4] (FDR distribution plot), and are presented in this HTML file via rmarkdown [5]. For higher-resolution figures, please download them directly from the website.
Citation: This work is in the process of being published; citation information will be posted here as soon as possible. Check out the IDEA website above.
#setwd(tempdir())
load("PoissonseqAnalysis.RData")
# p1 basic information
# plist[[1]]  # 1 experimental design, 2 selected pair, 3 factor of interest
# p2 power curve
library(ggplot2)
library(gplots)
library(RColorBrewer)
library(scales)
library(pheatmap)
library(plyr)
library(labeling)
library(stringr)
library(rmarkdown)
#library(S4Vectors)
#library(PoissonSeq)
Count data generated by high-throughput sequencing methods such as RNA-Seq [6, 7], Tag-Seq [8, 9] and ChIP-Seq [10] are increasingly used to represent the abundance of genes/features at the RNA/DNA level, since read count and abundance are linearly related [7]. In RNA-Seq, moreover, variation between replicates is low, which makes RNA-Seq count data advantageous for discovering differentially expressed genes [11]. Differential expression (DE) analysis typically has to address the following questions: the choice of normalization and noise control method [7, 9]; the choice of data distribution given the number of replicates [9]; and the choice of method for assessing the statistical significance of DE detection [12].
PoissonSeq [3] is an R package for differential expression analysis of count data. It adopts a non-parametric method to model the distribution of count data, which typically performs better when a relatively large data set is available. A Poisson goodness-of-fit statistic is used to normalize the raw count data, a score statistic is used for statistical testing, and a modified permutation plug-in estimate is used to derive the FDR.
In IDEA, PoissonSeq version 1.1.2 is employed for DE analysis. For more information on PoissonSeq, please refer to reference [3] and the package manual.
In IDEA, a raw count table and an experimental design table are required as input. The experimental design can be one of Standard Comparison, Multi-factors Design and Without Replicates (not recommended). A pair of conditions is then selected for DE analysis.
Specifically, PoissonSeq is applicable only for Standard Comparison.
In this case, the experimental design was stated as `r plist[[1]][1]`. Condition `r as.character(plist[[1]][[2]])[1]` and condition `r as.character(plist[[1]][[2]])[2]` were selected for differential expression analysis.
No advanced options are provided in the PoissonSeq analysis module.
A table containing information on all differentially expressed genes is presented with interactive options. The interpretation of each header is given in Table 1.
Note that the same header can have a different meaning in different packages. For example, p-values in DESeq are obtained by the Wald test, whereas in edgeR they are obtained by Fisher's exact test.
htmltools::HTML('
<div align="center">
Table 1: Interpretation of headers of differential expression table in PoissonSeq<br/>
<table cellpadding="5" cellspacing="0" border="1" frame=hsides rules=all style="border-color: #000000">
<tr>
<td style="border-width: medium thin medium 0"> Headers</td>
<td style="border-width: medium thin medium 0"> Interpretation</td>
</tr>
<tr>
<td style="border-width: 0 thin thin 0"> FeatureID</td>
<td style="border-width: 0 thin thin 0"> Feature identifier</td>
</tr>
<tr>
<td style="border-width: 0 thin thin 0"> tt</td>
<td style="border-width: 0 thin thin 0"> Mean of condition, available for multiple columns</td>
</tr>
<tr>
<td style="border-width: 0 thin thin 0"> Theta</td>
<td style="border-width: 0 thin thin 0"> The score statistics of the genes</td>
</tr>
<tr>
<td style="border-width: 0 thin thin 0"> p.value</td>
<td style="border-width: 0 thin thin 0"> Permutation-based p-values of the genes</td>
</tr>
<tr>
<td style="border-width: 0 thin thin 0"> FDR</td>
<td style="border-width: 0 thin thin 0"> Estimated false discovery rate</td>
</tr>
<tr>
<td style="border-width: 0 thin medium 0"> logFC</td>
<td style="border-width: 0 thin medium 0"> Estimated log (base 2) fold change of the features; fold change is defined as counts of Condition2 divided by counts of Condition1</td>
</tr>
</table>
</div>
')
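The fold-change definition in Table 1 can be illustrated with a minimal base R sketch. The counts below are made up, and taking a plain ratio of condition means is an assumption for illustration only; PoissonSeq's own estimate of the log fold change may be computed differently.

```r
# Hypothetical mean counts of one feature in each condition (illustrative values only)
mean_condition1 <- 50
mean_condition2 <- 200

# Fold change as defined in Table 1: Condition2 divided by Condition1
fold_change <- mean_condition2 / mean_condition1

# Base-2 logarithm: positive means up-regulated in Condition2, negative means down-regulated
log_fc <- log2(fold_change)
```

With this definition, a logFC of 1 corresponds to a doubling and a logFC of -1 to a halving of the counts in Condition2 relative to Condition1.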
A heat map graphically displays the differential expression table: each square (pixel) represents the value of a feature in a sample and is colored accordingly. Here, the heat map of differentially expressed features is plotted with the R package pheatmap. Samples are arranged in columns and features in rows, as in the original data matrix. Up-regulated differentially expressed features are colored red and down-regulated ones green. Hierarchical clustering results for features and samples are shown as dendrograms on the left and upper sides of the heat map, respectively.
The number of features displayed as rows, the display of the dendrograms on the left and upper sides, and the display of the color key can all be changed interactively. The data scaling of the heat map can be one of "none", "row" and "column", as chosen by the user. The color is scaled by $\log_{10}(\text{Normalized Read Count} + 1)$.
In this case, data is centered and scaled in the `r as.character(plist[[2]][[3]])` direction. For more information on parameter settings, please refer to the manual of the pheatmap package (reference [2]).
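The color scaling and the "row" and "column" options described above can be sketched in base R as follows. The matrix `norm_counts` and its values are hypothetical. The centering and scaling here uses base R's `scale()`, which is assumed to be comparable to what the `scale` argument of `pheatmap()` does; it is not taken from the IDEA source.

```r
# Hypothetical normalized counts: 4 features (rows) x 3 samples (columns)
norm_counts <- matrix(
  c(0, 9, 99, 999,
    4, 49, 499, 4999,
    9, 99, 999, 9999),
  nrow = 4,
  dimnames = list(paste0("feature", 1:4), paste0("sample", 1:3))
)

# Color scale from the text: log10(normalized read count + 1)
log_counts <- log10(norm_counts + 1)

# "row": center and scale each feature across samples
# (scale() works on columns, so transpose before and after)
row_scaled <- t(scale(t(log_counts)))

# "column": center and scale each sample across features
col_scaled <- scale(log_counts)

# "none": use log_counts as is
```

Row scaling emphasizes how a feature varies across samples regardless of its overall abundance, which is usually what a differential expression heat map is meant to show.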
Because count data can be overdispersed, PoissonSeq defines $\theta$ as the parameter of a power transformation that brings the overdispersion of the data close to zero. To estimate $\theta$, a natural cubic spline is fitted to 10 pairs of $\theta$-1 and $\mu$, so that, given $\mu$, potentially overdispersed data can be modeled realistically.
print(plist[[2]])
Figure 2: Power transformation curve in PoissonSeq
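The idea behind the power transformation can be illustrated with a rough base R sketch. It is not PoissonSeq's actual procedure: the counts are simulated, the variance-to-mean ratio minus one is used as a crude stand-in for the overdispersion measure, and the grid of 10 candidate values is an assumption chosen to mirror the 10 pairs mentioned above.

```r
set.seed(1)

# Simulated overdispersed counts (negative binomial); purely illustrative
counts <- rnbinom(2000, mu = 50, size = 5)

# 10 candidate exponents for the power transformation x^theta
theta_grid <- seq(0.1, 1, length.out = 10)

# Crude overdispersion proxy: variance-to-mean ratio minus 1
# (equals 0 for Poisson-like data; this is an assumption, not PoissonSeq's statistic)
excess <- sapply(theta_grid, function(theta) {
  x <- counts^theta
  var(x) / mean(x) - 1
})

# Natural cubic spline through the 10 (theta, excess) pairs
excess_fun <- splinefun(theta_grid, excess, method = "natural")

# Exponent at which the fitted curve crosses zero
theta_hat <- uniroot(excess_fun, interval = range(theta_grid))$root
```

On the untransformed data (`theta = 1`) the proxy is well above zero, and it drops below zero for small exponents, so the spline crosses zero somewhere in between.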
The false discovery rate (FDR) distribution plot visualizes the distribution of FDR values. In PoissonSeq, a score statistic is used for the differential expression test and a permutation plug-in method for multiple testing. The plot shows FDR on the x-axis and the percentage of features falling into each FDR interval on the y-axis, and colors the significant and non-significant groups differently.
In this case, the FDR threshold for significance is set to `r as.character(plist[[4]])`.
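The data behind such an FDR distribution plot can be sketched in base R as follows. The FDR values, the bin width of 0.05 and the threshold `fdr_threshold` are all hypothetical; the report itself takes the threshold from `plist[[4]]` and draws the plot with ggplot2.

```r
set.seed(42)

# Hypothetical FDR values for 200 features (illustrative only)
fdr <- c(runif(30, 0, 0.05), runif(170, 0.05, 1))

# Hypothetical significance threshold
fdr_threshold <- 0.05

# Group FDR values into intervals of width 0.05 along the x-axis
upper_bounds <- seq(0.05, 1, by = 0.05)
bins <- cut(fdr, breaks = c(0, upper_bounds), include.lowest = TRUE)

# Percentage of features per interval (y-axis) and significance flag (fill color)
fdr_dist <- data.frame(
  bin = levels(bins),
  percent = as.vector(table(bins)) / length(fdr) * 100,
  significant = upper_bounds <= fdr_threshold + 1e-9
)
```

A data frame of this shape can be passed to a bar plot, with `bin` on the x-axis, `percent` on the y-axis and `significant` mapped to the fill color.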