This is an R Markdown document for data exploration in IDEA: Interactive Differential Expression Analyzer. Plots from the Analysis module (drawn in R [1] with ggplot2 [2]) are presented in an HTML file via rmarkdown [3].
Citation: Update later?
For any bugs or suggestions, please contact us: Qi Zhao(zhaoqi3@mail2.sysu.edu.cn), Rucheng Diao(diaorch@mail2.sysu.edu.cn), Licheng Sun(297495413@qq.com)
setwd(tempdir())
load("DESeqAnalysis.RData")
#load("/home//nemo13/Documents/201412/rmarkdown/DEseqAnalysis.RData")
#p1 list(experimentaldesign, paired)
#p2 getCdsData
#p3 getDEseqMAplot()
#p4 DEseqVolcanoPlot()
#p5 DESeqHeatmapPlotfunction()
#p6 p-value distribution
library(ggplot2)
library(RColorBrewer)
library(scales)
library(pheatmap)
library(plyr)
library(labeling)
library(stringr)
library(rmarkdown)
#library(S4Vectors)
library(DESeq2)
Differential expression analysis Package Introduction
In IDEA, a raw count table must be uploaded and the experimental design specified. The experimental design can be one of Standard Comparison, Multi-factors Design, or Without Replicates (not recommended). A pair of conditions is then selected for differential expression analysis.
In this case, the experimental design was stated as `r plist[[1]][1]`, and `r as.character(plist[[1]][[2]])[1]` and `r as.character(plist[[1]][[2]])[2]` were selected for differential expression analysis.
After analysis, a table containing information on all differentially expressed genes is presented with interactive options. The meaning of each column header is explained in Table 1. Note that the same header can have a different meaning in different packages. For example, p-values in DESeq2 are obtained by the Wald test, whereas in edgeR they are obtained by an exact test analogous to Fisher's exact test.
htmltools::HTML('
<div align="center">
<table cellpadding="10" cellspacing="0" border="1" frame=hsides rules=all style="border-color: #000000">
<tr>
<td style="border-width: medium thin medium 0"> Headers</td>
<td style="border-width: medium thin medium 0"> Meaning</td>
</tr>
<tr>
<td style="border-width: 0 thin thin 0"> FeatureID</td>
<td style="border-width: 0 thin thin 0"> Feature identifier</td>
</tr>
<tr>
<td style="border-width: 0 thin thin 0"> baseMean</td>
<td style="border-width: 0 thin thin 0"> Mean of normalized counts over all samples</td>
</tr>
<tr>
<td style="border-width: 0 thin thin 0"> log2FoldChange </td>
<td style="border-width: 0 thin thin 0"> Logarithm (base 2) of the fold change</td>
</tr>
<tr>
<td style="border-width: 0 thin thin 0"> lfcSE</td>
<td style="border-width: 0 thin thin 0"> Standard error of log2FoldChange</td>
</tr>
<tr>
<td style="border-width: 0 thin thin 0"> stat</td>
<td style="border-width: 0 thin thin 0"> Wald statistic</td>
</tr>
<tr>
<td style="border-width: 0 thin thin 0"> pvalue</td>
<td style="border-width: 0 thin thin 0"> Wald test p-value</td>
</tr>
<tr>
<td style="border-width: 0 thin medium 0"> padj</td>
<td style="border-width: 0 thin medium 0"> <em> p</em>-value adjusted for multiple testing with the Benjamini-Hochberg procedure </td>
</tr>
</table>
Table 1: Meaning of the headers of the Differential Expression Table in DESeq2
</div>
')
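The Benjamini-Hochberg adjustment behind the padj column can be illustrated with base R's `p.adjust()`. The following is a minimal sketch on made-up p-values (the vector `pvals` is hypothetical, not taken from the analysis). By default DESeq2 also applies independent filtering before the adjustment, so padj values reported by DESeq2 can differ from a plain adjustment of the pvalue column.

```r
# Hypothetical raw p-values for five features (illustrative only)
pvals <- c(0.001, 0.008, 0.039, 0.041, 0.60)

# Benjamini-Hochberg adjustment as provided by base R
padj <- p.adjust(pvals, method = "BH")

# Manual computation for comparison: p * n / rank, then enforce monotonicity
# by taking a running minimum from the largest p-value downwards
n <- length(pvals)
ord <- order(pvals, decreasing = TRUE)
manual <- numeric(n)
manual[ord] <- pmin(1, cummin(pvals[ord] * n / (n:1)))

print(data.frame(pvalue = pvals, padj = padj, manual = manual))
```

The two columns agree, which shows that the adjusted value of a feature depends on its rank among all tested features and not only on its own p-value.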
The variance estimates plot is used to check the result of the dispersion estimate adjustment. In DESeq2, it is drawn by calling `plotDispEsts()`, which is built into the package.
The gene-wise estimates are in black, the fitted estimates are in red, and the final estimates are in blue. Outliers among the gene-wise estimates are marked with blue circles and are not shrunk towards the fitted value. Points lying at the bottom of the plot have a dispersion of practically or exactly zero.
#insert DESeq2 plotting code here!
plotDispEsts(plist[[2]])
Figure 1: Variance Estimation Plot for DESeq2
An MA-plot gives a quick overview of the distribution of the data. The log2-transformed fold change is plotted on the y-axis and the average count (normalized by size factors) on the x-axis. The false discovery rate (FDR) threshold can be changed interactively; genes are colored red if their adjusted p-value is below the FDR threshold, and all other genes are colored black.
In this case, the FDR threshold was set to INPUT THRESHOLD HERE!
print(plist[[3]])
Figure 2: MA-Plot for DESeq2
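The coloring rule of the MA-plot can be sketched with base graphics. This is not the code IDEA uses to build `plist[[3]]`; the data frame `res`, its values, and the threshold `fdr` are assumptions made for illustration, and base graphics is used only to keep the sketch free of dependencies.

```r
# Hypothetical results table (illustrative values, not from the analysis)
res <- data.frame(
  baseMean       = c(12.5, 340.2, 58.1, 1500.7, 5.3),
  log2FoldChange = c(0.4, -2.1, 1.8, 0.1, 3.0),
  padj           = c(0.80, 0.001, 0.03, 0.95, NA)
)

fdr <- 0.05  # assumed threshold, chosen only for illustration

# Red when padj is below the threshold; a missing padj counts as not significant
res$colour <- ifelse(!is.na(res$padj) & res$padj < fdr, "red", "black")

# MA-plot: mean normalized count (log scale) against log2 fold change
plot(res$baseMean, res$log2FoldChange, log = "x", col = res$colour, pch = 20,
     xlab = "Mean of normalized counts", ylab = "log2 fold change")
abline(h = 0, lty = 2)
```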
Different samples may have different sequencing depths. To make them comparable, the relative size factor of each sample is estimated, and the counts of each sample are divided by that sample's size factor.
The table of normalized size factors lists the size factor of each sample. In its header, Group is the condition, lib.size is the size of the library, and norm.factors is the normalized size factor.
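The idea of size-factor normalization can be sketched in base R with the median-of-ratios approach that DESeq2's `estimateSizeFactors()` is based on. This is a simplified illustration on a made-up count matrix (`counts` and its values are hypothetical), not DESeq2's actual implementation.

```r
# Hypothetical raw count matrix: rows are genes, columns are samples
counts <- matrix(
  c(10, 20, 40,
    100, 200, 400,
    50, 100, 200,
    0, 5, 8),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3"))
)

# Per-gene geometric mean on the log scale; genes with a zero count give -Inf
log_geo_means <- rowMeans(log(counts))
usable <- is.finite(log_geo_means)

# Size factor of a sample: median ratio of its counts to the geometric means,
# computed over the usable genes only
size_factors <- apply(counts, 2, function(sample_counts) {
  exp(median(log(sample_counts[usable]) - log_geo_means[usable]))
})

# Divide each sample's counts by its own size factor
normalized <- sweep(counts, 2, size_factors, "/")
print(size_factors)
print(round(normalized, 2))
```

In this toy matrix the three samples differ only in depth (1:2:4), so the size factors come out as 0.5, 1 and 2, and the normalized counts of the first three genes become identical across samples.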
The volcano plot gives an overview of the number of differentially expressed genes. The log2-transformed fold change is on the x-axis, and the y-axis shows the -log10-transformed p-value. The p-value threshold is INPUT THRESHOLD HERE! and the fold change threshold is INPUT THRESHOLD HERE! Highly differentially expressed genes are colored blue, while all others are colored red.
print(plist[[4]])
Figure 3: Volcano Plot
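The two-threshold rule of the volcano plot can be sketched as follows. The data frame `res`, both cutoffs, and the choice to state the fold-change cutoff on the untransformed scale are assumptions for illustration; the code that builds `plist[[4]]` in IDEA may define them differently.

```r
# Hypothetical results table (illustrative values only)
res <- data.frame(
  log2FoldChange = c(0.4, -2.1, 1.8, 0.1, 3.0),
  pvalue         = c(0.50, 0.0001, 0.02, 0.90, 0.2)
)

p_threshold  <- 0.05  # assumed p-value cutoff
fc_threshold <- 2     # assumed fold-change cutoff (not log-transformed)

# y-axis value of the volcano plot
res$neg_log10_p <- -log10(res$pvalue)

# A gene is highlighted only if it passes both thresholds
res$highlight <- res$pvalue < p_threshold &
  abs(res$log2FoldChange) > log2(fc_threshold)
res$colour <- ifelse(res$highlight, "blue", "red")
print(res)
```

The last gene has a large fold change but is not highlighted, because its p-value does not pass the cutoff.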
The heatmap displays the expression values of the features; every rectangle represents one gene-sample pair. Samples are arranged in columns and features in rows, as in the original data matrix. The color represents $\log_{10}(\text{normalized read count} + 1)$.
#heatmapO[[1]] data
#heatmapO[[2]] a logical vector with two elements
#heatmapO[[3]] scaling method
#heatmapO[[4]] legend options
heatmapO=plist[[5]]
# redgreen() is provided by the gplots package, which is not attached above
pheatmap(heatmapO[[1]],
         color=gplots::redgreen(75),
         border_color=NA,
         cluster_rows=heatmapO[[2]][1],
         cluster_cols=heatmapO[[2]][2],
         scale=heatmapO[[3]],
         legend=heatmapO[[4]])
Figure 4: Heatmap of Differentially Expressed Genes
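The transformation behind the heatmap colors can be shown on a small example. The matrix `norm_counts` is hypothetical; the point of the sketch is only that adding 1 before taking the logarithm keeps zero counts finite.

```r
# Hypothetical normalized counts: rows are features, columns are samples
norm_counts <- matrix(
  c(0, 9, 99,
    999, 0, 9),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3"))
)

# The pseudocount of 1 maps a zero count to log10(1) = 0 instead of -Inf
heat_values <- log10(norm_counts + 1)
print(heat_values)
```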
The FDR distribution plot visualizes the distribution of FDR values from the differential expression test. In DESeq2, the Wald test is used. The x-axis shows the FDR and the y-axis shows the percentage of features falling into each group of FDR values; significant and non-significant groups are colored differently.
Note that the FDR distribution plot is not available for the NOISeq package, since the probabilities in NOISeq are not equivalent to p-values.
print(plist[[6]])
Figure 5: FDR Distribution Plot
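The quantities shown in an FDR distribution plot can be sketched by binning adjusted p-values. The vector `padj`, the bin width of 0.05, and the threshold `fdr` are assumptions for illustration; the grouping used for `plist[[6]]` in IDEA may differ.

```r
# Hypothetical adjusted p-values (illustrative only)
padj <- c(0.001, 0.02, 0.04, 0.12, 0.18, 0.33, 0.51, 0.77, 0.93, NA)
fdr  <- 0.05  # assumed significance threshold

# Drop features without an adjusted p-value, then bin the rest
padj <- padj[!is.na(padj)]
upper <- (1:20) / 20  # upper edges of bins with width 0.05
bins <- cut(padj, breaks = c(0, upper), include.lowest = TRUE)

# Percentage of features per bin, and whether the bin lies below the threshold
fdr_dist <- data.frame(
  bin         = levels(bins),
  percent     = as.numeric(100 * table(bins) / length(padj)),
  significant = upper <= fdr + 1e-9
)
print(fdr_dist[fdr_dist$percent > 0, ])
```

Plotting `percent` against `bin` and coloring by `significant` gives a bar chart of the kind described above.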