\newpage
# treatment of generic file count <- paste(opt$count_dir, basename(opt$count), sep = .Platform$file.sep) generic.df <- read.table(count, header = TRUE, sep="\t", check.names=FALSE) nbcol=dim(generic.df)[2] nblib=nbcol - 1 values <- generic.df[,2:nbcol] logvalues <- log(generic.df[,2:nbcol] + 1) # treatment of design file if (!is.null(opt$design)) { design <- paste(opt$design_dir, basename(opt$design), sep = .Platform$file.sep) design.df <- read.table(design, header=TRUE, sep="\t", check.names=FALSE) lib_names <- design.df[,1] Factor = design.df[,2] moda_names = unique (Factor) testcol <- colorRampPalette(c('blue','red', 'green', 'orange')) colmoda <- testcol(length(moda_names)) colorModa <- c() for (i in 1:length(moda_names)) { moda = moda_names[i] colorModa[moda]=colmoda[i] } colorLib <- c() for (i in 1:length(lib_names)) { lib = lib_names[i] moda = Factor[i] collib = colorModa[moda] colorLib[lib]=collib } }
Input Data = RNASeq count expression file.
After a log transformation and a Pearson correlation, data are submitted to an Ascending Hierarchical Clustering (AHC) (Ward).
The aim of AHC is to cluster the libraries, based on their count values.
Expected results :
All the replicates for an experimental condition should cluster together
The different experimental conditions should be well separated
Application :
Estimate the distance between the different libraries
Detect problems of sample inversion
dist.lame.cor <- 1 - cor(logvalues, use="pairwise.complete.obs") hc.lame.cor <- hclust(as.dist(dist.lame.cor)) dendCol <- as.dendrogram(hc.lame.cor) if (!is.null(opt$design)) { labels_colors(dendCol) <- colorModa[Factor][order.dendrogram(dendCol)] } plot(dendCol, main= "Cluster Dendrogram (log values)")
\newpage
For each library, the number of genes that have at least 10 reads mapped, is displayed.
The number of 10 is empirical, and was chosen because we think that it is the minimal number required to consider that a feature is expressed.
Expected results :
Application :
Estimate the number of genes usable / library
Estimate the number of valid genes (Number of features with at least 10 reads mapped, in at least 1 lib)
Check that the mapping parameters and structural annotation file are adequate for all the libraries (including samples that do not come from the reference genome)
vecteursup10 = c() vecteurunionprov = c() vecteurall = c() for (colo in 1:nblib) { vecteursup10 <- append(vecteursup10, length(which(values[,colo]>9)))} for (colo in 1:nblib) { vecteurunionprov <- append(vecteurunionprov, which(values[,colo]>9))} vecteurunion = unique(vecteurunionprov) vecteursup10 <- append(vecteursup10, length(vecteurunion)) for (colo in 1:nbcol) { vecteurall <- append(vecteurall, length(values[,1]))} if (!is.null(opt$design)) { barplot(vecteursup10, names.arg=c(colnames(values[1:nblib]), "At least in 1 lib"), las=2, cex.names=0.7, col=c(colorLib, "black"), ylim=c(0,max(vecteurall))) } else { barplot(vecteursup10, names.arg=c(colnames(values[1:nblib]), "At least in 1 lib"), las=2, cex.names=0.7, ylim=c(0,max(vecteurall))) } todisplay = data.frame(c(colnames(values[1:nblib]), "At least in 1 lib"),vecteursup10, vecteurall) names(todisplay) = c('lib', 'nb features with at least 10 reads', 'nb features total') todisplay[,1:3] ``` \newpage Box-plot (log values) ========================= Input Data = RNASeq count expression file. For each library, box-plots of log (count values) are displayed. Expected results : - All the libraries should have quite the same distribution Application : - Ensure that all the libraries have quite the same distribution to initiate a powerful differential expression analysis - Check that an unexpected clustering in the AHC (first graph) is not caused by a heterogeneous distribution between libraries ```r if (!is.null(opt$design)) { boxplot(logvalues, cex.axis=0.7, las=2, main="BoxPlot of log(count)", col=colorLib) } else { boxplot(logvalues, cex.axis=0.7, las=2, main="BoxPlot of log(count)") }
\newpage
summary(values)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.