\newpage
# treatment of count BBRIC file count <- paste(opt$count_dir, basename(opt$count), sep = .Platform$file.sep) bbric.df <- read.table(count, header = TRUE, sep="\t", check.names=FALSE) nbcoltotal=dim(bbric.df)[2] nblib=(nbcoltotal - 8)/2 rawvalues = bbric.df[, seq(9, nbcoltotal, by=2)] colnames(rawvalues) = gsub(".count", "", colnames(rawvalues)) rpkmvalues = bbric.df[, seq(10, nbcoltotal, by=2)] colnames(rpkmvalues) = gsub(".rpkm", "", colnames(rpkmvalues)) lograwvalues <- log(rawvalues + 1) logrpkmvalues <- log(rpkmvalues + 1) # treatment of design file if (!is.null(opt$design)) { design <- paste(opt$design_dir, basename(opt$design), sep = .Platform$file.sep) design.df <- read.table(design, header=TRUE, sep="\t", check.names=FALSE) lib_names <- design.df[,1] Factor = design.df[,2] moda_names = unique (Factor) testcol <- colorRampPalette(c('blue','red', 'green', 'orange')) colmoda <- testcol(length(moda_names)) colorModa <- c() for (i in 1:length(moda_names)) { moda = moda_names[i] colorModa[moda]=colmoda[i] } colorLib <- c() for (i in 1:length(lib_names)) { lib = lib_names[i] moda = Factor[i] collib = colorModa[moda] colorLib[lib]=collib } }
Input Data = RNASeq count expression file.
After a log transformation and a Pearson correlation, data are submitted to an Ascending Hierarchical Clustering (AHC) (Ward).
The aim of AHC is to cluster the libraries, based on their count values.
Expected results :
All the replicates for an experimental condition should cluster together
The different experimental conditions should be well separated
Application :
Estimate the distance between the different libraries
Detect problems of sample inversion
dist.lame.cor <- 1 - cor(lograwvalues, use="pairwise.complete.obs") hc.lame.cor <- hclust(as.dist(dist.lame.cor)) dendCol <- as.dendrogram(hc.lame.cor) if (!is.null(opt$design)) { labels_colors(dendCol) <- colorModa[Factor][order.dendrogram(dendCol)] } plot(dendCol, main= "Cluster Dendrogram (log raw values)")
\newpage
Input Data = RNASeq count expression file.
After a log transformation and a Pearson correlation, data are submitted to an Ascending Hierarchical Clustering (AHC) (Ward).
The aim of AHC is to cluster the libraries, based on their count values.
Expected results :
All the replicates for an experimental condition should cluster together
The different experimental conditions should be well separated
Application :
Estimate the distance between the different libraries
Detect problems of sample inversion
dist.lame.cor <- 1 - cor(logrpkmvalues, use="pairwise.complete.obs") hc.lame.cor <- hclust(as.dist(dist.lame.cor)) dendCol <- as.dendrogram(hc.lame.cor) if (!is.null(opt$design)) { labels_colors(dendCol) <- colorModa[Factor][order.dendrogram(dendCol)] } plot(dendCol, main= "Cluster Dendrogram (log rpkm values)")
\newpage
For each library, the number of genes that have at least 10 reads mapped, is displayed.
The number of 10 is empirical, and was chosen because we think that it is the minimal number required to consider that a feature is expressed.
Expected results :
Application :
Estimate the number of genes usable / library
Estimate the number of valid genes (Number of features with at least 10 reads mapped, in at least 1 lib)
Check that the mapping parameters and structural annotation file are adequate for all the libraries (including samples that do not come from the reference genome)
vecteursup10 = c() vecteurunionprov = c() vecteurall = c() for (colo in 1:nblib) { vecteursup10 <- append(vecteursup10, length(which(rawvalues[,colo]>9)))} for (colo in 1:nblib) { vecteurunionprov <- append(vecteurunionprov, which(rawvalues[,colo]>9))} vecteurunion = unique(vecteurunionprov) vecteursup10 <- append(vecteursup10, length(vecteurunion)) for (colo in 1:(nblib+1)) { vecteurall <- append(vecteurall, length(rawvalues[,1]))} if (!is.null(opt$design)) { barplot(vecteursup10, names.arg=c(colnames(rawvalues[1:nblib]), "At least in 1 lib"), las=2, cex.names=0.7, col=c(colorLib, "black"), ylim=c(0,max(vecteurall))) } else { barplot(vecteursup10, names.arg=c(colnames(rawvalues[1:nblib]), "At least in 1 lib"), las=2, cex.names=0.7, ylim=c(0,max(vecteurall))) } todisplay = data.frame(c(colnames(rawvalues[1:nblib]), "At least in 1 lib"),vecteursup10, vecteurall) names(todisplay) = c('lib', 'nb features with at least 10 reads', 'nb features total') todisplay[,1:3]
\newpage
Input Data = RNASeq count expression file.
For each library, box-plots of log (count values) are displayed.
Expected results :
Application :
Ensure that all the libraries have quite the same distribution to initiate a powerful differential expression analysis
Check that an unexpected clustering in the AHC (first graph) is not caused by a heterogeneous distribution between libraries
if (!is.null(opt$design)) { boxplot(lograwvalues, las=2, cex.axis=0.7, main="BoxPlot of log (raw counts)", col=colorLib) } else { boxplot(lograwvalues, las=2, cex.axis=0.7, main="BoxPlot of log (raw counts)") }
\newpage
Input Data = RNASeq count expression file.
For each library, box-plots of log (count values) are displayed.
Expected results :
Application :
if (!is.null(opt$design)) { boxplot(logrpkmvalues, las=2, cex.axis=0.7, main="BoxPlot of log (rpkm counts)", col=colorLib) } else { boxplot(logrpkmvalues, las=2, cex.axis=0.7, main="BoxPlot of log (rpkm counts)") }
\newpage
summary(rawvalues)
\newpage
summary(rpkmvalues)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.