In jos4uke/qc4rnaseq: Estimate a posteriori the quality of RNA-seq libraries using count data

\newpage

# treatment of count BBRIC file
count <- paste(opt$count_dir, basename(opt$count), sep = .Platform$file.sep)
bbric.df <- read.table(count, header = TRUE, sep="\t", check.names=FALSE)
nbcoltotal=dim(bbric.df)[2]
nblib=(nbcoltotal - 8)/2
rawvalues = bbric.df[, seq(9, nbcoltotal, by=2)]
colnames(rawvalues) = gsub(".count", "", colnames(rawvalues)) 
rpkmvalues = bbric.df[, seq(10, nbcoltotal, by=2)]
colnames(rpkmvalues) = gsub(".rpkm", "", colnames(rpkmvalues)) 
lograwvalues <- log(rawvalues + 1)
logrpkmvalues <- log(rpkmvalues + 1)

# treatment of design file
if (!is.null(opt$design)) {
  design <- paste(opt$design_dir, basename(opt$design), sep = .Platform$file.sep)
    design.df <- read.table(design, header=TRUE, sep="\t", check.names=FALSE)
    lib_names <- design.df[,1]
    Factor = design.df[,2]
    moda_names = unique (Factor)
    testcol  <- colorRampPalette(c('blue','red', 'green', 'orange'))
    colmoda <- testcol(length(moda_names))
    colorModa <- c()
    for (i in 1:length(moda_names)) {
        moda = moda_names[i]
        colorModa[moda]=colmoda[i]
    }
    colorLib <- c()
    for (i in 1:length(lib_names)) {
        lib = lib_names[i]
        moda = Factor[i]
        collib = colorModa[moda]
        colorLib[lib]=collib
    }
}

Cluster dendrogram representing the hierarchical clustering of libraries (log raw values)

Input Data = RNASeq count expression file.

After a log transformation and a Pearson correlation, data are submitted to an Ascending Hierarchical Clustering (AHC) (Ward).

The aim of AHC is to cluster the libraries, based on their count values.

Expected results :

All the replicates for an experimental condition should cluster together
The different experimental conditions should be well separated

Application :

Estimate the distance between the different libraries
Detect problems of sample inversion

dist.lame.cor <- 1 - cor(lograwvalues, use="pairwise.complete.obs")
hc.lame.cor <- hclust(as.dist(dist.lame.cor))
dendCol <- as.dendrogram(hc.lame.cor)

if (!is.null(opt$design)) {
    labels_colors(dendCol) <- colorModa[Factor][order.dendrogram(dendCol)]
}
plot(dendCol, main= "Cluster Dendrogram (log raw values)")

\newpage

Cluster dendrogram representing the hierarchical clustering of libraries (log rpkm values)

Input Data = RNASeq count expression file.

After a log transformation and a Pearson correlation, data are submitted to an Ascending Hierarchical Clustering (AHC) (Ward).

The aim of AHC is to cluster the libraries, based on their count values.

Expected results :

All the replicates for an experimental condition should cluster together
The different experimental conditions should be well separated

Application :

Estimate the distance between the different libraries
Detect problems of sample inversion

dist.lame.cor <- 1 - cor(logrpkmvalues, use="pairwise.complete.obs")
hc.lame.cor <- hclust(as.dist(dist.lame.cor))
dendCol <- as.dendrogram(hc.lame.cor)

if (!is.null(opt$design)) {
    labels_colors(dendCol) <- colorModa[Factor][order.dendrogram(dendCol)]
}
plot(dendCol, main= "Cluster Dendrogram (log rpkm values)")

\newpage

Barplot representing for each library the number of features (genes / RNA) that have at least 10 reads mapped

For each library, the number of genes that have at least 10 reads mapped, is displayed.

The number of 10 is empirical, and was chosen because we think that it is the minimal number required to consider that a feature is expressed.

Expected results :

All the libraries should have quite the same number of genes with >= 10 reads mapped

Application :

Estimate the number of genes usable / library
Estimate the number of valid genes (Number of features with at least 10 reads mapped, in at least 1 lib)
Check that the mapping parameters and structural annotation file are adequate for all the libraries (including samples that do not come from the reference genome)

vecteursup10 = c()
vecteurunionprov = c()
vecteurall = c()
for (colo in 1:nblib) { vecteursup10 <- append(vecteursup10, length(which(rawvalues[,colo]>9)))}
for (colo in 1:nblib) { vecteurunionprov <- append(vecteurunionprov, which(rawvalues[,colo]>9))}
vecteurunion = unique(vecteurunionprov)
vecteursup10 <- append(vecteursup10, length(vecteurunion))
for (colo in 1:(nblib+1)) { vecteurall <- append(vecteurall, length(rawvalues[,1]))}

if (!is.null(opt$design)) {
    barplot(vecteursup10, names.arg=c(colnames(rawvalues[1:nblib]), "At least in 1 lib"), las=2, cex.names=0.7, col=c(colorLib, "black"), ylim=c(0,max(vecteurall)))
} else {
    barplot(vecteursup10, names.arg=c(colnames(rawvalues[1:nblib]), "At least in 1 lib"), las=2, cex.names=0.7, ylim=c(0,max(vecteurall)))
}

todisplay = data.frame(c(colnames(rawvalues[1:nblib]), "At least in 1 lib"),vecteursup10, vecteurall)
names(todisplay) = c('lib', 'nb features with at least 10 reads', 'nb features total')
todisplay[,1:3]

\newpage

Box-plot (log raw values)

Input Data = RNASeq count expression file.

For each library, box-plots of log (count values) are displayed.

Expected results :

All the libraries should have quite the same distribution

Application :

Ensure that all the libraries have quite the same distribution to initiate a powerful differential expression analysis
Check that an unexpected clustering in the AHC (first graph) is not caused by a heterogeneous distribution between libraries

if (!is.null(opt$design)) {
    boxplot(lograwvalues, las=2, cex.axis=0.7, main="BoxPlot of log (raw counts)", col=colorLib)
} else {
    boxplot(lograwvalues, las=2, cex.axis=0.7, main="BoxPlot of log (raw counts)")
}

\newpage

Box-plot (log rpkm values)

Input Data = RNASeq count expression file.

For each library, box-plots of log (count values) are displayed.

Expected results :

After normalization, all the libraries should have the same distribution

Application :

Ensure that all the libraries have the same distribution before launching the differential expression analysis

if (!is.null(opt$design)) {
    boxplot(logrpkmvalues, las=2, cex.axis=0.7, main="BoxPlot of log (rpkm counts)", col=colorLib)
} else {
    boxplot(logrpkmvalues, las=2, cex.axis=0.7, main="BoxPlot of log (rpkm counts)")
}

\newpage

Summary of raw values

summary(rawvalues)

\newpage

Summary of rpkm values

summary(rpkmvalues)

jos4uke/qc4rnaseq documentation built on May 19, 2019, 8:45 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jos4uke/qc4rnaseq
Estimate a posteriori the quality of RNA-seq libraries using count data

In jos4uke/qc4rnaseq: Estimate a posteriori the quality of RNA-seq libraries using count data

Cluster dendrogram representing the hierarchical clustering of libraries (log raw values)

Cluster dendrogram representing the hierarchical clustering of libraries (log rpkm values)

Barplot representing for each library the number of features (genes / RNA) that have at least 10 reads mapped

Box-plot (log raw values)

Box-plot (log rpkm values)

Summary of raw values

Summary of rpkm values

R Package Documentation

Browse R Packages

We want your feedback!

jos4uke/qc4rnaseq Estimate a posteriori the quality of RNA-seq libraries using count data

In jos4uke/qc4rnaseq: Estimate a posteriori the quality of RNA-seq libraries using count data

Cluster dendrogram representing the hierarchical clustering of libraries (log raw values)

Cluster dendrogram representing the hierarchical clustering of libraries (log rpkm values)

Barplot representing for each library the number of features (genes / RNA) that have at least 10 reads mapped

Box-plot (log raw values)

Box-plot (log rpkm values)

Summary of raw values

Summary of rpkm values

R Package Documentation

Browse R Packages

We want your feedback!

jos4uke/qc4rnaseq
Estimate a posteriori the quality of RNA-seq libraries using count data