In uzh/ezRun: An R meta-package for the analysis of Next Generation Sequencing Data

Started on `r format(Sys.time(), "%Y-%m-%d %H:%M:%S")`.

knitr::opts_chunk$set(echo = TRUE)
library(cowplot)
library(plotly)
dataFiles = readRDS("dataFiles.rds")
plots = readRDS("plots.rds")

# ## start of debug
# # require(ezRun)
# data.dir = "/export/local/scratch/test_exceRpt/testData_human"
# output.dir = file.path(data.dir, "processed_output")
# dataFiles = list.files(output.dir, pattern = "\\.txt$")
# plots = processSamplesInDir(data.dir = data.dir, output.dir = output.dir)
# ## end of debug
nSamples <- length(unique(plots$`read-length distributions: raw read count`$data$sample))

exceRpt_Result (article) {.tabset}

Data Files

The data files are in tabular text format and can also be opened with a spreadsheet program (e.g. Excel).

When opening the files with Excel, make sure that the gene symbols are loaded into a column formatted as 'text'; this prevents Excel from converting some symbols to dates. See

https://www.genenames.org/help/importing-gene-symbol-data-into-excel-correctly
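As an alternative to a spreadsheet program, the tables can be read directly in R, which avoids the date-conversion problem. The following is a minimal sketch only: the file, the column names (`gene`, `sampleA`, `sampleB`) and the values are made up for illustration and do not reflect the actual layout of the exceRpt output files.

```r
# Write a tiny tab-delimited table to a temporary file (made-up example data).
example_file <- tempfile(fileext = ".txt")
example_counts <- data.frame(
  gene = c("MARCH1", "SEPT2", "DEC1"),
  sampleA = c(10L, 25L, 3L),
  sampleB = c(7L, 31L, 0L)
)
write.table(example_counts, example_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back, forcing the identifier column to stay character.
counts <- read.delim(
  example_file,
  colClasses = c(gene = "character"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
str(counts)
```

Setting `colClasses` for the identifier column is a defensive choice; for a real file, the column name would have to be adapted to the header actually present.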

for(each in dataFiles){
    cat("\n")
    cat(paste0("[", each, "](./processed_output/", each, ")"))
    cat("\n")
}

Read-length distributions

## raw
plots$`read-length distributions: raw read count` +
  theme_half_open() + background_grid(minor="none")
## RPM
plots$`read-length distributions: normalised read fraction` +
  theme_half_open() + background_grid(minor="none")

Fractions of aligned reads

## normalized by total input reads
plots$`fraction aligned reads (normalised by # input reads)`
## normalized by adapter-clipped reads
plots$`fraction aligned reads (normalised by # adapter-clipped reads)`
## normalized by non-contaminant reads
plots$`fraction aligned reads (normalised by # non-contaminant reads)`

QC result (ERCC's)

The NIH Extracellular RNA Communication Consortium (ERCC) bases its QC metrics on two quantities: the number of transcriptome reads and the ratio of RNA-annotated reads to genome reads. The horizontal and vertical lines mark the minimum QC thresholds, so good-quality samples should fall in the upper-right grey area.
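The two-threshold rule described above can be sketched in a few lines of R. Everything in this example is an assumption made for illustration: the data frame, its column names, the use of transcriptome reads as the numerator of the ratio, and the two threshold values. The thresholds actually applied are the ones drawn as lines in the plot.

```r
# Hypothetical per-sample QC table; column names and values are made up.
qc <- data.frame(
  sample = c("S1", "S2", "S3"),
  transcriptome_reads = c(250000, 40000, 180000),
  genome_reads = c(400000, 60000, 500000)
)

# Assumed threshold minima; replace with the values shown in the plot.
min_transcriptome_reads <- 1e5
min_ratio <- 0.5

# A sample passes only if it clears both minima (upper-right area).
qc$ratio <- qc$transcriptome_reads / qc$genome_reads
qc$pass <- qc$transcriptome_reads >= min_transcriptome_reads &
  qc$ratio >= min_ratio
print(qc)
```

In this made-up table only S1 passes: S2 has too few transcriptome reads and S3 has too low a ratio.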

## overall
ggplotly(plots$`QC result: overall` + theme_half_open() + background_grid(minor="none"))

## per sample
plots$`QC result: per-sample results`

Biotypes

Read distributions across the different RNA biotypes.
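The plots in this section come in a raw-count and a reads-per-million (RPM) variant. As a rough illustration of how the two scales relate, the sketch below rescales a made-up count matrix so that each sample sums to one million. The biotype names, the counts and the choice of denominator (the column total of this table) are assumptions; the denominator used by exceRpt itself is not stated here.

```r
# Made-up raw read counts per biotype (rows) and sample (columns).
raw_counts <- matrix(
  c(800, 150, 50,
    300, 600, 100),
  nrow = 3,
  dimnames = list(c("miRNA", "tRNA", "piRNA"), c("S1", "S2"))
)

# Scale each sample (column) so that its counts sum to one million.
rpm <- sweep(raw_counts, 2, colSums(raw_counts), "/") * 1e6
print(rpm)
```

Normalising per sample makes the biotype composition comparable between samples with different sequencing depths.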

## raw
plots$`Biotypes: distributions, raw read-counts` + theme_half_open() + 
  background_grid(minor="none")
## RPM
plots$`Biotypes: distributions, normalised` + theme_half_open() + 
  background_grid(minor="none")
## per sample
ggplotly(plots$`Biotypes: per-sample, normalised`)

miRNA abundance distribution

## raw
plots$`miRNA abundance distributions (raw counts)` + theme_half_open() + 
  background_grid(minor="none")
## RPM
plots$`miRNA abundance distributions (RPM)` + theme_half_open() + 
  background_grid(minor="none")