
The purpose of annotation plots is to provide a graphical overview of how the sequences in CAGE libraries were processed. How many were discarded, and why? Where do the remaining ones align? The plotAnnot function displays this information as stacked bar plots, with error bars if multiple libraries are grouped together. The default scale is a percentage, from 0 to 100 %.

R commands

The main command to produce annotation plots in smallCAGEqc is called plotAnnot. It takes a table containing the sample metadata, a scope, a title and optionally a factor to group similar plots together.

The scope determines what data is plotted and how it is normalised. The available scopes are explained with example plots later in this document, but first, let's look at the input in more detail.

Here, we will use some of the example data that is distributed in smallCAGEqc. The commands below load the R package and load the example data in a data frame called libs.

library(smallCAGEqc)
libs <- read.table(system.file("extdata/libs-with-all-metadata.tsv", package="smallCAGEqc"))

Sample metadata

The following columns in the metadata table give the number of pairs remaining after each step of the processing.

The following columns describe the number of pairs removed at some step of the processing.

The following columns describe the number of TSS (after proper pairing and deduplication) aligning to known regions in the genome.

The annotation is hierarchical (promoters have priority over exons, etc.), so each TSS is assigned to exactly one category and the sum of the annotation columns above should be equal to the counts column.
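The consistency rule stated here can be checked with a few lines of base R. The sketch below uses a small made-up table rather than the package data; the column names promoter, exon, intron and intergenic are assumptions chosen for illustration and may not match the names used in libs.

```r
# Toy metadata table; values and column names are hypothetical.
toy <- data.frame(
  promoter   = c(600, 450),
  exon       = c(200, 250),
  intron     = c(150, 200),
  intergenic = c( 50, 100),
  counts     = c(1000, 1000),
  row.names  = c("libA", "libB")
)

# Each molecule receives exactly one annotation, so the categories
# should add up to the total molecule count of each library.
annot_cols <- c("promoter", "exon", "intron", "intergenic")
annot_sum  <- rowSums(toy[, annot_cols])
consistent <- annot_sum == toy$counts
consistent
```

A library for which this comparison is FALSE would suggest that some counts were annotated twice or not at all.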

Different types of scopes

Step-by-step extraction

Shows how many pairs are removed by the extraction, cleaning, mapping (proper pairs) and transcript counting steps described above.

plotAnnot(libs, SCOPE="steps", TITLE="steps")

QC report

Pairs are categorised as tag dust, rDNA, spikes, unmapped, non-proper, duplicates and counts, and normalised by the total number of extracted pairs. Non-extracted pairs are ignored.

Compared to "steps", this scope gives more details on the sequences removed at the TagDust and mapping stages of the processing pipeline.

plotAnnot(libs, SCOPE="qc", TITLE="qc")
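As a rough illustration of what normalising by the number of extracted pairs means, the base R sketch below converts raw category counts into percentages. It does not call plotAnnot and is not taken from its source code; the table, its values and its column names are hypothetical.

```r
# Toy pair counts per category; values and column names are hypothetical.
toy <- data.frame(
  tagdust    = c(100,  50),
  rdna       = c( 50,  25),
  spikes     = c( 50,  25),
  unmapped   = c(200, 100),
  non_proper = c(100,  50),
  duplicates = c(300, 150),
  counts     = c(200, 100),
  row.names  = c("libA", "libB")
)

# In this toy table the categories partition the extracted pairs,
# so their row sum plays the role of the "extracted" total.
extracted <- rowSums(toy)

# Divide each row by its own total and express it as a percentage.
qc_percent <- toy / extracted * 100
qc_percent
```

After this normalisation every row sums to 100, which is why libraries of very different sequencing depth can be compared on the same 0 to 100 % scale.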

Annotation of the transcript or tag counts

The unique molecule counts are grouped in annotation categories ("promoter", "exon", "intron" and "intergenic"), as described above.

plotAnnot(libs[libs$counts > 0,], SCOPE="counts", TITLE="counts")

Annotation of the mapped reads

Same as "counts", with the addition of duplicates and non-proper pairs. Therefore the plot represents all the mapped data.

plotAnnot(libs, SCOPE="mapped", TITLE="mapped")

QC including annotations

Pairs are categorised by extraction step and genome annotation.

plotAnnot(libs, SCOPE="all", TITLE="all")

Annotation, normalised by mapped reads

Same as "all", except that normalisation is relative to the number of mapped reads.

plotAnnot(libs, SCOPE="annotation", TITLE="annotation")
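The effect of changing the denominator can be seen with a small worked example. The numbers below are invented for one hypothetical library, and the comparison with the extracted total is only meant to illustrate the arithmetic, not to describe how plotAnnot is implemented.

```r
# Hypothetical totals for one library.
extracted <- 1000   # pairs that passed extraction
mapped    <-  600   # pairs that aligned to the genome
promoter  <-  150   # molecules annotated as promoter

# Same numerator, two different denominators.
pct_of_extracted <- promoter / extracted * 100
pct_of_mapped    <- promoter / mapped * 100
c(extracted = pct_of_extracted, mapped = pct_of_mapped)
```

Normalising by mapped reads removes the influence of the fraction lost before mapping, so the annotation proportions of libraries with different mapping rates become directly comparable.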

Contents of the libs table

libs

R session info

sessionInfo()


charles-plessy/smallCAGEqc documentation built on May 13, 2019, 3:31 p.m.