startQuest("Synthetic transcript")
h1("Interlude")
In this chapter, we will shortly go back to computational optimisation.
library(RnaSeqTutorial) bamfiles <- dir(file.path(extdata(),"BAM"), pattern="*.bam$",full.names=TRUE) bamFileList <- BamFileList(bamfiles,yieldSize=10^6)
As you have been introduced to the GenomicAlignments
functionalities to find count or summarize overlaps between reads and
annotations, we can refine our prefered function. We had left it as:
gAlns <- mclapply(bamFileList,function(bamFile){ open(bamFile) gAln <- GAlignments() while(length(chunk <- readGAlignments(bamFile))){ gAln <- c(gAln,chunk) } close(bamFile) return(gAln) })
Using our synthetic transcript annotation:
Ptrichocarpa_v3.0_210_synthetic_transcripts.rda
,
implement the count by chunks.
data("Ptrichocarpa_v3.0_210_synthetic_transcripts") count.list <- mclapply(bamFileList,function(bamFile,annot){ open(bamFile) counts <- vector(mode="integer",length=length(annot)) while(length(chunk <- readGAlignments(bamFile))){ counts <- counts + assays(summarizeOverlaps(annot, chunk, mode="Union", ignore.strand=TRUE))$counts } close(bamFile) return(counts) },syntheticTrx[syntheticTrx$type == "exon"])
This gives us a list of counts per sample, to get a count matrix do:
count.table <- do.call("cbind",count.list) head(count.table[rowSums(count.table)>0,])
Such a count table
object is the minimal input that downstream analysis softwares -
e.g.: DESeq2
, edgeR
, uses.
A similar function to this is probably all you"ll need to process your read and get a count table from a standard Illumina based RNA-Seq experiment. However, you might want more flexibility for you projects and certainly Bioconductor offer the possibilities to do that; examples of which are given in the next chapter.
quest(12)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.