alpine is a package for estimating and visualizing many forms of sample-specific bias that can arise in RNA-seq, including fragment length distribution, positional bias on the transcript, read start bias (random hexamer priming), and fragment GC content (amplification). It also offers bias-corrected estimates of transcript abundance (FPKM). It is currently designed for unstranded paired-end RNA-seq data.
See the package vignette for a detailed workflow.
The main functions in this package are:
buildFragtypes - build out features for fragment types from exons of a single gene (GRanges)
fitBiasModels - fit parameters for one or more bias models over a set of ~100 medium to highly expressed single-isoform genes (GRangesList)
estimateAbundance - given a set of genome alignments (BAM files) and a set of isoforms of a gene (GRangesList), estimate the transcript abundances for these isoforms (FPKM) for various bias models
extractAlpine - given a list of output from estimateAbundance, compile an FPKM matrix across transcripts and samples
predictCoverage - given the exons of a single gene (GRanges), predict the coverage for a set of samples given fitted bias parameters and compute the observed coverage
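For orientation, the FPKM unit reported by estimateAbundance and extractAlpine can be illustrated with the standard, uncorrected definition: fragments per kilobase of transcript per million mapped fragments. The sketch below uses base R only; the function name `simple_fpkm` and its arguments are hypothetical and are not part of alpine, and the bias-correction terms that alpine adds are deliberately left out.

```r
# Hypothetical helper, not an alpine function: plain FPKM with no bias correction.
# count:       number of fragments assigned to the transcript
# tx_length:   transcript length in base pairs
# lib_size:    total number of mapped fragments in the sample
simple_fpkm <- function(count, tx_length, lib_size) {
  count / (tx_length / 1e3) / (lib_size / 1e6)
}

# 500 fragments on a 2 kb transcript in a library of 10 million fragments
example_fpkm <- simple_fpkm(count = 500, tx_length = 2000, lib_size = 1e7)
```

Here `example_fpkm` evaluates to 25. A bias-corrected estimate additionally reweights the potential fragments of each transcript using the fitted bias parameters, so its value will generally differ from this simple ratio.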
Some helper functions for preparing gene objects:
splitGenesAcrossChroms - split apart "genes" where isoforms are on different chromosomes
splitLongGenes - split apart "genes" which cover a suspiciously large range, e.g. 1 Mb
mergeGenes - merge overlapping isoforms into new "genes"
Some other assorted helper functions:
normalizeDESeq - an across-sample normalization for FPKM matrices
getFragmentWidths - return a vector of estimated fragment lengths given a set of exons for a single gene (GRanges) and a BAM file
getReadLength - return the read length of the first read across BAM files
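The across-sample normalization named after DESeq is commonly understood as a median-of-ratios scaling. The following base-R sketch shows that idea on a small FPKM matrix. It is an illustration under assumptions, not the package's implementation: the function name `median_ratio_normalize`, its `cutoff` argument, and the rule of dropping rows that contain zeros are all hypothetical choices made here.

```r
# Hypothetical sketch of median-of-ratios normalization (the DESeq idea);
# this is not alpine's normalizeDESeq and may differ from it in detail.
median_ratio_normalize <- function(mat, cutoff = 0) {
  # Use only rows above the mean cutoff and with all-positive values,
  # so that the logarithms below are finite.
  keep <- rowMeans(mat) > cutoff & apply(mat > 0, 1, all)
  log_mat <- log(mat[keep, , drop = FALSE])
  # Per-row log geometric mean acts as a pseudo-reference sample.
  log_geo_means <- rowMeans(log_mat)
  # Subtracting a length-nrow vector recycles down columns (row-wise centering).
  log_ratios <- log_mat - log_geo_means
  # One size factor per sample: median ratio to the pseudo-reference.
  size_factors <- exp(apply(log_ratios, 2, median))
  # Divide every column of the full matrix by its size factor.
  sweep(mat, 2, size_factors, "/")
}

# Toy FPKM matrix: sample s2 is exactly twice sample s1.
fpkm <- matrix(c(10, 20, 40, 20, 40, 80), ncol = 2,
               dimnames = list(paste0("tx", 1:3), c("s1", "s2")))
normalized <- median_ratio_normalize(fpkm)
```

In this toy example the two columns of `normalized` are identical, because the only difference between the samples was a global factor of two, which the size factors absorb.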
The plotting functions are:
plotGC - plot the fragment GC bias curves
plotFragLen - plot the fragment length distributions
plotRelPos - plot the positional bias (5' to 3')
plotOrder0, plotOrder1, plotOrder2 - plot the read start bias terms
plotGRL - a simple function for visualizing GRangesList objects
Michael I Love, John B Hogenesch, Rafael A Irizarry: Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. Posted to bioRxiv August 2015, http://biorxiv.org/content/early/2015/08/28/025767