alpine: bias corrected transcript abundance estimation

Description

alpine is a package for estimating and visualizing many forms of sample-specific bias that can arise in RNA-seq, including the fragment length distribution, positional bias along the transcript, read start bias (from random hexamer priming), and fragment GC content bias (from amplification). It also offers bias-corrected estimates of transcript abundance (FPKM). It is currently designed for unstranded paired-end RNA-seq data.
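
For reference, the standard, uncorrected FPKM definition (fragments per kilobase of transcript per million mapped fragments) is sketched below in Python. This is only the textbook formula. It does not reproduce alpine's bias correction, and the function name `fpkm` is hypothetical, not part of the package.

```python
def fpkm(fragment_count: float, transcript_length_bp: int, total_fragments: float) -> float:
    """Uncorrected FPKM: fragments per kilobase of transcript per million mapped fragments.

    Hypothetical helper for illustration only; alpine's bias-corrected
    estimates are not computed this way.
    """
    if transcript_length_bp <= 0 or total_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1e3  # transcript length in kb
    millions = total_fragments / 1e6        # library size in millions of fragments
    return fragment_count / kilobases / millions
```

For example, 500 fragments on a 2,000 bp transcript in a library of 10 million fragments gives 500 / 2 / 10 = 25 FPKM.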

Details

See the package vignette for a detailed workflow.

The main functions in this package are:

  1. buildFragtypes - build out the features for fragment types from the exons of a single gene (GRanges)

  2. fitBiasModels - fit parameters for one or more bias models over a set of ~100 medium- to highly expressed single-isoform genes (GRangesList)

  3. estimateAbundance - given a set of genome alignments (BAM files) and a set of isoforms of a gene (GRangesList), estimate the transcript abundances for these isoforms (FPKM) for various bias models

  4. extractAlpine - given a list of outputs from estimateAbundance, compile an FPKM matrix across transcripts and samples

  5. predictCoverage - given the exons of a single gene (GRanges), predict the coverage for a set of samples using fitted bias parameters, and compute the observed coverage
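
To make "features for fragment types" concrete, here is a minimal Python sketch that enumerates every fragment within a length range on a single transcript sequence and records its length, GC fraction and relative midpoint position, the kinds of covariates named in the Description. It assumes a plain transcript sequence string with 0-based, end-exclusive coordinates. `FragmentType` and `enumerate_fragment_types` are hypothetical names; buildFragtypes itself works from exons (GRanges) and may compute different or additional features.

```python
from typing import Iterator, NamedTuple


class FragmentType(NamedTuple):
    start: int      # 0-based start on the transcript (hypothetical convention)
    end: int        # end-exclusive
    length: int
    gc: float       # fraction of G/C bases in the fragment
    rel_pos: float  # fragment midpoint divided by transcript length, in [0, 1]


def enumerate_fragment_types(tx_seq: str, min_len: int, max_len: int) -> Iterator[FragmentType]:
    """Yield one FragmentType per possible fragment of length min_len..max_len."""
    seq = tx_seq.upper()
    n = len(seq)
    # Prefix sums of G/C counts so each fragment's GC fraction is O(1).
    gc_prefix = [0]
    for base in seq:
        gc_prefix.append(gc_prefix[-1] + (base in "GC"))
    for start in range(n):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > n:
                break  # longer fragments would also run off the transcript
            gc = (gc_prefix[end] - gc_prefix[start]) / length
            rel_pos = (start + end) / 2 / n
            yield FragmentType(start, end, length, gc, rel_pos)
```

Prefix sums keep the per-fragment GC computation constant-time, which matters because the number of fragments grows with transcript length times the width of the length range.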

Some helper functions for preparing gene objects:

  1. splitGenesAcrossChroms - split apart "genes" where isoforms are on different chromosomes

  2. splitLongGenes - split apart "genes" which cover a suspiciously large range, e.g. 1 Mb

  3. mergeGenes - merge overlapping isoforms into new "genes"
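
The grouping performed by mergeGenes can be illustrated with a sweep over sorted ranges. In this hypothetical Python sketch each isoform is reduced to a single span on one chromosome (1-based, inclusive, as in GRanges), strand is ignored, and isoforms that overlap directly or through a chain of overlaps end up in the same group. Whether mergeGenes compares full spans or exon-level overlaps is not stated here, so treat the span-based rule as an assumption.

```python
from typing import List, NamedTuple


class Isoform(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def merge_overlapping(isoforms: List[Isoform]) -> List[List[str]]:
    """Group isoform names whose spans overlap (transitively) on the same chromosome."""
    groups: List[List[str]] = []
    ordered = sorted(isoforms, key=lambda iso: (iso.chrom, iso.start, iso.end))
    cur_chrom, cur_end, cur_group = None, None, []
    for iso in ordered:
        # Inclusive coordinates: sharing even one base counts as overlap.
        if cur_group and iso.chrom == cur_chrom and iso.start <= cur_end:
            cur_group.append(iso.name)
            cur_end = max(cur_end, iso.end)  # extend the running span
        else:
            if cur_group:
                groups.append(cur_group)
            cur_chrom, cur_end, cur_group = iso.chrom, iso.end, [iso.name]
    if cur_group:
        groups.append(cur_group)
    return groups
```

Here isoform `e` joins `a` and `b` only through its overlap with `b`, which is why the running end of the span is extended as each isoform is added.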

Some other assorted helper functions:

  1. normalizeDESeq - an across-sample normalization for FPKM matrices

  2. getFragmentWidths - return a vector of estimated fragment lengths given the exons of a single gene (GRanges) and a BAM file

  3. getReadLength - return the read length of the first read across BAM files
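
Assuming normalizeDESeq follows the median-of-ratios approach used by DESeq, the computation can be sketched in Python as follows: each sample's size factor is the median, over rows that are positive in every sample, of the ratio between that sample's value and the row's geometric mean across samples. The function names are hypothetical, and the package may select rows differently (for example with a minimum-value cutoff).

```python
import math
from statistics import median
from typing import List


def size_factors(matrix: List[List[float]]) -> List[float]:
    """Median-of-ratios size factors; matrix rows are transcripts, columns are samples."""
    n_samples = len(matrix[0])
    # Only rows positive in every sample are usable, since log(0) is undefined.
    rows = [row for row in matrix if all(v > 0 for v in row)]
    if not rows:
        raise ValueError("no row is positive in all samples")
    # Geometric mean per row, kept on the log scale.
    log_geo_means = [sum(math.log(v) for v in row) / n_samples for row in rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(matrix: List[List[float]]) -> List[List[float]]:
    """Divide each column of the matrix by its size factor."""
    sf = size_factors(matrix)
    return [[v / s for v, s in zip(row, sf)] for row in matrix]
```

Taking the median rather than the mean makes the size factors robust to a minority of transcripts that genuinely differ between samples.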

The plotting functions are:

  1. plotGC - plot the fragment GC bias curves

  2. plotFragLen - plot the fragment length distributions

  3. plotRelPos - plot the positional bias (5' to 3')

  4. plotOrder0, plotOrder1, plotOrder2 - plot the read start bias terms

  5. plotGRL - a simple function for visualizing GRangesList objects
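
The read start bias plots are named by order (0, 1, 2), which suggests a Markov-style model of the bases around read starts. As a rough illustration of the order-0 case only, the Python sketch below compares per-position base frequencies in sequence windows around observed read starts with frequencies in background windows and reports log2 ratios. The window extraction, the pseudocount and the function names are assumptions; this is not the model alpine fits.

```python
import math
from collections import Counter
from typing import Dict, List

BASES = "ACGT"


def positional_freqs(seqs: List[str]) -> List[Dict[str, float]]:
    """Per-position base frequencies; assumes all windows have equal length."""
    length = len(seqs[0])
    freqs = []
    for j in range(length):
        counts = Counter(s[j] for s in seqs if s[j] in BASES)  # ignore N etc.
        total = sum(counts.values())
        freqs.append({b: counts[b] / total if total else 0.0 for b in BASES})
    return freqs


def order0_log2_ratios(start_windows: List[str], background_windows: List[str],
                       pseudo: float = 1e-6) -> List[Dict[str, float]]:
    """log2(observed / expected) base frequency at each window position (hypothetical)."""
    obs = positional_freqs([s.upper() for s in start_windows])
    exp = positional_freqs([s.upper() for s in background_windows])
    # The pseudocount avoids division by zero and log of zero for unseen bases.
    return [
        {b: math.log2((o[b] + pseudo) / (e[b] + pseudo)) for b in BASES}
        for o, e in zip(obs, exp)
    ]
```

Positive values mark bases over-represented at read starts relative to background; orders 1 and 2 would additionally condition each base on the one or two preceding bases.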

Author(s)

Michael Love

References

Michael I. Love, John B. Hogenesch, Rafael A. Irizarry: "Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation." bioRxiv, posted August 2015. http://biorxiv.org/content/early/2015/08/28/025767