estimateAbundance: Estimate bias-corrected transcript abundances (FPKM)
In alpine: alpine

Description Usage Arguments Value References Examples

This function takes the fitted bias parameters from fitBiasModels and uses this information to derive bias corrected estimates of transcript abundance for a gene (with one or more isoforms) across multiple samples.

1
2
3

estimateAbundance(transcripts, bam.files, fitpar, genome, model.names,
  subset = TRUE, niter = 100, lib.sizes = NULL, optim = FALSE,
  custom.features = NULL)

`transcripts`	a GRangesList of the exons for multiple isoforms of a gene. For a single-isoform gene, just wrap the exons in `GRangesList()`
`bam.files`	a named vector pointing to the indexed BAM files
`fitpar`	the output of fitBiasModels
`genome`	a BSGenome object
`model.names`	a character vector of the bias models to use. These should have already been specified when calling fitBiasModels. Four exceptions are models that use none, one or both of the offsets, and these are called with: `"null"`, `"fraglen"`, `"vlmm"`, or `"fraglen.vlmm"`.
`subset`	logical, whether to downsample the non-observed fragments. Default is TRUE
`niter`	the number of EM iterations. Default is 100.
`lib.sizes`	a named vector of library sizes to use in calculating the FPKM. If NULL (the default) a value of 1e6 is used for all samples.
`optim`	logical, whether to use numerical optimization instead of the EM. Default is FALSE.
`custom.features`	an optional function to add custom features to the fragment types DataFrame. This function takes in a DataFrame returned by buildFragtypes and returns a DataFrame with additional columns added. Default is NULL, adding no custom features.

a list of lists. For each sample, a list with elements: theta, lambda and count.

theta gives the FPKM estimates for the isoforms in transcripts
lambda gives the average bias term for the isoforms
count gives the number of fragments which are compatible with any of the isoforms in transcripts

The model describing how bias estimates are used to estimate bias-corrected abundances is described in the Supplemental Note of the following publication:

Love, M.I., Hogenesch, J.B., and Irizarry, R.A., Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. Nature Biotechnologyh (2016) doi: 10.1038/nbt.3682

The likelihood formulation and EM algorithm for finding the maximum likelihood estimate for abundances follows this publication:

Salzman, J., Jiang, H., and Wong, W.H., Statistical Modeling of RNA-Seq Data. Statistical Science (2011) doi: 10.1214/10-STS343

# see vignette for a more realistic example

# these next lines just write out a BAM file from R
# typically you would already have a BAM file
library(alpineData)
library(GenomicAlignments)
library(rtracklayer)
gap <- ERR188088()
dir <- system.file(package="alpineData", "extdata")
bam.file <- c("ERR188088" = file.path(dir,"ERR188088.bam"))
export(gap, con=bam.file)

data(preprocessedData)
library(GenomicRanges)
library(BSgenome.Hsapiens.NCBI.GRCh38)

model.names <- c("fraglen","GC")

txs <- txdf.theta$tx_id[txdf.theta$gene_id == "ENSG00000198918"]

res <- estimateAbundance(transcripts=ebt.theta[txs],
                         bam.files=bam.file,
                         fitpar=fitpar.small,
                         genome=Hsapiens,
                         model.names=model.names)