estimateAbundance: Estimate bias-corrected transcript abundances (FPKM)

View source: R/estimate_abundance.R

estimateAbundanceR Documentation

Estimate bias-corrected transcript abundances (FPKM)

Description

This function takes the fitted bias parameters from fitBiasModels and uses this information to derive bias corrected estimates of transcript abundance for a gene (with one or more isoforms) across multiple samples.

Usage

estimateAbundance(transcripts, bam.files, fitpar, genome, model.names,
  subset = TRUE, niter = 100, lib.sizes = NULL, optim = FALSE,
  custom.features = NULL)

Arguments

transcripts

a GRangesList of the exons for multiple isoforms of a gene. For a single-isoform gene, just wrap the exons in GRangesList()

bam.files

a named vector pointing to the indexed BAM files

fitpar

the output of fitBiasModels

genome

a BSGenome object

model.names

a character vector of the bias models to use. These should have already been specified when calling fitBiasModels. Four exceptions are models that use none, one or both of the offsets, and these are called with: "null", "fraglen", "vlmm", or "fraglen.vlmm".

subset

logical, whether to downsample the non-observed fragments. Default is TRUE

niter

the number of EM iterations. Default is 100.

lib.sizes

a named vector of library sizes to use in calculating the FPKM. If NULL (the default) a value of 1e6 is used for all samples.

optim

logical, whether to use numerical optimization instead of the EM. Default is FALSE.

custom.features

an optional function to add custom features to the fragment types DataFrame. This function takes in a DataFrame returned by buildFragtypes and returns a DataFrame with additional columns added. Default is NULL, adding no custom features.

Value

a list of lists. For each sample, a list with elements: theta, lambda and count.

  • theta gives the FPKM estimates for the isoforms in transcripts

  • lambda gives the average bias term for the isoforms

  • count gives the number of fragments which are compatible with any of the isoforms in transcripts

References

The model describing how bias estimates are used to estimate bias-corrected abundances is described in the Supplemental Note of the following publication:

Love, M.I., Hogenesch, J.B., and Irizarry, R.A., Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. Nature Biotechnologyh (2016) doi: 10.1038/nbt.3682

The likelihood formulation and EM algorithm for finding the maximum likelihood estimate for abundances follows this publication:

Salzman, J., Jiang, H., and Wong, W.H., Statistical Modeling of RNA-Seq Data. Statistical Science (2011) doi: 10.1214/10-STS343

Examples


# see vignette for a more realistic example

# these next lines just write out a BAM file from R
# typically you would already have a BAM file
library(alpineData)
library(GenomicAlignments)
library(rtracklayer)
gap <- ERR188088()
dir <- system.file(package="alpineData", "extdata")
bam.file <- c("ERR188088" = file.path(dir,"ERR188088.bam"))
export(gap, con=bam.file)

data(preprocessedData)
library(GenomicRanges)
library(BSgenome.Hsapiens.NCBI.GRCh38)

model.names <- c("fraglen","GC")

txs <- txdf.theta$tx_id[txdf.theta$gene_id == "ENSG00000198918"]

res <- estimateAbundance(transcripts=ebt.theta[txs],
                         bam.files=bam.file,
                         fitpar=fitpar.small,
                         genome=Hsapiens,
                         model.names=model.names)


mikelove/alpine documentation built on June 9, 2024, 11:37 a.m.