estimateTheta: Estimate bias-corrected transcript abundances (FPKM)
In mikelove/alpine: alpine

estimateTheta

R Documentation

Estimate bias-corrected transcript abundances (FPKM)

Description

This function takes the fitted bias parameters from fitBiasModels and uses this information to derive bias corrected estimates of transcript abundance for a gene (with one or more isoforms) across multiple samples.

Usage

estimateTheta(transcripts, bam.files, fitpar, genome, models, readlength,
  minsize, maxsize, subset = TRUE, niter = 100, lib.sizes = NULL,
  optim = FALSE, custom.features = NULL)

Arguments

`transcripts`	a GRangesList of the exons for multiple isoforms of a gene. For a single-isoform gene, just wrap the exons in `GRangesList()`
`bam.files`	a named vector pointing to the indexed BAM files
`fitpar`	the output of fitBiasModels
`genome`	a BSGenome object
`models`	a list of character strings or formula describing the bias models, see vignette
`readlength`	the read length
`minsize`	the minimum fragment length to model
`maxsize`	the maximum fragment length to model
`subset`	logical, whether to downsample the non-observed fragments. Default is TRUE
`niter`	the number of EM iterations. Default is 100.
`lib.sizes`	a named vector of library sizes to use in calculating the FPKM. If NULL (the default) a value of 1e6 is used for all samples.
`optim`	logical, whether to use numerical optimization instead of the EM. Default is FALSE.
`custom.features`	an optional function to add custom features to the fragment types DataFrame. This function takes in a DataFrame returned by buildFragtypes and returns a DataFrame with additional columns added. Default is NULL, adding no custom features.

Value

a list of lists. For each sample, a list with elements: theta, lambda and count.

theta gives the FPKM estimates for the isoforms in transcripts
lambda gives the average bias term for the isoforms
count gives the number of fragments which are compatible with any of the isoforms in transcripts

Examples


# see vignette for a more realistic example

# these next lines just write out a BAM file from R
# typically you would already have a BAM file
library(alpineData)
library(GenomicAlignments)
library(rtracklayer)
gap <- ERR188088()
dir <- system.file(package="alpineData", "extdata")
bam.file <- c("ERR188088" = file.path(dir,"ERR188088.bam"))
export(gap, con=bam.file)

data(preprocessedData)
library(GenomicRanges)
library(BSgenome.Hsapiens.NCBI.GRCh38)
models <- list(
 "GC"=list(formula="count~
 ns(gc,knots=gc.knots,Boundary.knots=gc.bk) +
 ns(relpos,knots=relpos.knots,Boundary.knots=relpos.bk) +
 0",
 offset=c("fraglen"))
)

readlength <- 75
minsize <- 125 # see vignette how to choose
maxsize <- 175 # see vignette how to choose
txs <- txdf.theta$tx_id[txdf.theta$gene_id == "ENSG00000198918"]

res <- estimateTheta(transcripts=ebt.theta[txs],
                     bam.files=bam.file,
                     fitpar=fitpar.small,
                     genome=Hsapiens,
                     models=models,
                     readlength=readlength,
                     minsize=minsize,
                     maxsize=maxsize)

mikelove/alpine documentation built on June 9, 2024, 11:37 a.m.