estimateTheta: Estimate bias-corrected transcript abundances (FPKM)

View source: R/estimate_theta.R

estimateThetaR Documentation

Estimate bias-corrected transcript abundances (FPKM)

Description

This function takes the fitted bias parameters from fitBiasModels and uses this information to derive bias corrected estimates of transcript abundance for a gene (with one or more isoforms) across multiple samples.

Usage

estimateTheta(transcripts, bam.files, fitpar, genome, models, readlength,
  minsize, maxsize, subset = TRUE, niter = 100, lib.sizes = NULL,
  optim = FALSE, custom.features = NULL)

Arguments

transcripts

a GRangesList of the exons for multiple isoforms of a gene. For a single-isoform gene, just wrap the exons in GRangesList()

bam.files

a named vector pointing to the indexed BAM files

fitpar

the output of fitBiasModels

genome

a BSGenome object

models

a list of character strings or formula describing the bias models, see vignette

readlength

the read length

minsize

the minimum fragment length to model

maxsize

the maximum fragment length to model

subset

logical, whether to downsample the non-observed fragments. Default is TRUE

niter

the number of EM iterations. Default is 100.

lib.sizes

a named vector of library sizes to use in calculating the FPKM. If NULL (the default) a value of 1e6 is used for all samples.

optim

logical, whether to use numerical optimization instead of the EM. Default is FALSE.

custom.features

an optional function to add custom features to the fragment types DataFrame. This function takes in a DataFrame returned by buildFragtypes and returns a DataFrame with additional columns added. Default is NULL, adding no custom features.

Value

a list of lists. For each sample, a list with elements: theta, lambda and count.

  • theta gives the FPKM estimates for the isoforms in transcripts

  • lambda gives the average bias term for the isoforms

  • count gives the number of fragments which are compatible with any of the isoforms in transcripts

Examples


# see vignette for a more realistic example

# these next lines just write out a BAM file from R
# typically you would already have a BAM file
library(alpineData)
library(GenomicAlignments)
library(rtracklayer)
gap <- ERR188088()
dir <- system.file(package="alpineData", "extdata")
bam.file <- c("ERR188088" = file.path(dir,"ERR188088.bam"))
export(gap, con=bam.file)

data(preprocessedData)
library(GenomicRanges)
library(BSgenome.Hsapiens.NCBI.GRCh38)
models <- list(
 "GC"=list(formula="count~
 ns(gc,knots=gc.knots,Boundary.knots=gc.bk) +
 ns(relpos,knots=relpos.knots,Boundary.knots=relpos.bk) +
 0",
 offset=c("fraglen"))
)

readlength <- 75
minsize <- 125 # see vignette how to choose
maxsize <- 175 # see vignette how to choose
txs <- txdf.theta$tx_id[txdf.theta$gene_id == "ENSG00000198918"]

res <- estimateTheta(transcripts=ebt.theta[txs],
                     bam.files=bam.file,
                     fitpar=fitpar.small,
                     genome=Hsapiens,
                     models=models,
                     readlength=readlength,
                     minsize=minsize,
                     maxsize=maxsize)


mikelove/alpine documentation built on June 9, 2024, 11:37 a.m.